date:20210121

Re: [PATCH] RISC-V: Fix -march option parsing when `p` extension exists.

2021-01-21 Thread Kito Cheng

Hi Jim:

I think this patch is small enough to accept without an FSF copyright
assignment, and he is also in the middle of that process. What do you
think?

On Fri, Jan 22, 2021 at 2:44 PM Xing GUO  wrote:
>
> Hi Kito,
>
> I’ve sent my assignments and my school’s disclaimer to ass...@gnu.org
> ([gnu.org #1673033] Xing GUO), but I haven’t got a response so far. Not sure if
> this patch can be accepted as a small bugfix patch. If not, I’m happy to
> wait until the FSF approves it.
>
> Best Regards,
> Xing
>
> On Jan 22, 2021, at 2:26 PM, Kito Cheng  wrote:
>
> Hi Xing:
>
> Thanks for your patch, but I would like to know whether you have a
> copyright assignment on file with the FSF, or whether your
> employer/company has signed one?
>
> On Thu, Jan 21, 2021 at 8:48 PM Xing GUO via Gcc-patches
>  wrote:
>
>
> This patch fixes -march option parsing when `p` extension exists,
> e.g., -march=rv64imafdcp should produce
>
> .attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_p"
>
> rather than
>
> .attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c_p"
>
> ---
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.c
> (riscv_subset_list::parsing_subset_version):
> Fix -march option parsing when `p` extension exists.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/attribute-18.c: New test.
> --
>
> Cheers,
> Xing
>
>

Re: [PATCH] RISC-V: Fix -march option parsing when `p` extension exists.

2021-01-21 Thread Xing GUO via Gcc-patches

Hi Kito,

I’ve sent my assignments and my school’s disclaimer to ass...@gnu.org
([gnu.org #1673033] Xing GUO), but I haven’t got a response so far.
Not sure if this patch can be accepted as a small bugfix patch. If not,
I’m happy to wait until the FSF approves it.

Best Regards,
Xing

> On Jan 22, 2021, at 2:26 PM, Kito Cheng  wrote:
> 
> Hi Xing:
> 
> Thanks for your patch, but I would like to know whether you have a
> copyright assignment on file with the FSF, or whether your
> employer/company has signed one?
> 
> On Thu, Jan 21, 2021 at 8:48 PM Xing GUO via Gcc-patches
>  wrote:
>> 
>> This patch fixes -march option parsing when `p` extension exists,
>> e.g., -march=rv64imafdcp should produce
>> 
>> .attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_p"
>> 
>> rather than
>> 
>> .attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c_p"
>> 
>> ---
>> gcc/ChangeLog:
>> 
>> * common/config/riscv/riscv-common.c
>> (riscv_subset_list::parsing_subset_version):
>> Fix -march option parsing when `p` extension exists.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.target/riscv/attribute-18.c: New test.
>> --
>> 
>> Cheers,
>> Xing

Re: [PATCH] RISC-V: Fix -march option parsing when `p` extension exists.

2021-01-21 Thread Kito Cheng via Gcc-patches

Hi Xing:

Thanks for your patch, but I would like to know whether you have a
copyright assignment on file with the FSF, or whether your
employer/company has signed one?

On Thu, Jan 21, 2021 at 8:48 PM Xing GUO via Gcc-patches
 wrote:
>
> This patch fixes -march option parsing when `p` extension exists,
> e.g., -march=rv64imafdcp should produce
>
> .attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_p"
>
> rather than
>
> .attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c_p"
>
> ---
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.c
> (riscv_subset_list::parsing_subset_version):
> Fix -march option parsing when `p` extension exists.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/attribute-18.c: New test.
> --
>
> Cheers,
> Xing
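
A minimal sketch of the ambiguity the patch description points at, not the
code in riscv-common.c: `p` doubles as the separator inside a version
(`2p0`), so a parser has to see digits on both sides before treating it as
part of a version. The names parse_version and arch_attribute and the
default-version table are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, simplified model of single-letter -march parsing.
struct Version {
  bool present = false;
  unsigned major = 0, minor = 0;
};

static bool is_digit_at(const std::string &s, std::size_t i) {
  return i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]));
}

// Parse an optional "<major>[p<minor>]" at s[pos], advancing pos past it.
// A 'p' that is not surrounded by digits is left unconsumed, because it
// is then the start of the next extension rather than a separator.
static Version parse_version(const std::string &s, std::size_t &pos) {
  Version v;
  std::size_t i = pos;
  if (!is_digit_at(s, i))
    return v;  // no explicit version follows the extension letter
  while (is_digit_at(s, i))
    v.major = v.major * 10 + static_cast<unsigned>(s[i++] - '0');
  if (i < s.size() && s[i] == 'p' && is_digit_at(s, i + 1)) {
    ++i;  // consume the separator
    while (is_digit_at(s, i))
      v.minor = v.minor * 10 + static_cast<unsigned>(s[i++] - '0');
  }
  v.present = true;
  pos = i;
  return v;
}

// Build the arch attribute string for a march such as "rv64imafdcp".
// Assumed defaults: 2p0 for i, m, a, f, d, c; none for anything else.
std::string arch_attribute(const std::string &march) {
  static const std::map<char, std::string> defaults = {
      {'i', "2p0"}, {'m', "2p0"}, {'a', "2p0"},
      {'f', "2p0"}, {'d', "2p0"}, {'c', "2p0"}};
  std::string out = march.substr(0, 4);  // "rv32" or "rv64"
  std::size_t pos = 4;
  bool first = true;
  while (pos < march.size()) {
    char ext = march[pos++];
    Version v = parse_version(march, pos);
    if (!first)
      out += '_';
    first = false;
    out += ext;
    if (v.present) {
      out += std::to_string(v.major) + "p" + std::to_string(v.minor);
    } else {
      auto it = defaults.find(ext);
      if (it != defaults.end())
        out += it->second;
    }
  }
  return out;
}
```

Under these assumptions, arch_attribute("rv64imafdcp") yields the string the
patch description says should be produced, with `c` keeping its default
version and `p` emitted as a separate extension.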

Re: [PATCH] c++: private inheritance access diagnostics fix [PR17314]

2021-01-21 Thread Jason Merrill via Gcc-patches


On 1/21/21 2:28 PM, Anthony Sharp wrote:

Hi Jason,

I've finally completed my copyright assignment form. I've attached it
to this email for reference.


You don't need write access to the main repository to use these commands
on your local copy.  One nice thing about git compared to svn is that
you don't need to touch the server for anything but push and pull.

Incidentally, how are you producing your patch?  Maybe try git
format-patch instead.


The method I am using at the moment is the one Ranjit Mathew talks
about here: http://rmathew.com/articles/gcj/crpatch.html. Actually,
having just re-read it, it says: 'NOTE: This is not the “proper” or
“official” way of creating and submitting patches - that process has
been explained in detail elsewhere. That process requires one to use
Subversion (SVN). The process described here is meant for “one-off
hackers” or people who cannot use SVN for some reason or the other.'
... oops!

It's my fault kind of - the official GCC webpage
(https://gcc.gnu.org/gitwrite.html) explaining how to do it is called
'Read-write Git access' so I assumed it was only relevant for people
who have access to the repo, but I see that is not the case.

I've tried the git way of doing it and I'm attaching a new patch file
that (hopefully) is better this time. Basically what I did was what
you suggested:

git pull
contrib/gcc-git-customization.sh
(make changes)
git add *
git gcc-commit-mklog
git gcc-commit-mklog --amend
git format-patch -1 master

I also re-built the source just to make sure I hadn't messed anything
up. I re-ran the C++ regression tests using make check-c and make
check-c++. Whilst I did not do a before/after comparison of the
results, I checked the FAILs in gcc.sum and g++.sum and they all
looked like they had nothing to do with my code. All the code is the
same as before, so I'm thinking it should be fine (I just wanted to be
safe). Also checked against check_GNU_style.sh.

Assuming that's all fine, as for the code itself, there might well be
some tweaks that could make it better, and so if that is the case then
please let me know.


The code looks good, I just have some minor tweaks.  Thanks!


+++ b/gcc/cp/semantics.c

...

+extern access_kind access_in_type (tree type, tree decl);

...

+static tree
+get_parent_with_private_access (tree decl, tree binfo)


Instead of making access_in_type non-static, let's define
get_parent_with_private_access in search.c and declare it in cp-tree.h
(with the declarations of nearby search.c functions).



+  /* If we have not already figured out why DECL is innaccessible...  */

...

+  /* Couldn't figure out why DECL is innaccesible, so just say it's
+  innaccessible.  */


Only one 'n' in inaccessible.

There are various minor formatting issues:

(https://www.gnu.org/prep/standards/standards.html#Formatting)


+  /* Couldn't figure out why DECL is innaccesible, so just say it's
+  innaccessible.  */


Subsequent lines of a comment should be indented to line up with the 
first line.  This applies to all your multi-line comments.



-{
-  if (issue_error)
-   error ("%q#D is private within this context", diag_decl);
-  inform (DECL_SOURCE_LOCATION (diag_decl),
- "declared private here");
-}
+  {
+if (issue_error)
+  error ("%q#D is private within this context", diag_decl);
+inform (DECL_SOURCE_LOCATION (diag_location), "declared private here");
+  }


Don't change the indentation of these blocks; in the GNU coding style 
the { } are indented two spaces from the if.



+   tree parent_binfo = get_parent_with_private_access (decl,
+ basetype_path);

...

+   complain_about_access (decl, diag_decl, diag_location, true,
+ parent_access);


The new line of arguments should be indented to line up with the first one.

Jason
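
The three formatting points in this review can be seen together in a small
stand-alone sketch. The functions clamped_sum and example_call are
hypothetical and unrelated to the patch; only the layout matters, and spaces
are used here where GCC sources would normally use tabs.

```cpp
#include <cassert>

/* Return the sum of FIRST_OPERAND and SECOND_OPERAND, clamped to LIMIT.
   Later lines of a multi-line comment line up with the text of the
   first line.  */

static int
clamped_sum (int first_operand, int second_operand, int limit)
{
  int sum = first_operand + second_operand;
  if (sum > limit)
    {
      /* The braces are indented two columns from the 'if', and the
         body two more columns from the braces.  */
      sum = limit;
    }
  return sum;
}

static int
example_call (void)
{
  /* Wrapped arguments line up under the first argument.  */
  return clamped_sum (40,
                      2,
                      100);
}
```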

Re: [PATCH] avoid -Wnonnull for COND_EXPR in static_cast (PR 98646)

2021-01-21 Thread Jason Merrill via Gcc-patches


On 1/20/21 7:30 PM, Martin Sebor wrote:

Similar to pr96003, bug 98646 reports a spurious instance of -Wnonnull
calling a member function on the result of static_cast.  The difference
here is that the cast argument is a function call, and, besides casting
down an inheritance hierarchy, the cast also adds the const qualifier.
GCC sets the NO_WARNING bit for the COND_EXPR it emits for the cast
in these cases to avoid issuing spurious -Wnonnull warnings but then
doesn't preserve the bit in a call to cp_fold_convert() on it, leading
to the false positive.

The attached patch arranges for cp_fold_convert() to propagate the bit
to the result, preventing the front end -Wnonnull handler from issuing
the warning.

In addition, it also improves the wording of the message (suggested by
users in the past).



+  if (nowarn)
+TREE_NO_WARNING (conv) = nowarn;


Let's only do this if the TREE_CODE of conv is the same as expr.  OK 
with that change.


Jason
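
For readers unfamiliar with the construct, here is a sketch of the general
shape of code being discussed; it is not the testcase from PR 98646, and all
names in it are made up. A static_cast from a base pointer to a derived
pointer at a non-zero offset has to keep a null pointer null, which is why
the cast is emitted as a conditional expression before the member call.

```cpp
#include <cassert>

struct A {
  virtual ~A() {}
  int a = 1;
};

struct B {
  int b = 42;
};

// B is the second base, so the B subobject sits at a non-zero offset in D.
struct D : A, B {
  int get() const { return b; }
};

static D the_object;

// The cast operand is a function call returning a base pointer.
B *make() { return &the_object; }

// Downcast through the hierarchy, add const, then call a member function
// on the result of the cast.
int call_through_cast() {
  return static_cast<const D *>(make())->get();
}
```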

Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]

2021-01-21 Thread David Malcolm via Gcc-patches

On Thu, 2021-01-21 at 20:09 +0100, Jan Hubicka wrote:
> > On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > > > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> > > >  wrote:
> > > > > gimple.h has this comment for gimple's uid field:
> > > > > 
> > > > >   /* UID of this statement.  This is used by passes that want to
> > > > >      assign IDs to statements.  It must be assigned and used by each
> > > > >      pass.  By default it should be assumed to contain garbage.  */
> > > > >   unsigned uid;
> > > > > 
> > > > > and gimple_set_uid has:
> > > > > 
> > > > >    Please note that this UID property is supposed to be undefined at
> > > > >    pass boundaries.  This means that a given pass should not assume it
> > > > >    contains any useful value when the pass starts and thus can set it
> > > > >    to any value it sees fit.
> > > > > 
> > > > > which suggests that any pass can use the uid field as an arbitrary
> > > > > scratch space.
> > > > > 
> > > > > PR analyzer/98599 reports a case where this error occurs in LTO mode:
> > > > >   fatal error: Cgraph edge statement index out of range
> > > > > on certain inputs with -fanalyzer.
> > > > > 
> > > > > The error occurs in the LTRANS phase after -fanalyzer runs in the
> > > > > WPA phase.  The analyzer pass writes to the uid fields of all stmts.
> > > > > 
> > > > > The error occurs when LTRANS is streaming callgraph edges back in.
> > > > > If I'm reading things correctly, the LTO format uses stmt uids to
> > > > > associate call stmts with callgraph edges between WPA and LTRANS.
> > > > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > > > gimple_uid, and input_edge reads it back in.
> > > > > 
> > > > > Hence IPA passes that touch the uids in WPA need to restore them,
> > > > > or the stream-in at LTRANS will fail.
> > > > > 
> > > > > Is it intended that the LTO machinery relies on the value of the uid
> > > > > field being preserved during WPA (or, at least, needs to be saved and
> > > > > restored by passes that touch it)?
> > > > 
> > > > I believe this is solely at the cgraph stream out to stream in
> > > > boundary but
> > > > this may be a blurred area since while we materialize the whole
> > > > cgraph
> > > > at once the function bodies are streamed in on demand.
> > > > 
> > > > Honza can probably clarify things.
> > > 
> > > Well, the uids are used to associate cgraph edges with statements.  At
> > > WPA stage you do not have function bodies and thus uids serve the role
> > > of pointers to the statement.  If you load the body in (via get_body)
> > > the uids are replaced by pointers and when you stream out uids are
> > > recomputed again.
> > > 
> > > When do you touch the uids? At WPA time or from small IPA pass in
> > > ltrans?
> > 
> > The analyzer is here in passes.def:
> >   INSERT_PASSES_AFTER (all_regular_ipa_passes)
> >   NEXT_PASS (pass_analyzer);
> > 
> > and so in LTO runs as the first regular IPA pass at WPA time,
> > when do_whole_program_analysis calls:
> >   execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);
> > 
> > FWIW I hope to eventually have a way to summarize function bodies in
> > the analyzer, but I don't yet, so I'm currently brute forcing things by
> > loading all function bodies at the start of the analyzer (when
> > -fanalyzer is enabled).
> > 
> > I wonder if that's messing things up somehow?
> 
> Actually I think it should work.  If you do get_body or
> get_untransformed_body (that will be equal at that time) you ought to
> get ids in symtab datastructure rewritten to pointers and at stream out
> time we should assign new ids...
> > Does the stream-out from WPA make any assumptions about the stmt
> > uids?
> > For example, 
> >   #define STMT_UID_NOT_IN_RANGE(uid) \
> > (gimple_stmt_max_uid (fn) < uid || uid == 0)
> > seems to assume that the UIDs are per-function ranges from
> >   [0-gimple_stmt_max_uid (fn)]
> > which isn't the case for the uids set by the analyzer.  Maybe that's
> > the issue here?
> > 
> > Sorry for not being more familiar with the IPA/LTO code
> 
> There is lto_prepare_function_for_streaming which assigns uids to be
> incremental.   So I guess problem is that it is not called at WPA time
> if function is in memory (since at moment we do not really modify bodies
> at WPA time, however we do stream them in sometimes to icf compare them
> or to update profile).
> 
> So I guess the fix would be to arrange for
> lto_prepare_function_for_streaming to be called on all functions with
> body defined before WPA stream-out?
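
The mechanism described in the quoted text can be modeled with a toy sketch.
None of the types or functions below are GCC's: Stmt, Edge, renumber_uids and
streams_in_cleanly are stand-ins invented for illustration, and the range
check only mirrors the shape of the STMT_UID_NOT_IN_RANGE macro quoted in
this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins: a statement with a scratch uid, and a callgraph edge
// that points at its call statement while the body is in memory.
struct Stmt {
  unsigned uid = 0;
};
struct Edge {
  const Stmt *call_stmt;
};

// Hypothetical analogue of the renumbering the thread attributes to
// lto_prepare_function_for_streaming: uids become 1..N in order.
void renumber_uids(std::vector<Stmt> &body) {
  unsigned next = 1;
  for (Stmt &s : body)
    s.uid = next++;
}

// Same shape as the quoted STMT_UID_NOT_IN_RANGE check.
bool uid_not_in_range(unsigned uid, unsigned max_uid) {
  return max_uid < uid || uid == 0;
}

// Returns true if every edge's uid would be accepted at stream-in.
bool streams_in_cleanly(bool renumber_before_stream_out) {
  std::vector<Stmt> body(3);
  std::vector<Edge> edges = {{&body[0]}, {&body[2]}};

  // A pass treats the uid field as scratch space and leaves arbitrary
  // values behind.
  for (std::size_t i = 0; i < body.size(); ++i)
    body[i].uid = 1000 + static_cast<unsigned>(i);

  if (renumber_before_stream_out)
    renumber_uids(body);

  const unsigned max_uid = static_cast<unsigned>(body.size());
  for (const Edge &e : edges)
    if (uid_not_in_range(e.call_stmt->uid, max_uid))
      return false;
  return true;
}
```

In this model the edges only survive the round trip when the uids are
renumbered after the pass has scribbled on them, which is the situation the
suggested fix is meant to guarantee.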

Thanks.

I think my earlier analysis was wrong.

With the caveat that I'm not as familiar with the IPA code as other
parts of the compiler, what I think

Re: [PATCH] c++: Suppress this injection for static member functions [PR97399]

2021-01-21 Thread Jason Merrill via Gcc-patches


On 1/21/21 11:22 AM, Patrick Palka wrote:

Here at parse time finish_qualified_id_expr adds an implicit 'this->' to
the expression tmp::integral (because it's type-dependent, and also
current_class_ptr is set) within the trailing return type, and later
during substitution we can't resolve the 'this' since
tsubst_function_type does inject_this_parm only for non-static member
functions which tmp::func is not.

It seems the root of the problem is that we set current_class_ptr when
parsing the signature of a static member function.  Since explicit uses
of 'this' are already not allowed in this context, we probably shouldn't
be doing inject_this_parm either.


Hmm, 'this' is defined in a static member function, it's just ill-formed 
to use it:


7.5.2/2: "... [this] shall not appear within the declaration of a static 
member function (although its type and value category are defined within 
a static member function as they are within a non-static member 
function). [Note: This is because declaration matching does not occur 
until the complete declarator is known. — end note]"


Perhaps maybe_dummy_object needs to be smarter about recognizing static 
context.  Or perhaps there's no way to tell the difference between the 
specified behavior above and the behavior with your patch, but:


The note suggests that we need to test the case of an out-of-class 
definition of a static member function (template); we must 
inject_this_parm when parsing such a declaration, since we don't know 
it's static until we match it to the declaration in the class.  I'm 
guessing that this would lead to the same problem.



Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

PR c++/97399
* parser.c (cp_parser_init_declarator): If the storage class
specifier is sc_static, pass true for static_p to
cp_parser_declarator.
(cp_parser_direct_declarator): Don't do inject_this_parm when
the member function is static.

gcc/testsuite/ChangeLog:

PR c++/88548
PR c++/97399
* g++.dg/cpp0x/this2.C: New test.
* g++.dg/template/pr97399a.C: New test.
* g++.dg/template/pr97399b.C: New test.
---
  gcc/cp/parser.c  |  5 +++--
  gcc/testsuite/g++.dg/cpp0x/this2.C   |  8 
  gcc/testsuite/g++.dg/template/pr97399a.C | 11 +++
  gcc/testsuite/g++.dg/template/pr97399b.C | 11 +++
  4 files changed, 33 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/this2.C
  create mode 100644 gcc/testsuite/g++.dg/template/pr97399a.C
  create mode 100644 gcc/testsuite/g++.dg/template/pr97399b.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 48437f23175..18cf9888632 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -21413,6 +21413,7 @@ cp_parser_init_declarator (cp_parser* parser,
bool is_non_constant_init;
int ctor_dtor_or_conv_p;
bool friend_p = cp_parser_friend_p (decl_specifiers);
+  bool static_p = decl_specifiers->storage_class == sc_static;
tree pushed_scope = NULL_TREE;
bool range_for_decl_p = false;
bool saved_default_arg_ok_p = parser->default_arg_ok_p;
@@ -21446,7 +21447,7 @@ cp_parser_init_declarator (cp_parser* parser,
  = cp_parser_declarator (parser, CP_PARSER_DECLARATOR_NAMED,
flags, &ctor_dtor_or_conv_p,
/*parenthesized_p=*/NULL,
-   member_p, friend_p, /*static_p=*/false);
+   member_p, friend_p, static_p);
/* Gather up the deferred checks.  */
stop_deferring_access_checks ();
  
@@ -22122,7 +22123,7 @@ cp_parser_direct_declarator (cp_parser* parser,
  
  tree save_ccp = current_class_ptr;
  tree save_ccr = current_class_ref;
- if (memfn)
+ if (memfn && !static_p)
/* DR 1207: 'this' is in scope after the cv-quals.  */
inject_this_parameter (current_class_type, cv_quals);
  
diff --git a/gcc/testsuite/g++.dg/cpp0x/this2.C b/gcc/testsuite/g++.dg/cpp0x/this2.C
new file mode 100644
index 000..3781bc5efec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/this2.C
@@ -0,0 +1,8 @@
+// PR c++/88548
+// { dg-do compile { target c++11 } }
+
+struct S {
+  int a;
+  template  static auto m1 ()
+-> decltype(this->a) { return 0; }; // { dg-error "'this'" }
+};
diff --git a/gcc/testsuite/g++.dg/template/pr97399a.C b/gcc/testsuite/g++.dg/template/pr97399a.C
new file mode 100644
index 000..3713dbde6e0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/pr97399a.C
@@ -0,0 +1,11 @@
+// PR c++/97399
+// { dg-do compile { target c++11 } }
+
+template  struct enable_if_t {};
+struct tmp {
+  template  static constexpr bool is_integral();
+  template  static auto func()
+-> enable_if_t()>;
+};
+template  constexpr bool tmp::is_integral() { return true; }
+int main() { tmp::func(); }
diff --git a/gcc/testsuite/

Re: [PATCH] c++: ICE with noexcept in class in member function [PR96623]

2021-01-21 Thread Jason Merrill via Gcc-patches


On 1/21/21 5:45 PM, Marek Polacek wrote:

I discovered very strange code in inject_parm_decls:

if (args && is_this_parameter (args))
  {
gcc_checking_assert (current_class_ptr == NULL_TREE);
current_class_ptr = NULL_TREE;

We are tripping up on the assert because when we call inject_parm_decls,
current_class_ptr is set to 'A'.  It was set by inject_this_parameter
after we've parsed the parameter-declaration-clause of the member
function foo.


But then it should be restored (to null) by the ccp = save_ccp a few 
lines later.



It seems correct to set ccp/ccr to A::B when we're
late parsing the noexcept-specifiers of bar* functions in B, so that
this-> does the right thing.


Agreed.


Since inject_parm_decls can mess with
ccp/ccr, I think best if we properly restore it after the late parsing
of noexcept-specifiers.


pop_injected_parms clears them, which is restoring them if we keep the 
assert.



It should also work to clear ccp before calling inject_parm_decls, and
removing the assignment following the assert, should the assert stay.


But why is it non-null before parsing the unparsed_noexcepts?

Jason
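
The save-and-restore discipline under discussion can be sketched with a small
scope guard. This is a stand-alone illustration, not GCC code: scoped_override,
current_class_ptr_model and late_parse_noexcept are hypothetical names, and a
string stands in for the tree that the real current_class_ptr holds.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Stand-in for a parser-global such as current_class_ptr.
static const char *current_class_ptr_model = nullptr;

// Sets a variable for the lifetime of the guard and restores the
// previous value on scope exit, whatever that value was.
template <typename T>
class scoped_override {
  T &slot_;
  T saved_;

public:
  scoped_override(T &slot, T value) : slot_(slot), saved_(slot) {
    slot_ = std::move(value);
  }
  ~scoped_override() { slot_ = std::move(saved_); }
  scoped_override(const scoped_override &) = delete;
  scoped_override &operator=(const scoped_override &) = delete;
};

// Models late parsing of a noexcept-specifier inside a nested class:
// the class pointer is switched for the duration of the parse only.
std::string late_parse_noexcept(const char *nested_class) {
  scoped_override<const char *> guard(current_class_ptr_model, nested_class);
  return current_class_ptr_model;  // what 'this->' would resolve against
}
```

A guard of this kind restores whatever was there before, so it would not by
itself answer the question of why the value is non-null at that point.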

Re: [PATCH/RFC] combine: Tweak the condition of last_set invalidation

2021-01-21 Thread Kewen.Lin via Gcc-patches

Hi Segher,

on 2021/1/22 上午8:30, Segher Boessenkool wrote:
> Hi Ke Wen,
> 
> On Fri, Jan 15, 2021 at 04:06:17PM +0800, Kewen.Lin wrote:
>> on 2021/1/15 上午8:22, Segher Boessenkool wrote:
>>> On Wed, Dec 16, 2020 at 04:49:49PM +0800, Kewen.Lin wrote:
>>>>    ... op regX  // this regX could find wrong last_set below
>>>>    regX = ...   // if we think this set is valid
>>>>    ... op regX
>>>>
>>>> But because of retry's existence, the last_set_table_tick could
>>>
>>> It is not just because of retry: combine can change other insns than
>>> just i2 and i3, too.  And even changing i2 requires this!
>>
>> Ah, thanks for the information!  Here retry is one example for that
>> we can revisit one instruction but meanwhile the stored information
>> for reg reference can be from that instruction after the current
>> one but visited before.
> 
> Yes; and we do not usually revisit just one insn, but everything after
> it as well.  We only need to revisit those insns that are fed by what
> has changed, but this is a good enough approximation (we never revisit
> very far back).
> 
>>> The whole reg_stat stuff is an ugly hack that does not work well.  For
>>> example, as in your example, some "known" value can be invalidated
>>> before the combination that wants to know that value is tried.
>>>
>>> We need to have this outside of combine, in a dataflow(-like) thing
>>> for example.  This could take the place of REG_EQ* as well probably
>>> (which is good, there are various problems with that as well).
>>
>> Good point, but IIUC we still need to keep updating(tracking)
>> information like what we put into reg_stat stuff, it's not static
>> since as you pointed out above, combine can change i2/i3 etc,
>> we need to update the information for the changes.
> 
> Yes, we should keep it correct all the time, and for any point in the
> code.  It also can be used by other passes, e.g. it can replace all
> REG_EQ* notes, all nonzero_bits and num_sign_bit_copies, and no doubt
> even more things.
> 
>> Anyway, it's not what this patch tries to solve.  :-P
> 
> :-)
> 
>>>> This proposal is to check whether the last_set_table safely happens
>>>> after the current set, make the set still valid if so.
>>>
>>> I don't think this is safe to do like this, unfortunately.  There are
>>> more places that set last_set_invalid (well, one more), so at the very
>>> minimum this needs a lot more justification.
>>
>> Let me try to explain it more.
>> * Background *
>>
>> There are two places which set last_set_invalid to 1. 
>>
>> CASE 1:
> 
> 
> 
> Thanks for the in-depth explanation!
> 
> I think this should be postponed to stage 1 though?  Or is there
> anything very urgent in it?
> 

Yeah, I agree that this belongs to stage1, and there isn't anything
urgent about it.  Thanks for all further comments above!


BR,
Kewen

Re: skip asan-poisoning of discarded vars

2021-01-21 Thread Alexandre Oliva

On Jan 21, 2021, Alexandre Oliva  wrote:

> On Jan 21, 2021, Alexandre Oliva  wrote:
>> But I was wrong.  The bootstrap with the added assert has just failed,
>> as early as stage2 libiberty.  Looking into it...

> Uhh, I take that back.  I just goofed in the assert, inverting the
> condition.  Long day...

> With the correct condition, it's got past the stage2 compilation of all
> of the gcc deps and Ada sources, and then some.  Maybe my reasoning
> wasn't wrong, after all ;-)

Yeah, confirmed, bootstrap-asan (and -ubsan) completed on
x86_64-linux-gnu, all languages, with the following patchlet
(cut&pasted, then retabified manually, so it may not apply
mechanically):

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d2ac5f9..c0dcb39 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1797,6 +1797,9 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
  && dbg_cnt (asan_use_after_scope)
  && !gimplify_omp_ctxp)
{
+ gcc_assert (DECL_SEEN_IN_BIND_EXPR_P (decl)
+ || (DECL_ARTIFICIAL (decl) 
+ && DECL_NAME (decl) == NULL_TREE));
  asan_poisoned_variables->add (decl);
  asan_poison_variable (decl, false, seq_p);
  if (!DECL_ARTIFICIAL (decl) && gimplify_ctxp->live_switch_vars)


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [PATCH/RFC] combine: Tweak the condition of last_set invalidation

2021-01-21 Thread Segher Boessenkool

Hi Ke Wen,

On Fri, Jan 15, 2021 at 04:06:17PM +0800, Kewen.Lin wrote:
> on 2021/1/15 上午8:22, Segher Boessenkool wrote:
> > On Wed, Dec 16, 2020 at 04:49:49PM +0800, Kewen.Lin wrote:
> >>... op regX  // this regX could find wrong last_set below
> >>regX = ...   // if we think this set is valid
> >>... op regX
> >>
> >> But because of retry's existence, the last_set_table_tick could
> > 
> > It is not just because of retry: combine can change other insns than
> > just i2 and i3, too.  And even changing i2 requires this!
> 
> Ah, thanks for the information!  Here retry is one example for that
> we can revisit one instruction but meanwhile the stored information
> for reg reference can be from that instruction after the current
> one but visited before.

Yes; and we do not usually revisit just one insn, but everything after
it as well.  We only need to revisit those insns that are fed by what
has changed, but this is a good enough approximation (we never revisit
very far back).

> > The whole reg_stat stuff is an ugly hack that does not work well.  For
> > example, as in your example, some "known" value can be invalidated
> > before the combination that wants to know that value is tried.
> > 
> > We need to have this outside of combine, in a dataflow(-like) thing
> > for example.  This could take the place of REG_EQ* as well probably
> > (which is good, there are various problems with that as well).
> 
> Good point, but IIUC we still need to keep updating(tracking)
> information like what we put into reg_stat stuff, it's not static
> since as you pointed out above, combine can change i2/i3 etc,
> we need to update the information for the changes.

Yes, we should keep it correct all the time, and for any point in the
code.  It also can be used by other passes, e.g. it can replace all
REG_EQ* notes, all nonzero_bits and num_sign_bit_copies, and no doubt
even more things.

> Anyway, it's not what this patch tries to solve.  :-P

:-)

> >> This proposal is to check whether the last_set_table safely happens
> >> after the current set, make the set still valid if so.
> > 
> > I don't think this is safe to do like this, unfortunately.  There are
> > more places that set last_set_invalid (well, one more), so at the very
> > minimum this needs a lot more justification.
> 
> Let me try to explain it more.
> * Background *
> 
> There are two places which set last_set_invalid to 1. 
> 
> CASE 1:



Thanks for the in-depth explanation!

I think this should be postponed to stage 1 though?  Or is there
anything very urgent in it?


Segher

[committed] MAINTAINERS: Update my e-mail address

2021-01-21 Thread Maciej W. Rozycki

* MAINTAINERS (Write After Approval): Update my e-mail address.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c0aa23df57e..b68fe8ed431 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -580,7 +580,7 @@ Craig Rodrigues 

 Erven Rohou
 Ira Rosen  
 Yvan Roux  
-Maciej W. Rozycki  
+Maciej W. Rozycki  
 Silvius Rus
 Matthew Sachs  
 Hariharan Sandanagobalane  
-- 
2.11.0

Re: [PATCH 4/4] rs6000: Update testcases' instruction count

2021-01-21 Thread Segher Boessenkool

Hi!

On Sat, Oct 10, 2020 at 03:08:25AM -0500, Xionghu Luo wrote:
> 2020-10-10  Xionghu Luo  
> 
>   * gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust
>   instruction counts.
>   * gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise.
>   * gcc.target/powerpc/fold-vec-insert-double.c: Likewise.
>   * gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise.
>   * gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise.
>   * gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise.
>   * gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.
>   * gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise.
>   * gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise.
>   * gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise.
>   * gcc.target/powerpc/vsx-builtin-7.c: Likewise.

Looks good.  I assume you checked that all those changed counts reflect
the code we actually want?  Okay for trunk if so.  Thanks!


Segher

Re: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8

2021-01-21 Thread Segher Boessenkool

Hi!

You never committed 2/4?  That makes it harder to review this one :-)

On Sat, Oct 10, 2020 at 03:08:24AM -0500, Xionghu Luo wrote:
> gcc/ChangeLog:
> 
> 2020-10-10  Xionghu Luo  
> 
>   * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>   Generate ARRAY_REF(VIEW_CONVERT_EXPR) for P8 and later
>   platforms.
>   * config/rs6000/rs6000.c (rs6000_expand_vector_set_var): Update
>   to call different path for P8 and P9.
>   (rs6000_expand_vector_set_var_p9): New function.
>   (rs6000_expand_vector_set_var_p8): New function.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-10-10  Xionghu Luo  
> 
>   * gcc.target/powerpc/pr79251.p8.c: New test.

If testing on P9 LE and P7 BE (32-bit and 64-bit) worked, this is okay
for trunk.  Thanks!

(Let me know if you need help testing.)


Segher

[PATCH, rs6000] Deprecate unnecessary __builtin_dfp_dtstsfi_*_dd and td overloads

2021-01-21 Thread will schmidt via Gcc-patches

Hi,
  As noted during the work-in-progress builtins rewrite, the
__builtin_dfp_dtstsfi_*_{dd,td} builtins are redundant, so they are
being marked as deprecated.  They will be removed as part of the builtins
rewrite sometime in the future.
This includes the builtins __builtin_dfp_dtstsfi_eq_dd,
__builtin_dfp_dtstsfi_gt_dd, __builtin_dfp_dtstsfi_lt_dd,
__builtin_dfp_dtstsfi_ov_dd, __builtin_dfp_dtstsfi_eq_td,
__builtin_dfp_dtstsfi_gt_td, __builtin_dfp_dtstsfi_lt_td,
and __builtin_dfp_dtstsfi_ov_td.

Regtests underway.

OK for trunk?

Thanks
-Will

--

gcc/ChangeLog:
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Mark builtins P9_BUILTIN_DFP_TSTSFI_LT_DD, P9_BUILTIN_DFP_TSTSFI_EQ_DD,
P9_BUILTIN_DFP_TSTSFI_GT_DD, P9_BUILTIN_DFP_TSTSFI_OV_DD,
P9_BUILTIN_DFP_TSTSFI_LT_TD, P9_BUILTIN_DFP_TSTSFI_EQ_TD,
P9_BUILTIN_DFP_TSTSFI_GT_TD, P9_BUILTIN_DFP_TSTSFI_OV_TD as deprecated.
* doc/extend.texi: Update examples to indicate deprecated functions.

testsuite/ChangeLog:
* gcc.target/powerpc/dfp/dtstsfi-10.c: Mark
__builtin_dfp_dtstsfi_*_{dd,td} calls as deprecated.
* gcc.target/powerpc/dfp/dtstsfi-11.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-12.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-13.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-14.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-15.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-16.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-17.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-18.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-19.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-30.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-31.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-32.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-33.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-34.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-35.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-36.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-37.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-38.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-39.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-50.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-51.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-52.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-53.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-54.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-55.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-56.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-57.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-58.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-59.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-70.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-71.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-72.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-73.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-74.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-75.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-76.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-77.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-78.c: Same.
* gcc.target/powerpc/dfp/dtstsfi-79.c: Same.
* gcc.target/powerpc/pr92661.c: Same.

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index cdc64bd63c66..9a79e5684f20 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -946,10 +946,21 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
   else if (fcode == ALTIVEC_BUILTIN_VEC_LVSR && !BYTES_BIG_ENDIAN)
 warning (OPT_Wdeprecated,
 "%<vec_lvsr%> is deprecated for little endian; use "
 "assignment for unaligned loads and stores");
 
+  if (fcode == P9_BUILTIN_DFP_TSTSFI_LT_DD
+   || fcode == P9_BUILTIN_DFP_TSTSFI_EQ_DD
+   || fcode == P9_BUILTIN_DFP_TSTSFI_GT_DD
+   || fcode == P9_BUILTIN_DFP_TSTSFI_OV_DD
+   || fcode == P9_BUILTIN_DFP_TSTSFI_LT_TD
+   || fcode == P9_BUILTIN_DFP_TSTSFI_EQ_TD
+   || fcode == P9_BUILTIN_DFP_TSTSFI_GT_TD
+   || fcode == P9_BUILTIN_DFP_TSTSFI_OV_TD)
+  warning (OPT_Wdeprecated, "builtin '%s' is deprecated",
+  IDENTIFIER_POINTER (DECL_NAME (fndecl)));
+
   if (fcode == ALTIVEC_BUILTIN_VEC_MUL)
 {
   /* vec_mul needs to be special cased because there are no instructions
 for it for the {un}signed char, {un}signed short, and {un}signed int
 types.  */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8daa1c679748..90db01daeac6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -17859,31 +17859,33 @@ int __builtin_byte_in_set (unsigned char u, unsigned 
long long set);
 int __builtin_byte_in_range (unsigned char u, unsigned int range);
 int __builtin_byte_in_either_range (unsigned char u, unsigned int ranges);
 
 int __builtin_dfp_dtstsfi_lt (unsigned int com

Re: [PATCH] improve warning suppression for inlined functions (PR 98465, 98512)

2021-01-21 Thread Martin Sebor via Gcc-patches


The initial patch I posted is missing initialization for a couple
of locals.  I'd noticed it in testing but forgot to add the fix to
the patch before posting it.  I have corrected that in the updated
revision and also added the test case from pr98512, and retested
the whole thing on x86_64-linux.

On 1/19/21 11:58 AM, Martin Sebor wrote:

std::string tends to trigger a class of false positive out of bounds
access warnings for code GCC cannot prove is unreachable because of
missing aliasing constraints, and that ends up expanded inline into
user code.  Simply inserting the contents of a constant char array
does that.  In GCC 10 these false positives are suppressed due to
-Wno-system-headers, but in GCC 11, to help detect calls rendered
invalid by user code passing in either incorrect or insufficiently
constrained arguments, -Wno-system-headers no longer has this effect
on invalid access warnings.

To solve the problem without at least partially reverting the change
and going back to the GCC 10 way of things for the affected subset
of calls (just memcpy and memmove), the attached patch enhances
the #pragma GCC diagnostic machinery to consider not just a single
location for inlined code but all locations at which an expression
and its callers are inlined all the way up the stack.  This gives
each author of a function involved in inlining the ability to
control a warning issued for the code, not just the user into whose
code all the calls end up inlined.  To resolve PR 98465, it lets us
suppress the false positives selectively in std::string rather
than across the board in GCC.
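
As a rough illustration of what this means for the author of an inlined
function, the sketch below wraps a helper in a #pragma GCC diagnostic
region.  The names copy_bytes and fill_greeting and the choice of
-Wstringop-overflow are assumptions made up for the example.  The intent
described above is that such a region is also consulted when the memcpy
call ends up inlined into a caller; whether a given compiler honors it
depends on whether it contains this change.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical library helper.  Its author suppresses the warning for
   the code in this region only.  */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-overflow"
static inline void
copy_bytes (char *dst, const char *src, size_t n)
{
  memcpy (dst, src, n);
}
#pragma GCC diagnostic pop

/* Hypothetical user code into which copy_bytes is likely inlined.
   The user has no pragma of their own.  */
static void
fill_greeting (char *buf)
{
  copy_bytes (buf, "hello", 6);   /* 5 characters plus the terminating NUL */
}
```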

The solution is to provide a new pair of overloads for warning
functions that, instead of taking a single location argument, take
a tree node from which the location(s) are determined.  The tree
argument is indirect because the diagnostic machinery doesn't (and
cannot without more intrusive changes) at the moment depend on
the various tree definitions.  A nice feature of these overloads
is that they do away with the need for the %K directive (and in
the future also %G, with another enhancement to accept a gimple*
argument).
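
The idea behind these overloads can be modelled outside GCC.  The sketch
below is a toy, not the code in the patch: struct loc, struct expr,
warning_enabled_at and warning_enabled_for are all hypothetical names.
It contrasts a check keyed on one location with a check keyed on an
expression that carries its whole inlining chain.

```c
#include <stdbool.h>
#include <stddef.h>

/* One source location.  IGNORED models "inside a region where the
   warning is disabled by a pragma".  */
struct loc
{
  int line;
  bool ignored;
  const struct loc *inlined_into;   /* location of the caller, or NULL */
};

/* An expression knows the innermost location it came from.  */
struct expr
{
  const struct loc *where;
};

/* Single-location check: only the location passed in is consulted.  */
static bool
warning_enabled_at (const struct loc *l)
{
  return !l->ignored;
}

/* Expression-based check: walk every location in the inlining chain and
   suppress the warning if any of them is in an ignored region.  */
static bool
warning_enabled_for (const struct expr *e)
{
  for (const struct loc *l = e->where; l != NULL; l = l->inlined_into)
    if (l->ignored)
      return false;
  return true;
}
```

With a library location marked as ignored and inlined into an unmarked
user location, the single-location check at the user's location still
reports the warning as enabled, while the chain walk suppresses it.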

This patch depends on the fix for PR 98664 (already approved but
not yet checked in).  I've tested it on x86_64-linux.

To avoid fallout I tried to keep the changes to a minimum, and
so the design isn't as robust as I'd like it ultimately to be.
I plan to enhance it in stage 1.

Martin


PR middle-end/98465 - Bogus -Wstringop-overread in std::string
PR middle-end/98512 - “#pragma GCC diagnostic ignored” ineffective in conjunction with alias attribute

gcc/ChangeLog:

	PR middle-end/98465
	PR middle-end/98512
	* builtins.c (class diag_inlining_context): New class.
	(maybe_warn_for_bound): Adjust signature.  Use diag_inlining_context.
	(warn_for_access): Same.
	(check_access): Remove calls to tree_inlined_location.
	(expand_builtin_strncmp): Remove argument from calls to
	maybe_warn_for_bound.
	(warn_dealloc_offset): Adjust signature.  Use diag_inlining_context.
	(maybe_emit_free_warning): Remove calls to tree_inlined_location.
	* diagnostic-core.h (warning, warning_n): New overloads.
	* diagnostic-metadata.h (class diagnostic_metadata::location_context):
	New.
	(struct diagnostic_info): Declare.
	* diagnostic.c (location_context::locations): Define.
	(update_effective_level_from_pragmas): Use location_context to test
	inlined locations.
	(diagnostic_report_diagnostic): Set location context.
	(warning, warning_n): Define new overloads.
	* diagnostic.h (diagnostic_inhibit_notes):

gcc/cp/ChangeLog:

	* mapper-client.cc: Include headers needed by others.

libstdc++-v3/ChangeLog:

	PR middle-end/98465
	* include/bits/basic_string.tcc (_M_replace): Suppress false positive
	warnings.
	* testsuite/18_support/new_delete_placement.cc: Suppress valid warnings.
	* testsuite/20_util/monotonic_buffer_resource/allocate.cc: Same.
	* testsuite/20_util/unsynchronized_pool_resource/allocate.cc: Same.

gcc/testsuite/ChangeLog:

	PR middle-end/98512	
	* gcc.dg/pragma-diag-9.c: New test.
	* gcc.dg/pragma-diag-10.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 0aed008687c..39fe1d0a6e0 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -39,7 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "optabs.h"
 #include "emit-rtl.h"
 #include "recog.h"
-#include "diagnostic-core.h"
+#include "diagnostic.h"
 #include "alias.h"
 #include "fold-const.h"
 #include "fold-const-call.h"
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-outof-ssa.h"
 #include "attr-fnspec.h"
 #include "demangle.h"
+#include "tree-pretty-print.h"
 
 struct target_builtins default_target_builtins;
 #if SWITCHABLE_TARGET
@@ -749,6 +750,93 @@ is_builtin_name (const char *name)
   return false;
 }
 
+/* Class to override the base location context for an expression EXPR.  */
+
+class diag_inlining_context: public diagnostic_metadata::location_context
+{
+ public:
+  diag_inlining_context (tree expr): m_ex

[PATCH] correct fix to avoid false positives for vectorized stores (PR 96963, 94655)

2021-01-21 Thread Martin Sebor via Gcc-patches


The hack I put in compute_objsize() last January for pr93200 isn't
quite correct.  It happened to suppress the false positive there
but, due to what looks like a thinko on my part, not in some other
test cases involving vectorized stores.

The attached change adjusts the hack to have compute_objsize() give
up on all MEM_REFs with a vector type.  This effectively disables
-Wstringop-{overflow,overread} for vector accesses (either written
by the user or synthesized by GCC from ordinary accesses).  It
doesn't affect -Warray-bounds because this warning doesn't use
compute_objsize() yet.  When it does (it should considerably
simplify the code) some additional changes will be needed to
preserve -Warray-bounds for out of bounds vector accesses.
The test this patch adds should serve as a reminder to make
it if we forget.
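
Going by the title of PR 96963, the kind of code at issue can be sketched
as below.  The struct and function names are made up for illustration,
and whether GCC really merges the stores into a single vector MEM_REF
depends on the target and on options such as -O2 -ftree-vectorize.

```c
/* Eight consecutive char members.  */
struct pixel8
{
  char c0, c1, c2, c3, c4, c5, c6, c7;
};

/* Eight scalar stores, each of them in bounds.  A vectorizer may
   replace them with one wider store whose type is a vector rather than
   any single member, which is what can confuse an object-size check
   that looks only at the first member.  */
void
clear_pixel8 (struct pixel8 *p)
{
  p->c0 = 0;
  p->c1 = 0;
  p->c2 = 0;
  p->c3 = 0;
  p->c4 = 0;
  p->c5 = 0;
  p->c6 = 0;
  p->c7 = 0;
}
```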

Tested on x86_64-linux.  Since PR 94655 was reported against GCC
10 I'd like to apply this fix to both the trunk and the 10 branch.

Martin
PR middle-end/96963 - -Wstringop-overflow false positive with -ftree-vectorize when assigning consecutive char struct members

gcc/ChangeLog:

	PR middle-end/96963
	* builtins.c (compute_objsize_r): Correct a workaround for vectorized
	assignments.

gcc/testsuite/ChangeLog:

	PR middle-end/96963
	* gcc.dg/Wstringop-overflow-65.c: New test.
	* gcc.dg/Warray-bounds-69.c: Same.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 0aed008687c..2ffe472d4ea 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5425,24 +5425,33 @@ compute_objsize_r (tree ptr, int ostype, access_ref *pref,
   ++pref->deref;
 
   tree ref = TREE_OPERAND (ptr, 0);
-  tree reftype = TREE_TYPE (ref);
-  if (!addr && code == ARRAY_REF
-	  && TREE_CODE (TREE_TYPE (reftype)) == POINTER_TYPE)
-	/* Avoid arrays of pointers.  FIXME: Hande pointers to arrays
-	   of known bound.  */
-	return false;
+  {
+	tree reftype = TREE_TYPE (ref);
+	if (!addr && code == ARRAY_REF
+	&& TREE_CODE (TREE_TYPE (reftype)) == POINTER_TYPE)
+	  /* Avoid arrays of pointers.  FIXME: Hande pointers to arrays
+	 of known bound.  */
+	  return false;
+  }
+  {
+	tree ptrtype = TREE_TYPE (ptr);
+	if (POINTER_TYPE_P (ptrtype))
+	  ptrtype = TREE_TYPE (ptrtype);
 
-  if (code == MEM_REF && TREE_CODE (reftype) == POINTER_TYPE)
-	{
-	  /* Give up for MEM_REFs of vector types; those may be synthesized
-	 from multiple assignments to consecutive data members.  See PR
-	 93200.
-	 FIXME: Deal with this more generally, e.g., by marking up such
-	 MEM_REFs at the time they're created.  */
-	  reftype = TREE_TYPE (reftype);
-	  if (TREE_CODE (reftype) == VECTOR_TYPE)
+	if (code == MEM_REF && TREE_CODE (ptrtype) == VECTOR_TYPE)
+	  {
+	/* Hack: Give up for MEM_REFs of vector types; those may be
+	   synthesized from multiple assignments to consecutive data
+	   members (see PR 93200 and 96963).
+	   FIXME: Vectorized assignments should only be present after
+	   vectorization so this hack is only necessary after it has
+	   run and could be avoided in calls from prior passes (e.g.,
+	   tree-ssa-strlen.c).
+	   FIXME: Deal with this more generally, e.g., by marking up
+	   such MEM_REFs at the time they're created.  */
 	return false;
-	}
+	  }
+  }
 
   if (!compute_objsize_r (ref, ostype, pref, snlim, qry))
 	return false;
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-69.c b/gcc/testsuite/gcc.dg/Warray-bounds-69.c
new file mode 100644
index 000..5a955774124
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-69.c
@@ -0,0 +1,74 @@
+/* Verify that storing a bigger vector into smaller space is diagnosed.
+   { dg-do compile }
+   { dg-options "-O2 -Warray-bounds" } */
+
+typedef __INT16_TYPE__ int16_t;
+typedef __attribute__ ((__vector_size__ (32))) char C32;
+
+typedef __attribute__ ((__vector_size__ (64))) int16_t I16_64;
+
+void sink (void*);
+
+
+void nowarn_c32 (char c)
+{
+  extern char nowarn_a32[32];
+
+  void *p = nowarn_a32;
+  *(C32*)p = (C32){ c };
+  sink (p);
+
+  char a32[32];
+  p = a32;
+  *(C32*)p = (C32){ c };
+  sink (p);
+}
+
+/* The invalid stores below are diagnosed by -Warray-bounds only
+   because it doesn't use compute_objsize().  If/when that changes
+   the function might need adjusting to avoid the hack put in place
+   to avoid false positives due to vectorization.  */
+
+void warn_c32 (char c)
+{
+  extern char warn_a32[32];   // { dg-message "'warn_a32'" "note" }
+
+  void *p = warn_a32 + 1;
+  *(C32*)p = (C32){ c };  // { dg-warning "\\\[-Warray-bounds" }
+
+  /* Verify a local variable too. */
+  char a32[32];   // { dg-message "'a32'" }
+  p = a32 + 1;
+  *(C32*)p = (C32){ c };  // { dg-warning "\\\[-Warray-bounds" }
+  sink (p);
+}
+
+
+void nowarn_i16_64 (int16_t i)
+{
+  extern char nowarn_a64[64];
+
+  void *p = nowarn_a64;
+  I16_64 *q = (I16_64*)p;
+  *q = (I16_64){ i };
+
+  char a64[64];
+  q = (I16_64*)a64;
+  *q = (I16_64){ i };

Re: [RS6000] Adjust testcases for power10 instructions V3

2021-01-21 Thread Alan Modra via Gcc-patches

Ping.

On Tue, Jan 12, 2021 at 02:03:18PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557587.html
> 
> On Fri, Oct 30, 2020 at 07:00:14PM +1030, Alan Modra wrote:
> > And now waking up to what you meant by the lvsl-lvsr.c \s comment,
> > plus a revised ppc-ne0-1.c scan-assembler.
> > 
> > I think this covers all previous review corrections.  Regression tested
> > powerpc64-linux power7 and powerpc64le-linux power10.  OK?
> > 
> > * lib/target-supports.exp (check_effective_target_has_arch_pwr10): New.
> > * gcc.dg/pr56727-2.c,
> > gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c,
> > gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c,
> > gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-char.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-double.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-float.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-int.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c,
> > gcc.target/powerpc/fold-vec-load-vec_xl-short.c,
> > gcc.target/powerpc/fold-vec-splat-floatdouble.c,
> > gcc.target/powerpc/fold-vec-splat-longlong.c,
> > gcc.target/powerpc/fold-vec-store-builtin_vec_xst-char.c,
> > gcc.target/powerpc/fold-vec-store-builtin_vec_xst-double.c,
> > gcc.target/powerpc/fold-vec-store-builtin_vec_xst-float.c,
> > gcc.target/powerpc/fold-vec-store-builtin_vec_xst-int.c,
> > gcc.target/powerpc/fold-vec-store-builtin_vec_xst-short.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-char.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-double.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-float.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-int.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-longlong.c,
> > gcc.target/powerpc/fold-vec-store-vec_vsx_st-short.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-char.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-double.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-float.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-int.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-longlong.c,
> > gcc.target/powerpc/fold-vec-store-vec_xst-short.c,
> > gcc.target/powerpc/lvsl-lvsr.c,
> > gcc.target/powerpc/ppc-eq0-1.c,
> > gcc.target/powerpc/ppc-ne0-1.c,
> > gcc.target/powerpc/pr86731-fwrapv-longlong.c: Match power10 insns.
> > * gcc.target/powerpc/lvsl-lvsr.c: Avoid file name match.
> > 
> > diff --git a/gcc/testsuite/gcc.dg/pr56727-2.c 
> > b/gcc/testsuite/gcc.dg/pr56727-2.c
> > index c54369ed25e..f055116772a 100644
> > --- a/gcc/testsuite/gcc.dg/pr56727-2.c
> > +++ b/gcc/testsuite/gcc.dg/pr56727-2.c
> > @@ -18,4 +18,4 @@ void h ()
> >  
> >  /* { dg-final { scan-assembler "@(PLT|plt)" { target i?86-*-* x86_64-*-* } 
> > } } */
> >  /* { dg-final { scan-assembler "@(PLT|plt)" { target { powerpc*-*-linux* 
> > && ilp32 } } } } */
> > -/* { dg-final { scan-assembler "bl f\n\\s*nop" { target { 
> > powerpc*-*-linux* && lp64 } } } } */
> > +/* { dg-final { scan-assembler {bl f(\n\s*nop|@notoc\n)} { target { 
> > powerpc*-*-linux* && lp64 } } } } */
> > diff --git 
> > a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c 
> > b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
> > index 246f38fa6d1..1cff4550f28 100644
> > --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
> > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
> > @@ -25,6 +25,6 @@ main1 (void)
> > with no word loads (lw, lwu, lwz, lwzu, or their indexed forms)
> > or word stores (stw, stwu, stwx, stwux, or their indexed forms).  */
> >  
> > -/* { dg-final { scan-assembler "\t(lvx|lxv|lvsr|stxv)" } } */
> > +/* { dg-final { scan-assembler "\t(lvx|p?lxv|lvsr|p?stxv)" } } */
> >  /* { dg-final { scan-assembler-not "\tlwz?u?x? " { xfail { 
> > powerpc-ibm-aix* } } } } */
> >  /* { dg-final { scan-assembler-not "\tstwu?x? " } } */
> > diff --git 
> > a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c 
> > b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
> > index 9b199c219bf..104710700c8 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-cha

Re: [PATCH 8/8] [RS6000] rs6000_rtx_costs for !speed

2021-01-21 Thread Alan Modra via Gcc-patches

Ping.

On Tue, Jan 12, 2021 at 02:02:36PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555759.html
> 
> On Thu, Oct 08, 2020 at 09:28:00AM +1030, Alan Modra wrote:
> > When optimizing for size we shouldn't be using metrics based on speed
> > or vice-versa.  rtlanal.c:get_full_rtx_cost wants both speed and size
> > metrics from rs6000_rtx_costs independent of the global optimize_size.
> > 
> > Note that the patch changes param_simultaneous_prefetches,
> > param_l1_cache_size, param_l1_cache_line_size and param_l2_cache_size,
> > which were previously all set to zero for optimize_size.  I think that
> > was a bug.  Those params are a function of the processor.
> > 
> > * config/rs6000/rs6000.h (rs6000_cost): Don't declare.
> > (struct processor_costs): Move to..
> > * config/rs6000/rs6000.c: ..here.
> > (rs6000_cost): Make static.
> > (rs6000_option_override_internal): Ignore optimize_size when
> > setting up rs6000_cost.
> > (rs6000_insn_cost): Take into account optimize_size here
> > instead.
> > (rs6000_emit_parity): Likewise.
> > (rs6000_rtx_costs): Don't use rs6000_cost when !speed.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index d455aa52427..14ecbad5df4 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -497,7 +497,26 @@ rs6000_store_data_bypass_p (rtx_insn *out_insn, 
> > rtx_insn *in_insn)
> >  
> >  /* Processor costs (relative to an add) */
> >  
> > -const struct processor_costs *rs6000_cost;
> > +struct processor_costs {
> > +  const int mulsi;   /* cost of SImode multiplication.  */
> > +  const int mulsi_const;  /* cost of SImode multiplication by constant.  */
> > +  const int mulsi_const9; /* cost of SImode mult by short constant.  */
> > +  const int muldi;   /* cost of DImode multiplication.  */
> > +  const int divsi;   /* cost of SImode division.  */
> > +  const int divdi;   /* cost of DImode division.  */
> > +  const int fp;  /* cost of simple SFmode and DFmode insns.  */
> > +  const int dmul;/* cost of DFmode multiplication (and fmadd).  */
> > +  const int sdiv;/* cost of SFmode division (fdivs).  */
> > +  const int ddiv;/* cost of DFmode division (fdiv).  */
> > +  const int cache_line_size;/* cache line size in bytes. */
> > +  const int l1_cache_size; /* size of l1 cache, in kilobytes.  */
> > +  const int l2_cache_size; /* size of l2 cache, in kilobytes.  */
> > +  const int simultaneous_prefetches; /* number of parallel prefetch
> > +   operations.  */
> > +  const int sfdf_convert;  /* cost of SF->DF conversion.  */
> > +};
> > +
> > +static const struct processor_costs *rs6000_cost;
> >  
> >  /* Instruction size costs on 32bit processors.  */
> >  static const
> > @@ -4618,131 +4637,128 @@ rs6000_option_override_internal (bool 
> > global_init_p)
> >  }
> >  
> >/* Initialize rs6000_cost with the appropriate target costs.  */
> > -  if (optimize_size)
> > -rs6000_cost = TARGET_POWERPC64 ? &size64_cost : &size32_cost;
> > -  else
> > -switch (rs6000_tune)
> > -  {
> > -  case PROCESSOR_RS64A:
> > -   rs6000_cost = &rs64a_cost;
> > -   break;
> > +  switch (rs6000_tune)
> > +{
> > +case PROCESSOR_RS64A:
> > +  rs6000_cost = &rs64a_cost;
> > +  break;
> >  
> > -  case PROCESSOR_MPCCORE:
> > -   rs6000_cost = &mpccore_cost;
> > -   break;
> > +case PROCESSOR_MPCCORE:
> > +  rs6000_cost = &mpccore_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC403:
> > -   rs6000_cost = &ppc403_cost;
> > -   break;
> > +case PROCESSOR_PPC403:
> > +  rs6000_cost = &ppc403_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC405:
> > -   rs6000_cost = &ppc405_cost;
> > -   break;
> > +case PROCESSOR_PPC405:
> > +  rs6000_cost = &ppc405_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC440:
> > -   rs6000_cost = &ppc440_cost;
> > -   break;
> > +case PROCESSOR_PPC440:
> > +  rs6000_cost = &ppc440_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC476:
> > -   rs6000_cost = &ppc476_cost;
> > -   break;
> > +case PROCESSOR_PPC476:
> > +  rs6000_cost = &ppc476_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC601:
> > -   rs6000_cost = &ppc601_cost;
> > -   break;
> > +case PROCESSOR_PPC601:
> > +  rs6000_cost = &ppc601_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC603:
> > -   rs6000_cost = &ppc603_cost;
> > -   break;
> > +case PROCESSOR_PPC603:
> > +  rs6000_cost = &ppc603_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC604:
> > -   rs6000_cost = &ppc604_cost;
> > -   break;
> > +case PROCESSOR_PPC604:
> > +  rs6000_cost = &ppc604_cost;
> > +  break;
> >  
> > -  case PROCESSOR_PPC604e:
> > -   rs6000_cost = &ppc604e_cost;
> > -   break;
> > +case PROCESSOR_PPC604e:
> > +  rs6000_cost = &ppc6

Re: [PATCH 7/8] [RS6000] rs6000_rtx_costs reduce cost for SETs

2021-01-21 Thread Alan Modra via Gcc-patches

Ping.

On Tue, Jan 12, 2021 at 02:02:27PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555758.html
> 
> On Thu, Oct 08, 2020 at 09:27:59AM +1030, Alan Modra wrote:
> > The aim of this patch is to make rtx_costs for SETs closer to
> > insn_cost for SETs.  One visible effect on powerpc code is increased
> > if-conversion.
> > 
> > * config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost of SET
> > operands.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 76aedbfae6f..d455aa52427 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -21684,6 +21684,35 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> > }
> >return false;
> >  
> > +case SET:
> > +  /* On entry the value in *TOTAL is the number of general purpose
> > +regs being set, multiplied by COSTS_N_INSNS (1).  Handle
> > +costing of set operands specially since in most cases we have
> > +an instruction rather than just a piece of RTL and should
> > +return a cost comparable to insn_cost.  That's a little
> > +complicated because in some cases the cost of SET operands is
> > +non-zero, see point 5 above and cost of PLUS for example, and
> > +in others it is zero, for example for (set (reg) (reg)).
> > +But (set (reg) (reg)) has the same insn_cost as
> > +(set (reg) (plus (reg) (reg))).  Hack around this by
> > +subtracting COSTS_N_INSNS (1) from the operand cost in cases
> > +where we add at least COSTS_N_INSNS (1) for some operation.
> > +However, don't do so for constants.  Constants might cost
> > +more than zero when they require more than one instruction,
> > +and we do want the cost of extra instructions.  */
> > +  {
> > +   rtx_code src_code = GET_CODE (SET_SRC (x));
> > +   if (src_code == CONST_INT
> > +   || src_code == CONST_DOUBLE
> > +   || src_code == CONST_WIDE_INT)
> > + return false;
> > +   int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> > +   + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
> > +   if (set_cost >= COSTS_N_INSNS (1))
> > + *total += set_cost - COSTS_N_INSNS (1);
> > +   return true;
> > +  }
> > +
> >  default:
> >return false;
> >  }

-- 
Alan Modra
Australia Development Lab, IBM
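
The adjustment described in the SET comment of the quoted patch can be
sketched in isolation.  This is a toy, not the rs6000 code:
adjust_set_cost is a hypothetical helper, and the operand costs passed
to it stand in for what rtx_cost would return.  COSTS_N_INSNS is defined
the way GCC's rtl.h defines it.

```c
/* As in GCC's rtl.h: the cost of N "average" instructions.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* TOTAL is the cost on entry (one insn per register being set) and
   OPERAND_COST is the summed cost of the SET source and destination.
   If the operands already account for at least one instruction, drop
   one instruction's worth so that the whole SET is not counted twice.  */
static int
adjust_set_cost (int total, int operand_cost)
{
  if (operand_cost >= COSTS_N_INSNS (1))
    total += operand_cost - COSTS_N_INSNS (1);
  return total;
}
```

Under this model a register copy (operand cost 0) and a register add
(operand cost of one instruction) both come out at COSTS_N_INSNS (1),
which matches the observation in the comment that the two have the same
insn_cost.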

Re: [PATCH 5/8] [RS6000] rs6000_rtx_costs cost IOR

2021-01-21 Thread Alan Modra via Gcc-patches

Ping.

On Tue, Jan 12, 2021 at 02:02:18PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555756.html
> 
> On Thu, Oct 08, 2020 at 09:27:57AM +1030, Alan Modra wrote:
> > * config/rs6000/rs6000.c (rotate_insert_cost): New function.
> > (rs6000_rtx_costs): Cost IOR.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 383d2901c9f..15a806fe307 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -21206,6 +21206,91 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn)
> >  && get_attr_cannot_copy (insn);
> >  }
> >  
> > +/* Handle rtx_costs for scalar integer rotate and insert insns.  */
> > +
> > +static bool
> > +rotate_insert_cost (rtx left, rtx right, machine_mode mode, bool speed,
> > +   int *total)
> > +{
> > +  if (GET_CODE (right) == AND
> > +  && CONST_INT_P (XEXP (right, 1))
> > +  && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0)
> > +{
> > +  rtx leftop = XEXP (left, 0);
> > +  rtx rightop = XEXP (right, 0);
> > +
> > +  /* rotlsi3_insert_5.  */
> > +  if (REG_P (leftop)
> > + && REG_P (rightop)
> > + && mode == SImode
> > + && UINTVAL (XEXP (left, 1)) != 0
> > + && UINTVAL (XEXP (right, 1)) != 0
> > + && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode))
> > +   return true;
> > +  /* rotldi3_insert_6.  */
> > +  if (REG_P (leftop)
> > + && REG_P (rightop)
> > + && mode == DImode
> > + && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0)
> > +   return true;
> > +  /* rotldi3_insert_7.  */
> > +  if (REG_P (leftop)
> > + && REG_P (rightop)
> > + && mode == DImode
> > + && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0)
> > +   return true;
> > +
> > +  rtx mask = 0;
> > +  rtx shift = leftop;
> > +  rtx_code shift_code = GET_CODE (shift);
> > +  /* rotl<mode>3_insert.  */
> > +  if (shift_code == ROTATE
> > + || shift_code == ASHIFT
> > + || shift_code == LSHIFTRT)
> > +   mask = right;
> > +  else
> > +   {
> > + shift = rightop;
> > + shift_code = GET_CODE (shift);
> > + /* rotl<mode>3_insert_2.  */
> > + if (shift_code == ROTATE
> > + || shift_code == ASHIFT
> > + || shift_code == LSHIFTRT)
> > +   mask = left;
> > +   }
> > +  if (mask
> > + && CONST_INT_P (XEXP (shift, 1))
> > + && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode))
> > +   {
> > + *total += rtx_cost (XEXP (shift, 0), mode, shift_code, 0, speed);
> > + *total += rtx_cost (XEXP (mask, 0), mode, AND, 0, speed);
> > + return true;
> > +   }
> > +}
> > +  /* rotl<mode>3_insert_3.  */
> > +  if (GET_CODE (right) == ASHIFT
> > +  && CONST_INT_P (XEXP (right, 1))
> > +  && (INTVAL (XEXP (right, 1))
> > + == exact_log2 (UINTVAL (XEXP (left, 1)) + 1)))
> > +{
> > +  *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
> > +  *total += rtx_cost (XEXP (right, 0), mode, ASHIFT, 0, speed);
> > +  return true;
> > +}
> > +  /* rotl<mode>3_insert_4.  */
> > +  if (GET_CODE (right) == LSHIFTRT
> > +  && CONST_INT_P (XEXP (right, 1))
> > +  && mode == SImode
> > +  && (INTVAL (XEXP (right, 1))
> > + + exact_log2 (-UINTVAL (XEXP (left, 1)))) == 32)
> > +{
> > +  *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
> > +  *total += rtx_cost (XEXP (right, 0), mode, LSHIFTRT, 0, speed);
> > +  return true;
> > +}
> > +  return false;
> > +}
> > +
> >  /* Compute a (partial) cost for rtx X.  Return true if the complete
> > cost has been computed, and false if subexpressions should be
> > scanned.  In either case, *TOTAL contains the cost result.
> > @@ -21253,7 +21338,7 @@ static bool
> >  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> >   int opno ATTRIBUTE_UNUSED, int *total, bool speed)
> >  {
> > -  rtx right;
> > +  rtx left, right;
> >int code = GET_CODE (x);
> >  
> >switch (code)
> > @@ -21435,7 +21520,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >right = XEXP (x, 1);
> >if (CONST_INT_P (right))
> > {
> > - rtx left = XEXP (x, 0);
> > + left = XEXP (x, 0);
> >   rtx_code left_code = GET_CODE (left);
> >  
> >   /* rotate-and-mask: 1 insn.  */
> > @@ -21452,9 +21537,16 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >return false;
> >  
> >  case IOR:
> > -  /* FIXME */
> >*total = COSTS_N_INSNS (1);
> > -  return true;
> > +  left = XEXP (x, 0);
> > +  if (GET_CODE (left) == AND
> > + && CONST_INT_P (XEXP (left, 1)))
> > +   {
> > + right = XEXP (x, 1);
> > + if (rotate_insert_cost (left, right, mode, speed, total))
> > +   return true;
> > +   }
> > +  return false;
> >  
> >  case CLZ:
> >  case XOR:

-- 
Alan Modra
Australia Development Lab, IBM
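
The first test in the quoted rotate_insert_cost, that the two AND masks
sum to all ones, is the usual bit-insert shape (a & m) | (b & ~m).  The
sketch below shows that check and the source-level pattern on plain
integers.  Both function names are hypothetical, and uint64_t stands in
for unsigned HOST_WIDE_INT.

```c
#include <stdint.h>

/* M1 + M2 + 1 wraps to zero exactly when M2 is the bitwise complement
   of M1, so every bit of the result comes from exactly one operand.  */
static int
masks_complementary (uint64_t m1, uint64_t m2)
{
  return m1 + m2 + 1 == 0;
}

/* The source-level shape of an insert: keep the bits of A selected by
   MASK and take the remaining bits from B.  */
static uint32_t
insert_bits (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) | (b & ~mask);
}
```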

Re: [PATCH 4/8] [RS6000] rs6000_rtx_costs tidy break/return

2021-01-21 Thread Alan Modra via Gcc-patches

Ping.

On Tue, Jan 12, 2021 at 02:02:09PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555755.html
> 
> On Thu, Oct 08, 2020 at 09:27:56AM +1030, Alan Modra wrote:
> > Most cases use "return false" rather than breaking out of the switch.
> > Do so in all cases.
> > 
> > * config/rs6000/rs6000.c (rs6000_rtx_costs): Tidy break/return.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index bc5e51aa5ce..383d2901c9f 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -21371,7 +21371,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> > *total = rs6000_cost->fp;
> >else
> > *total = rs6000_cost->dmul;
> > -  break;
> > +  return false;
> >  
> >  case DIV:
> >  case MOD:
> > @@ -21539,7 +21539,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >   *total = rs6000_cost->fp;
> >   return false;
> > }
> > -  break;
> > +  return false;
> >  
> >  case NE:
> >  case EQ:
> > @@ -21577,13 +21577,11 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >   *total = 0;
> >   return true;
> > }
> > -  break;
> > +  return false;
> >  
> >  default:
> > -  break;
> > +  return false;
> >  }
> > -
> > -  return false;
> >  }
> >  
> >  /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost.  */

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH 3/8] [RS6000] rs6000_rtx_costs tidy AND

2021-01-21 Thread Alan Modra via Gcc-patches

Ping.

On Tue, Jan 12, 2021 at 02:01:57PM +1030, Alan Modra wrote:
> Ping
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555754.html
> 
> On Thu, Oct 08, 2020 at 09:27:55AM +1030, Alan Modra wrote:
> > * config/rs6000/rs6000.c (rs6000_rtx_costs): Tidy AND code.
> > Don't avoid recursion on const_int shift count.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index e870ba0039a..bc5e51aa5ce 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -21253,6 +21253,7 @@ static bool
> >  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> >   int opno ATTRIBUTE_UNUSED, int *total, bool speed)
> >  {
> > +  rtx right;
> >int code = GET_CODE (x);
> >  
> >switch (code)
> > @@ -21430,7 +21431,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >return false;
> >  
> >  case AND:
> > -  if (CONST_INT_P (XEXP (x, 1)))
> > +  *total = COSTS_N_INSNS (1);
> > +  right = XEXP (x, 1);
> > +  if (CONST_INT_P (right))
> > {
> >   rtx left = XEXP (x, 0);
> >   rtx_code left_code = GET_CODE (left);
> > @@ -21439,17 +21442,13 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >   if ((left_code == ROTATE
> >|| left_code == ASHIFT
> >|| left_code == LSHIFTRT)
> > - && rs6000_is_valid_shift_mask (XEXP (x, 1), left, mode))
> > + && rs6000_is_valid_shift_mask (right, left, mode))
> > {
> > - *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > - if (!CONST_INT_P (XEXP (left, 1)))
> > -   *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, 
> > speed);
> > - *total += COSTS_N_INSNS (1);
> > + *total += rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > + *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
> >   return true;
> > }
> > }
> > -
> > -  *total = COSTS_N_INSNS (1);
> >return false;
> >  
> >  case IOR:

-- 
Alan Modra
Australia Development Lab, IBM

[committed] LRA: patch fixing PR98777

2021-01-21 Thread Vladimir Makarov via Gcc-patches


The following patch fixes the recently reported

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98777

The patch was successfully bootstrapped on x86-64.


[PR98777] LRA: Use previously created pseudo in LRA elimination subpass

LRA did not extend ira_reg_equiv after generating a pseudo in
eliminate_regs_in_insn, which might result in an LRA crash.  It is better
not to extend ira_reg_equiv but to use a pseudo generated beforehand.  The
patch implements this.

gcc/ChangeLog:

	PR rtl-optimization/98777
	* lra-int.h (lra_pmode_pseudo): New extern.
	* lra.c (lra_pmode_pseudo): New global.
	(lra): Set it up.
	* lra-eliminations.c (eliminate_regs_in_insn): Use it.

gcc/testsuite/ChangeLog:

	PR rtl-optimization/98777
	* gcc.target/riscv/pr98777.c: New.

diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c
index 5b9717574ed..c97f9ca4c68 100644
--- a/gcc/lra-eliminations.c
+++ b/gcc/lra-eliminations.c
@@ -1059,7 +1059,7 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p,
 	  && REGNO (reg1) < FIRST_PSEUDO_REGISTER
 	  && REGNO (reg2) >= FIRST_PSEUDO_REGISTER
 	  && GET_MODE (reg1) == Pmode
-	  && !have_addptr3_insn (gen_reg_rtx (Pmode), reg1,
+	  && !have_addptr3_insn (lra_pmode_pseudo, reg1,
  XEXP (XEXP (SET_SRC (set), 0), 1)))
 	{
 	  XEXP (XEXP (SET_SRC (set), 0), 0) = op2;
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 1b8f7b6ae61..4dadccc79f4 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -324,6 +324,7 @@ extern lra_copy_t lra_get_copy (int);
 extern int lra_new_regno_start;
 extern int lra_constraint_new_regno_start;
 extern int lra_bad_spill_regno_start;
+extern rtx lra_pmode_pseudo;
 extern bitmap_head lra_inheritance_pseudos;
 extern bitmap_head lra_split_regs;
 extern bitmap_head lra_subreg_reload_pseudos;
diff --git a/gcc/lra.c b/gcc/lra.c
index aa49de6f154..5a4b6638913 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -2192,6 +2192,9 @@ int lra_constraint_new_regno_start;
it is possible.  */
 int lra_bad_spill_regno_start;
 
+/* A pseudo of Pmode.  */
+rtx lra_pmode_pseudo;
+
 /* Inheritance pseudo regnos before the new spill pass.	 */
 bitmap_head lra_inheritance_pseudos;
 
@@ -2255,6 +2258,7 @@ lra (FILE *f)
 
   lra_dump_file = f;
   lra_asm_error_p = false;
+  lra_pmode_pseudo = gen_reg_rtx (Pmode);
   
   timevar_push (TV_LRA);
 
diff --git a/gcc/testsuite/gcc.target/riscv/pr98777.c b/gcc/testsuite/gcc.target/riscv/pr98777.c
new file mode 100644
index 000..ea2c2f9ca64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr98777.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrict-aliasing -O" } */
+
+typedef struct {
+  _Complex e;
+  _Complex f;
+  _Complex g;
+  _Complex h;
+  _Complex i;
+  _Complex j;
+  _Complex k;
+  _Complex l;
+  _Complex m;
+  _Complex n;
+  _Complex o;
+  _Complex p;
+} Scl16;
+
+Scl16 g1sScl16, g2sScl16, g3sScl16, g4sScl16, g5sScl16, g6sScl16, g7sScl16,
+g8sScl16, g9sScl16, g10sScl16, g11sScl16, g12sScl16, g13sScl16, g14sScl16,
+g15sScl16, g16sScl16;
+
+void testvaScl16();
+
+void
+testitScl16() {
+  testvaScl16(g10sScl16, g11sScl16, g12sScl16, g13sScl16, g14sScl16, g1sScl16,
+  g2sScl16, g3sScl16, g4sScl16, g5sScl16, g6sScl16, g7sScl16,
+  g8sScl16, g9sScl16, g10sScl16, g11sScl16, g12sScl16, g13sScl16,
+  g14sScl16, g15sScl16, g16sScl16);
+}

Re: [PATCH v6] Practical improvement to libgcc complex divide

2021-01-21 Thread Joseph Myers

On Fri, 18 Dec 2020, Patrick McGehearty via Gcc-patches wrote:

> TEST Data

I'd still like to see the test data / code used to produce the accuracy 
and performance results made available somewhere (presumably with a link 
then being provided in the commit message).

> +   if ((mode == TYPE_MODE (float_type_node))
> +   || (mode == TYPE_MODE (double_type_node))
> +   || (mode == TYPE_MODE (long_double_type_node)))
> + {
> +   char val_name[64];
> +   char fname[8] = "";
> +   if (mode == TYPE_MODE (float_type_node))
> + memcpy (fname, "FLT", 4);
> +   else if (mode == TYPE_MODE (double_type_node))
> + memcpy (fname, "DBL", 4);
> +   else if (mode == TYPE_MODE (long_double_type_node))
> + memcpy (fname, "LDBL", 5);

This logic being used to generate EPSILON, MAX and MIN macros only handles 
modes that match float, double or long double (so won't define the macros 
for a mode that only matches another type such as _Float128, for example).

Earlier in the same function, there is existing code to define 
__LIBGCC_%s_FUNC_EXT__.  That code has to do something similar, to 
determine the matching type for a mode - but it also handles _FloatN / 
_FloatNx types, and has an assertion at the end that some matching type 
was found.

Rather than having this code which handles a more limited set of types, I 
think the __LIBGCC_%s_FUNC_EXT__ code should be extended, so that as well 
as computing a function suffix it also computes a prefix such as FLT, DBL, 
FLT128 or FLT64X.  Then all supported floating-point modes can get these 
three LIBGCC macros defined, rather than just those matching float, double 
or long double.

>  #elif defined(L_mulxc3) || defined(L_divxc3)
>  # define MTYPE   XFtype
>  # define CTYPE   XCtype
>  # define MODE    xc
>  # define CEXT    __LIBGCC_XF_FUNC_EXT__
>  # define NOTRUNC (!__LIBGCC_XF_EXCESS_PRECISION__)
> +# define RBIG    ((__LIBGCC_XF_MAX__)/2)
> +# define RMIN    (__LIBGCC_XF_MIN__)
> +# define RMIN2   (__LIBGCC_DF_EPSILON__)
> +# define RMINSCAL (1/__LIBGCC_DF_EPSILON__)
> +# define RMAX2   ((RBIG)*(RMIN2))

I'd then expect __LIBGCC_XF_EPSILON__ to be used for XFmode in place of 
__LIBGCC_DF_EPSILON__ unless there is some good reason to use the latter 
(which would need a comment to explain it if so).

>  #elif defined(L_multc3) || defined(L_divtc3)
>  # define MTYPE   TFtype
>  # define CTYPE   TCtype
>  # define MODE    tc
>  # define CEXT    __LIBGCC_TF_FUNC_EXT__
>  # define NOTRUNC (!__LIBGCC_TF_EXCESS_PRECISION__)
> +#if defined(__LIBGCC_TF_MIN__)
> +# define RBIG    ((__LIBGCC_TF_MAX__)/2)
> +# define RMIN    (__LIBGCC_TF_MIN__)
> +#else
> +# define RBIG    ((__LIBGCC_XF_MAX__)/2)
> +# define RMIN    (__LIBGCC_XF_MIN__)
> +#endif
> +# define RMIN2   (__LIBGCC_DF_EPSILON__)
> +# define RMINSCAL (1/__LIBGCC_DF_EPSILON__)

And, likewise, with the suggested changes to c-cppbuiltin.c this code can 
use __LIBGCC_TF_MAX__, __LIBGCC_TF_MIN__ and __LIBGCC_TF_EPSILON__ 
unconditionally, without ever needing to use XF or DF macros.  (If you 
want to use a different EPSILON value in the case where TFmode is IBM long 
double because of LDBL_EPSILON being too small in that case, condition 
that on __LIBGCC_TF_MANT_DIG__ == 106, and use ((TFtype) 0x1p-106) in 
place of __LIBGCC_TF_EPSILON__ in that case.)
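
For illustration only, the conditional choice of epsilon suggested above might be sketched as follows. This is standalone C++, not libgcc code: scaling_epsilon and scaling_factor are hypothetical helpers, and std::numeric_limits<T>::digits stands in for the __LIBGCC_TF_MANT_DIG__ check.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper, not part of libgcc: pick the epsilon used for scaling.
// A 106-bit mantissa is taken to identify IBM long double, for which the
// type's own epsilon is too small, so 2^-106 is used instead.
template <typename T>
T
scaling_epsilon ()
{
  if (std::numeric_limits<T>::digits == 106)
    return std::ldexp (T (1), -106);
  return std::numeric_limits<T>::epsilon ();
}

// Hypothetical counterpart of RMINSCAL, i.e. 1/EPSILON for the same type.
template <typename T>
T
scaling_factor ()
{
  T eps = scaling_epsilon<T> ();
  assert (eps > T (0));
  return T (1) / eps;
}
```

With this shape each mode derives its constants from its own type, so no XF or DF fallback is needed.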

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH v3] c++: ICE with delayed noexcept and attribute used [PR97966]

2021-01-21 Thread Jason Merrill via Gcc-patches


On 1/21/21 10:39 AM, Marek Polacek wrote:

On Thu, Jan 21, 2021 at 01:55:24AM -0500, Jason Merrill wrote:

+  /* Now that we've gone through all the members, instantiate those
+ marked with attribute used.  */
+  for (tree &x : used)


This doesn't need to be a reference.  And I think we want this to happen
even later, after finish_struct_1.


Fair enough, here's an updated version.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Another ICE with delayed noexcept parsing, but a bit gnarlier.

A function definition marked with __attribute__((used)) ought to be
emitted even when it is not referenced in the TU.  For a member function
template marked with __attribute__((used)) this means that it will
be instantiated: in instantiate_class_template_1 we have

  /* Instantiate members marked with attribute used.  */
  if (r != error_mark_node && DECL_PRESERVE_P (r))
    mark_used (r);

It is not so surprising that this doesn't work well with delayed
noexcept parsing: when we're processing the function template we delay
the parsing, so the member "foo" is found, but then when we're
instantiating it, "foo" hasn't yet been seen, which creates a
discrepancy and a crash ensues.  "foo" hasn't yet been seen because
instantiate_class_template_1 just loops over the class members and
instantiates right away.

To make it work, this patch uses a vector to keep track of members
marked with attribute used and uses it to instantiate such members
only after we're done with the class; in particular, after we have
called finish_member_declaration for each member.  And we ought to
be verifying that we did emit such members, so I've added a bunch
of dg-finals.

gcc/cp/ChangeLog:

PR c++/97966
* pt.c (instantiate_class_template_1): Instantiate members
marked with attribute used only after we're done instantiating
the class.

gcc/testsuite/ChangeLog:

PR c++/97966
* g++.dg/cpp0x/noexcept63.C: New test.
---
  gcc/cp/pt.c | 12 -
  gcc/testsuite/g++.dg/cpp0x/noexcept63.C | 63 +
  2 files changed, 73 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept63.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 373f8279604..1f3850d1048 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -11895,6 +11895,9 @@ instantiate_class_template_1 (tree type)
   relative to the scope of the class.  */
pop_to_parent_deferring_access_checks ();
  
+  /* A vector to hold members marked with attribute used. */

+  auto_vec used;
+
/* Now members are processed in the order of declaration.  */
for (member = CLASSTYPE_DECL_LIST (pattern);
 member; member = TREE_CHAIN (member))
@@ -11968,7 +11971,7 @@ instantiate_class_template_1 (tree type)
  finish_member_declaration (r);
  /* Instantiate members marked with attribute used.  */
  if (r != error_mark_node && DECL_PRESERVE_P (r))
-   mark_used (r);
+   used.safe_push (r);
  if (TREE_CODE (r) == FUNCTION_DECL
  && DECL_OMP_DECLARE_REDUCTION_P (r))
cp_check_omp_declare_reduction (r);
@@ -12034,7 +12037,7 @@ instantiate_class_template_1 (tree type)
 /*flags=*/0);
  /* Instantiate members marked with attribute used. */
  if (r != error_mark_node && DECL_PRESERVE_P (r))
-   mark_used (r);
+   used.safe_push (r);
}
  else if (TREE_CODE (r) == FIELD_DECL)
{
@@ -12203,6 +12206,11 @@ instantiate_class_template_1 (tree type)
finish_struct_1 (type);
TYPE_BEING_DEFINED (type) = 0;
  
+  /* Now that we've gone through all the members, instantiate those

+ marked with attribute used.  */
+  for (tree x : used)
+mark_used (x);


Even farther down, I think, since access checks are part of the class 
instantiation, and I don't think doing it before all the popping gains 
us anything (though I'd be interested in knowing if this is wrong). 
Let's move it just before the return.  OK with that change.



/* We don't instantiate default arguments for member functions.  14.7.1:
  
   The implicit instantiation of a class template specialization causes

diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept63.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept63.C
new file mode 100644
index 000..cf048f56c2a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept63.C
@@ -0,0 +1,63 @@
+// PR c++/97966
+// { dg-do compile { target c++11 } }
+
+template 
+struct S1 {
+  __attribute__((used)) S1() noexcept(noexcept(this->foo())) { }
+  void foo();
+};
+
+template 
+struct S2 {
+  __attribute__((used)) void bar() noexcept(noexcept(this->foo())) { }
+  void foo();
+};
+
+template 
+struct S3 {
+  void __attri

Re: [PATCH] c++: private inheritance access diagnostics fix [PR17314]

2021-01-21 Thread Jason Merrill via Gcc-patches


On 1/21/21 2:28 PM, Anthony Sharp wrote:

Hi Jason,

I've finally completed my copyright assignment form. I've attached it
to this email for reference.


You don't need write access to the main repository to use these commands
on your local copy.  One nice thing about git compared to svn is that
you don't need to touch the server for anything but push and pull.

Incidentally, how are you producing your patch?  Maybe try git
format-patch instead.


The method I am using at the moment is the one Ranjit Mathew talks
about here: http://rmathew.com/articles/gcj/crpatch.html.


Interesting.  Yeah, that page is obsolete since the move to git.


It's my fault kind of - the official GCC webpage
(https://gcc.gnu.org/gitwrite.html) explaining how to do it is called
'Read-write Git access' so I assumed it was only relevant for people
who have access to the repo, but I see that is not the case.


Those pages could definitely be more clearly organized.


I've tried the git way of doing it and I'm attaching a new patch file
that (hopefully) is better this time. Basically what I did was what
you suggested:

git pull
contrib/gcc-git-customization.sh
(make changes)
git add *
git gcc-commit-mklog
git gcc-commit-mklog --amend


Why two gcc-commit-mklog?  That would generate the log entries twice.

You should also run git gcc-verify at this point; for me, it complains about 
some of your header lines in the log.  Your author line needs to start 
at the first column, and use "01" for January instead of just "1".  The 
other explanatory lines can be omitted, in favor of:


The commit message before the log entries should include your rationale 
for the patch (e.g. the first two paragraphs of your initial email).



git format-patch -1 master

I also re-built the source just to make sure I hadn't messed anything
up. I re-ran the C++ regression tests using make check-c and make
check-c++. Whilst I did not do a before/after comparison of the
results, I checked the FAILs in gcc.sum and g++.sum and they all
looked like they had nothing to do with my code. All the code is the
same as before, so I'm thinking it should be fine (I just wanted to be
safe). Also checked against check_GNU_style.sh.

Assuming that's all fine, as for the code itself, there might well be
some tweaks that could make it better, and so if that is the case then
please let me know.


I'll look at the code soon.

Thanks,
Jason

Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]

2021-01-21 Thread Kwok Cheung Yeung


On 21/01/2021 7:33 pm, Kwok Cheung Yeung wrote:
With Nvidia and GCN offloading though, task-detach-6 hangs... I _think_ the 
reason why it 'worked' before was because the taskwait allowed tasks with detach 
clauses to always complete immediately after execution. Since that backdoor has 
been closed, task-detach-6 hangs with or without the taskwait.


It turns out that the hang occurs because the team barrier threads fail to wake 
up when gomp_team_barrier_wake is called from omp_fulfill_event while task_lock 
is still held. When the lock is released first, the wake works as expected and 
the test completes.
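
As a rough illustration of the ordering only (update state under the lock, release it, then wake), here is a sketch using standard C++ primitives. team_state, fulfill_event and wait_for_event are hypothetical stand-ins, not libgomp structures. With std::condition_variable, notifying while the lock is held would also be correct, so this does not reproduce the hang itself.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the team state; not the real libgomp types.
struct team_state
{
  std::mutex task_lock;
  std::condition_variable barrier;
  bool event_fulfilled = false;
};

// Update the shared state under the lock, release the lock, and only then
// wake a waiter, mirroring the unlock-before-wake order of the patch.
void
fulfill_event (team_state &team)
{
  {
    std::lock_guard<std::mutex> guard (team.task_lock);
    assert (!team.event_fulfilled);
    team.event_fulfilled = true;
  }
  team.barrier.notify_one ();
}

// Block until the event has been fulfilled; returns at once if it already is.
void
wait_for_event (team_state &team)
{
  std::unique_lock<std::mutex> lock (team.task_lock);
  team.barrier.wait (lock, [&team] { return team.event_fulfilled; });
}
```

In real use wait_for_event would run on a different thread from fulfill_event.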


Is this patch okay for trunk (to be squashed into the previous patch)?

Thanks

Kwok
From 2ee183c22772bc7d80d24ae75d5bd57f419712fd Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Thu, 21 Jan 2021 14:01:16 -0800
Subject: [PATCH] openmp: Fix hangs when task constructs with detach clauses
 are offloaded

2021-01-21  Kwok Cheung Yeung  

libgomp/
task.c (GOMP_task): Add thread to debug message.
(gomp_barrier_handle_tasks): Do not take address of child_task in
debug message.
(omp_fulfill_event): Release team->task_lock before waking team
barrier threads.
---
 libgomp/task.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libgomp/task.c b/libgomp/task.c
index dbd6284..60b598e 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -492,7 +492,7 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) 
(void *, void *),
  if (data)
*(void **) data = task;
 
- gomp_debug (0, "New event: %p\n", task);
+ gomp_debug (0, "Thread %d: new event: %p\n", thr->ts.team_id, task);
}
   thr->task = task;
   if (cpyfn)
@@ -1372,7 +1372,7 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
 child_task, MEMMODEL_RELAXED);
  --team->task_detach_count;
  gomp_debug (0, "thread %d: found task with fulfilled event %p\n",
- thr->ts.team_id, &child_task);
+ thr->ts.team_id, child_task);
 
  if (to_free)
{
@@ -2470,8 +2470,12 @@ omp_fulfill_event (omp_event_handle_t event)
   gomp_sem_post (&task->taskgroup->taskgroup_sem);
 }
   if (team && team->nthreads > team->task_running_count)
-gomp_team_barrier_wake (&team->barrier, 1);
-  gomp_mutex_unlock (&team->task_lock);
+{
+  gomp_mutex_unlock (&team->task_lock);
+  gomp_team_barrier_wake (&team->barrier, 1);
+}
+  else
+gomp_mutex_unlock (&team->task_lock);
 }
 
 ialias (omp_fulfill_event)
-- 
2.8.1

[PATCH] c++: ICE with noexcept in class in member function [PR96623]

2021-01-21 Thread Marek Polacek via Gcc-patches

I discovered very strange code in inject_parm_decls:

   if (args && is_this_parameter (args))
 {
   gcc_checking_assert (current_class_ptr == NULL_TREE);
   current_class_ptr = NULL_TREE;

We are tripping up on the assert because when we call inject_parm_decls,
current_class_ptr is set to 'A'.  It was set by inject_this_parameter
after we've parsed the parameter-declaration-clause of the member
function foo.  It seems correct to set ccp/ccr to A::B when we're
late parsing the noexcept-specifiers of bar* functions in B, so that
this-> does the right thing.  Since inject_parm_decls can mess with
ccp/ccr, I think it is best to properly restore them after the late
parsing of noexcept-specifiers.

Clearing ccp before calling inject_parm_decls and removing the
assignment that follows the assert should also work, if the assert is to stay.
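
To make the save/restore idea concrete, here is a small standalone sketch. It is not parser.c code: the patch uses two plain temporaries (save_ccp/save_ccr), whereas this swaps in an RAII wrapper. scoped_restore, current_class, late_parse and parse_with_restore are all hypothetical names.

```cpp
#include <cassert>

// Hypothetical RAII helper: remembers a variable's value on construction
// and puts it back on destruction.
template <typename T>
class scoped_restore
{
public:
  explicit scoped_restore (T &var) : m_var (var), m_saved (var) {}
  ~scoped_restore () { m_var = m_saved; }
  scoped_restore (const scoped_restore &) = delete;
  scoped_restore &operator= (const scoped_restore &) = delete;

private:
  T &m_var;
  T m_saved;
};

// Stand-in for parser state such as current_class_ptr.
static const char *current_class = nullptr;

// Stand-in for a routine that overwrites the state, as inject_parm_decls may.
static void
late_parse (const char *nested_class)
{
  current_class = nested_class;
}

// Run the late parsing, then leave current_class as it was found.
void
parse_with_restore (const char *nested_class)
{
  scoped_restore<const char *> restore (current_class);
  late_parse (nested_class);
  assert (current_class == nested_class);
}
```

Whichever form is used, the state seen by the NSDMI parsing that follows is the one that was in effect before the noexcept-specifiers were handled.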

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/96623
* parser.c (inject_parm_decls): Remove a gcc_checking_assert.
(cp_parser_class_specifier_1): Restore current_class_{ptr,ref}
after late parsing of noexcept-specifiers.

gcc/testsuite/ChangeLog:

PR c++/96623
* g++.dg/cpp0x/noexcept64.C: New test.
---
 gcc/cp/parser.c |  8 
 gcc/testsuite/g++.dg/cpp0x/noexcept64.C | 24 
 2 files changed, 28 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept64.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4b2bca3fd11..8e86e92e273 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -24709,7 +24709,6 @@ inject_parm_decls (tree decl)
 
   if (args && is_this_parameter (args))
 {
-  gcc_checking_assert (current_class_ptr == NULL_TREE);
   current_class_ptr = NULL_TREE;
   current_class_ref = cp_build_fold_indirect_ref (args);
   current_class_ptr = args;
@@ -24967,7 +24966,6 @@ cp_parser_class_specifier_1 (cp_parser* parser)
   tree pushed_scope = NULL_TREE;
   unsigned ix;
   cp_default_arg_entry *e;
-  tree save_ccp, save_ccr;
 
   if (!type_definition_ok_p || any_erroneous_template_args_p (type))
{
@@ -25012,6 +25010,8 @@ cp_parser_class_specifier_1 (cp_parser* parser)
   /* If there are noexcept-specifiers that have not yet been processed,
 take care of them now.  Do this before processing NSDMIs as they
 may depend on noexcept-specifiers already having been processed.  */
+  tree save_ccp = current_class_ptr;
+  tree save_ccr = current_class_ref;
   FOR_EACH_VEC_SAFE_ELT (unparsed_noexcepts, ix, decl)
{
  tree ctx = DECL_CONTEXT (decl);
@@ -25063,10 +25063,10 @@ cp_parser_class_specifier_1 (cp_parser* parser)
  maybe_end_member_template_processing ();
}
   vec_safe_truncate (unparsed_noexcepts, 0);
+  current_class_ptr = save_ccp;
+  current_class_ref = save_ccr;
 
   /* Now parse any NSDMIs.  */
-  save_ccp = current_class_ptr;
-  save_ccr = current_class_ref;
   FOR_EACH_VEC_SAFE_ELT (unparsed_nsdmis, ix, decl)
{
  if (class_type != DECL_CONTEXT (decl))
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept64.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept64.C
new file mode 100644
index 000..8b7303cd8a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept64.C
@@ -0,0 +1,24 @@
+// PR c++/96623
+// { dg-do compile { target c++11 } }
+
+constexpr int x = 0;
+struct A {
+  int a1;
+  void foo (int p) {
+int foovar;
+struct B {
+  int b1;
+  void bar1 () noexcept(x);
+  void bar2 () noexcept(noexcept(this->b1));
+  void bar3 () noexcept(noexcept(this->b2));
+  void bar4 () noexcept(noexcept(a1));
+  void bar5 () noexcept(noexcept(a2));
+  void bar6 () noexcept(noexcept(b1));
+  void bar7 () noexcept(noexcept(b2));
+  void bar8 () noexcept(noexcept(foovar));
+  void bar9 () noexcept(noexcept(p));
+  int b2;
+};
+  }
+  int a2;
+};

base-commit: f645da0e4ab9438dfd0c047c710c7ec6a7d6d8f3
-- 
2.29.2

Re: skip asan-poisoning of discarded vars

2021-01-21 Thread Alexandre Oliva

On Jan 21, 2021, Alexandre Oliva  wrote:

> But I was wrong.  The bootstrap with the added assert has just failed,
> as early as stage2 libiberty.  Looking into it...

Uhh, I take that back.  I just goofed in the assert, inverting the
condition.  Long day...

With the correct condition, it's got past the stage2 compilation of all
of the gcc deps and Ada sources, and then some.  Maybe my reasoning
wasn't wrong, after all ;-)

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [PATCH v2] c++: ICE when mangling operator name [PR98545]

2021-01-21 Thread Jason Merrill via Gcc-patches


On 1/21/21 2:44 PM, Marek Polacek wrote:

On Tue, Jan 19, 2021 at 05:38:20PM -0500, Marek Polacek via Gcc-patches wrote:

On Tue, Jan 19, 2021 at 03:47:47PM -0500, Jason Merrill via Gcc-patches wrote:

On 1/13/21 6:39 PM, Marek Polacek wrote:

r11-6301 added some asserts in mangle.c, and now we trip over one of
them.  In particular, it's the one asserting that we didn't get
IDENTIFIER_ANY_OP_P when mangling an expression with a dependent name.

As this testcase shows, it's possible to get that, so turn the assert
into an if and write "on".  That changes the mangling in the following
way:

With this patch:

$ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTonclspcvT__EEEDpS2_
decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h(a, double, a)

G++10:
$ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTclspcvT__EEEDpS2_
decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h(a, double, a)

clang++/icc:
$ c++filt _ZN1i1hIJ1adS1_EEEDTclonclspcvT__EEEDpS2_
decltype ((operator())((a)(), (double)(), (a)())) i::h(a, double, a)

I'm not sure why we differ in the "(*this)." part


Is there a PR for that?


I just opened 98756, because I didn't find any.  I can investigate where that
(*this) comes from, though it's not readily clear to me if this is a bug or not.


but at least the
suffix "onclspcvT__EEEDpS2_" is the same for all three compilers.  So
I hope the following fix makes sense.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/98545
* mangle.c (write_expression): When the expression is a dependent name
and an operator name, write "on" before writing its name.

gcc/testsuite/ChangeLog:

PR c++/98545
* g++.dg/abi/mangle76.C: New test.
---
   gcc/cp/mangle.c |  3 ++-
   gcc/testsuite/g++.dg/abi/mangle76.C | 39 +
   2 files changed, 41 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/abi/mangle76.C

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 11eb8962d28..bb3c4b76d33 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -3349,7 +3349,8 @@ write_expression (tree expr)
 else if (dependent_name (expr))
   {
 tree name = dependent_name (expr);
-  gcc_assert (!IDENTIFIER_ANY_OP_P (name));
+  if (IDENTIFIER_ANY_OP_P (name))
+   write_string ("on");


Any mangling change needs to handle different -fabi-versions; see the
similar code in write_member_name.


Ah, I only looked at the unguarded IDENTIFIER_ANY_OP_P checks.  But now
I have a possibly stupid question: what version should I check?  We have
Version 11 for which the manual already says "corrects the mangling of
sizeof... expressions and *operator names*", so perhaps I could tag along
and check abi_version_at_least (11).  Or should I check Version 15 and
update the manual?


The latter seems to be true, therefore a new patch is attached.
   

And why doesn't this go through write_member_name?


We go through write_member_name:

#0  fancy_abort (file=0x2b98ef8 "/home/mpolacek/src/gcc/gcc/cp/mangle.c", 
line=3352,
 function=0x2b99751 "write_expression") at 
/home/mpolacek/src/gcc/gcc/diagnostic.c:1884
#1  0x00bee91b in write_expression (expr=)
 at /home/mpolacek/src/gcc/gcc/cp/mangle.c:3352
#2  0x00beb3e2 in write_member_name (member=)
 at /home/mpolacek/src/gcc/gcc/cp/mangle.c:2892
#3  0x00beee70 in write_expression (expr=)
 at /home/mpolacek/src/gcc/gcc/cp/mangle.c:3405
#4  0x00bef1be in write_expression (expr=)
 at /home/mpolacek/src/gcc/gcc/cp/mangle.c:3455
#5  0x00be858a in write_type (type=)
 at /home/mpolacek/src/gcc/gcc/cp/mangle.c:2343

so in write_member_name MEMBER is a BASELINK so we don't enter the
identifier_p block.


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
r11-6301 added some asserts in mangle.c, and now we trip over one of
them.  In particular, it's the one asserting that we didn't get
IDENTIFIER_ANY_OP_P when mangling an expression with a dependent name.

As this testcase shows, it's possible to get that, so turn the assert
into an if and write "on".  That changes the mangling in the following
way:

With this patch:

$ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTonclspcvT__EEEDpS2_
decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h(a, double, a)

G++10:
$ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTclspcvT__EEEDpS2_
decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h(a, double, a)

clang++/icc:
$ c++filt _ZN1i1hIJ1adS1_EEEDTclonclspcvT__EEEDpS2_
decltype ((operator())((a)(), (double)(), (a)())) i::h(a, double, a)

This is now tracked in PR98756.

gcc/cp/ChangeLog:

PR c++/98545
* mangle.c (write_expression): When the expression is a dependent name
and an operator name, write "on" before writing its name.

gcc/ChangeLog:

PR c++/98545
* doc/invoke.texi: Update C++ ABI Version 15 description.

gcc/testsuite/ChangeLog:

Re: [PATCH] c++: ICE when mangling operator name [PR98545]

2021-01-21 Thread Jason Merrill via Gcc-patches


On 1/19/21 5:38 PM, Marek Polacek wrote:

On Tue, Jan 19, 2021 at 03:47:47PM -0500, Jason Merrill via Gcc-patches wrote:

On 1/13/21 6:39 PM, Marek Polacek wrote:

r11-6301 added some asserts in mangle.c, and now we trip over one of
them.  In particular, it's the one asserting that we didn't get
IDENTIFIER_ANY_OP_P when mangling an expression with a dependent name.

As this testcase shows, it's possible to get that, so turn the assert
into an if and write "on".  That changes the mangling in the following
way:

With this patch:

$ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTonclspcvT__EEEDpS2_
decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h(a, double, a)

G++10:
$ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTclspcvT__EEEDpS2_
decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h(a, double, a)

clang++/icc:
$ c++filt _ZN1i1hIJ1adS1_EEEDTclonclspcvT__EEEDpS2_
decltype ((operator())((a)(), (double)(), (a)())) i::h(a, double, a)

I'm not sure why we differ in the "(*this)." part


Is there a PR for that?


I just opened 98756, because I didn't find any.  I can investigate where that
(*this) comes from, though it's not readily clear to me if this is a bug or not.


I think it is; in general, the mangling tries to be close to the 
expression as written.  We're talking about adjusting that a bit to 
reflect the result of name lookup more, but that wouldn't make a 
difference to this case.



but at least the
suffix "onclspcvT__EEEDpS2_" is the same for all three compilers.  So
I hope the following fix makes sense.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/98545
* mangle.c (write_expression): When the expression is a dependent name
and an operator name, write "on" before writing its name.

gcc/testsuite/ChangeLog:

PR c++/98545
* g++.dg/abi/mangle76.C: New test.
---
   gcc/cp/mangle.c |  3 ++-
   gcc/testsuite/g++.dg/abi/mangle76.C | 39 +
   2 files changed, 41 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/abi/mangle76.C

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 11eb8962d28..bb3c4b76d33 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -3349,7 +3349,8 @@ write_expression (tree expr)
 else if (dependent_name (expr))
   {
 tree name = dependent_name (expr);
-  gcc_assert (!IDENTIFIER_ANY_OP_P (name));
+  if (IDENTIFIER_ANY_OP_P (name))
+   write_string ("on");


Any mangling change needs to handle different -fabi-versions; see the
similar code in write_member_name.


Ah, I only looked at the unguarded IDENTIFIER_ANY_OP_P checks.  But now
I have a possibly stupid question: what version should I check?  We have
Version 11 for which the manual already says "corrects the mangling of
sizeof... expressions and *operator names*", so perhaps I could tag along
and check abi_version_at_least (11).  Or should I check Version 15 and
update the manual?


As discussed on our call today, we don't want to change the behavior of 
older versions, so version 15 is the answer.
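
A toy model of what such a guard amounts to, assuming the threshold is version 15 as stated here. mangle_dependent_name is a hypothetical function, not mangle.c code, and the real check would go through GCC's own ABI-version machinery rather than a plain integer comparison.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the guarded mangling change: prefix a dependent
// operator name with "on" only from first_fixed_version onward, so older
// -fabi-version settings keep their existing output.  ENCODED is assumed
// to be the already-mangled name (e.g. "cl" for operator()), and
// ABI_VERSION a concrete positive version number.
std::string
mangle_dependent_name (const std::string &encoded, bool is_operator,
		       int abi_version, int first_fixed_version = 15)
{
  assert (abi_version > 0);
  std::string out;
  if (is_operator && abi_version >= first_fixed_version)
    out += "on";
  out += encoded;
  return out;
}
```

In this model the "cl" operator name becomes "oncl" at version 15 and stays "cl" at version 14.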



And why doesn't this go through write_member_name?


We go through write_member_name:

#0  fancy_abort (file=0x2b98ef8 "/home/mpolacek/src/gcc/gcc/cp/mangle.c", 
line=3352,
 function=0x2b99751 "write_expression") at 
/home/mpolacek/src/gcc/gcc/diagnostic.c:1884
#1  0x00bee91b in write_expression (expr=)
 at /home/mpolacek/src/gcc/gcc/cp/mangle.c:3352
#2  0x00beb3e2 in write_member_name (member=)
 at /home/mpolacek/src/gcc/gcc/cp/mangle.c:2892
#3  0x00beee70 in write_expression (expr=)
 at /home/mpolacek/src/gcc/gcc/cp/mangle.c:3405
#4  0x00bef1be in write_expression (expr=)
 at /home/mpolacek/src/gcc/gcc/cp/mangle.c:3455
#5  0x00be858a in write_type (type=)
 at /home/mpolacek/src/gcc/gcc/cp/mangle.c:2343

so in write_member_name MEMBER is a BASELINK so we don't enter the
identifier_p block.


Aha.

Jason

Re: [PATCH] c++: Suppress this injection for static member functions [PR97399]

2021-01-21 Thread Patrick Palka via Gcc-patches

On Thu, 21 Jan 2021, Marek Polacek wrote:

> On Thu, Jan 21, 2021 at 11:22:24AM -0500, Patrick Palka via Gcc-patches wrote:
> > Here at parse time finish_qualified_id_expr adds an implicit 'this->' to
> > the expression tmp::integral (because it's type-dependent, and also
> > current_class_ptr is set) within the trailing return type, and later
> > during substitution we can't resolve the 'this' since
> > tsubst_function_type does inject_this_parm only for non-static member
> > functions which tmp::func is not.
> > 
> > It seems the root of the problem is that we set current_class_ptr when
> > parsing the signature of a static member function.  Since explicit uses
> > of 'this' are already not allowed in this context, we probably shouldn't
> > be doing inject_this_parm either.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> 
> This looks fine to me.

Thanks.  It occurred to me that we should similarly avoid doing
inject_this_parm for friend functions, too.  Here's a patch to that end,
and which also addresses a minor diagnostic regression in a gomp
testcase that I didn't notice earlier:

-- >8 --

Subject: [PATCH] c++: Suppress 'this' injection for static member functions
 [PR97399]

In the testcase pr97399a.C below, finish_qualified_id_expr at parse time
adds an implicit 'this->' to the expression tmp::integral (because
it's type-dependent, and also current_class_ptr is set at this point)
within the trailing return type.  Later when substituting into the
trailing return type we ICE due to the unresolved 'this', since
tsubst_function_type does inject_this_parm only for non-static member
functions, which tmp::func is not.

It seems the root of the problem is that we set current_class_ptr when
parsing the signature of a static member function.  Since explicit uses
of 'this' are already forbidden in this context, we probably shouldn't
be doing inject_this_parm either.  Likewise for friend functions, with
which we can exhibit the same ICE.
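
As background for the static/non-static distinction relied on above, the following standalone sketch (not one of the new testcases) shows that a static member function has no implicit object parameter. The struct and its members are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

struct tmp
{
  static bool integral (int) { return true; }	// no implicit 'this'
  bool member (int) { return true; }		// called through 'this'
};

// Taking the address of a static member function gives an ordinary
// function pointer...
constexpr bool static_has_this
  = std::is_member_function_pointer<decltype (&tmp::integral)>::value;

// ...whereas a non-static one gives a pointer to member function.
constexpr bool nonstatic_has_this
  = std::is_member_function_pointer<decltype (&tmp::member)>::value;

// A static member function can be called with no object at all.
bool
call_without_object ()
{
  bool ok = tmp::integral (0);
  assert (ok);
  return ok;
}
```

Since there is no object to refer to, there is nothing for an injected 'this' to resolve to in the signature of such a function.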

Testing revealed a diagnostic regression in the gomp testcase this-1.C
due to current_class_ptr now being unset when parsing a use of 'this'
within the pragma for a static member function.  The below changes to
cp_parser_omp_var_list_no_open and finish_this_expr try to salvage the
quality of the affected error message (or else the error messages in
question would degenerate to "expected qualified-id before 'this'").

The change to cp_parser_init_declarator below is needed so that we
properly communicate static-ness to cp_parser_direct_declarator when
parsing a member function template.

Bootstrap and regtest re-running on x86_64-pc-linux-gnu, does this look
OK for trunk if successful?

gcc/cp/ChangeLog:

PR c++/97399
* parser.c (cp_parser_init_declarator): If the storage class
specifier is sc_static, pass true for static_p to
cp_parser_declarator.
(cp_parser_direct_declarator): Don't do inject_this_parm when
the function is specified as static or friend.
(cp_parser_omp_var_list_no_open): Call finish_this_expr even
when current_class_ptr is not set for better diagnostics.
* semantics.c (finish_this_expr): Emit a more specific diagnostic
when at class scope.

gcc/testsuite/ChangeLog:

PR c++/88548
PR c++/97399
* g++.dg/cpp0x/this2.C: New test.
* g++.dg/gomp/this-1.C: Adjust expected error for use of 'this'
in signature of static member function.
* g++.dg/template/pr97399a.C: New test.
* g++.dg/template/pr97399b.C: New test.
---
 gcc/cp/parser.c  |  6 +++---
 gcc/cp/semantics.c   |  2 ++
 gcc/testsuite/g++.dg/cpp0x/this2.C   |  9 +
 gcc/testsuite/g++.dg/gomp/this-1.C   |  4 ++--
 gcc/testsuite/g++.dg/template/pr97399a.C | 20 
 gcc/testsuite/g++.dg/template/pr97399b.C | 20 
 6 files changed, 56 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/this2.C
 create mode 100644 gcc/testsuite/g++.dg/template/pr97399a.C
 create mode 100644 gcc/testsuite/g++.dg/template/pr97399b.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 48437f23175..6d6bd1e60fd 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -21413,6 +21413,7 @@ cp_parser_init_declarator (cp_parser* parser,
   bool is_non_constant_init;
   int ctor_dtor_or_conv_p;
   bool friend_p = cp_parser_friend_p (decl_specifiers);
+  bool static_p = decl_specifiers->storage_class == sc_static;
   tree pushed_scope = NULL_TREE;
   bool range_for_decl_p = false;
   bool saved_default_arg_ok_p = parser->default_arg_ok_p;
@@ -21446,7 +21447,7 @@ cp_parser_init_declarator (cp_parser* parser,
 = cp_parser_declarator (parser, CP_PARSER_DECLARATOR_NAMED,
flags, &ctor_dtor_or_conv_p,
/*parenthesized_p=*/NULL,
-   member_p, friend_p, /*static_p=*/false)

Re: skip asan-poisoning of discarded vars

2021-01-21 Thread Alexandre Oliva

On Jan 21, 2021, Jakub Jelinek  wrote:

> Does that affect only Ada and not other languages?

I haven't observed it on other languages, but I didn't try really hard.
Doing that now, with an assert that the newly-added condition doesn't
ever hit.  I'd already completed a bootstrap-asan the other day, but not
with all languages.

The kind of variable that triggers the problem is created within
gcc/ada/gcc-interface/trans.c:Call_to_gnu, specifically within
create_temporary in the same file.  In the provided testcase, it injects
the temporary created for the return value from N, passed as a parameter
to C.  The binding block containing that temporary ends up dropped when
the initializer of V is discarded, because it's dynamic and V is
imported from a different unit.


I figured if the added condition were to ever hit before, we would add a
poison call to a function that did not have gimple_add_tmp_var called on
it, and that would NOT have it called in the block right after the one I
proposed to modify:

  if (!DECL_SEEN_IN_BIND_EXPR_P (decl)
  && DECL_ARTIFICIAL (decl) && DECL_NAME (decl) == NULL_TREE)
gimple_add_tmp_var (decl);

Without gimple_add_tmp_var, we wouldn't allocate automatic storage to
the variable in expand, and then the attempt to take its address for the
poisoning call would explode in make_decl_rtl like this testcase does,
because make_decl_rtl is not to be called for automatic variables.
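
To make the reasoning above easier to follow, here is a toy model of the two
conditions involved.  It assumes the added guard has the shape "seen in a
bind expr, or artificial and nameless"; ToyDecl and both function names are
hypothetical, and plain bools stand in for GCC's tree macros.

```cpp
// Toy stand-in for the decl flags involved; this is not GCC's tree
// representation, just bools named after the macros they model.
struct ToyDecl {
  bool seen_in_bind_expr;  // models DECL_SEEN_IN_BIND_EXPR_P
  bool artificial;         // models DECL_ARTIFICIAL
  bool has_name;           // models DECL_NAME != NULL_TREE
};

// True if the block quoted above would call gimple_add_tmp_var.
bool will_be_added_as_tmp_var (const ToyDecl &d)
{
  return !d.seen_in_bind_expr && d.artificial && !d.has_name;
}

// Assumed shape of the extra guard: poison only variables that were seen
// in a bind expr or that are about to get storage as a gimple temporary.
bool may_poison (const ToyDecl &d)
{
  return d.seen_in_bind_expr || (d.artificial && !d.has_name);
}
```

In this model, a variable that was never seen in a bind expr and will not be
added as a temporary is the only kind for which may_poison returns false.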

Since this didn't happen, I figured the new condition would not be hit
except for the failing case.  But I was wrong.  The bootstrap with the
added assert has just failed, as early as stage2 libiberty.  Looking
into it...

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

[PATCH] document BLOCK_ABSTRACT_ORIGIN et al.

2021-01-21 Thread Martin Sebor via Gcc-patches


On 1/18/21 6:25 AM, Richard Biener wrote:

PS Here are my notes on the macros and the two related functions:

BLOCK: Denotes a lexical scope.  Contains BLOCK_VARS of variables
declared in it, BLOCK_SUBBLOCKS of scopes nested in it, and
BLOCK_CHAIN pointing to the next BLOCK.  Its BLOCK_SUPERCONTEXT
points to the BLOCK of the enclosing scope.  May have
a BLOCK_ABSTRACT_ORIGIN and a BLOCK_SOURCE_LOCATION.

BLOCK_SUPERCONTEXT: The scope of the enclosing block, or FUNCTION_DECL
for the "outermost" function scope.  Inlined functions are chained by
this so that given expression E and its TREE_BLOCK(E) B,
BLOCK_SUPERCONTEXT(B) is the scope (BLOCK) in which E has been made
or into which E has been inlined.  In the latter case,
BLOCK_ORIGIN(B) evaluates either to the enclosing BLOCK or to
the enclosing function DECL.  It's never null.

BLOCK_ABSTRACT_ORIGIN(B) is the FUNCTION_DECL of the function into
which it has been inlined, or null if B is not inlined.


It's the BLOCK or FUNCTION it was inlined _from_, not where it was inlined to.
It's the "ultimate" source, thus the abstract copy of the block or function decl
(for the outermost scope, aka inlined_function_outer_scope_p).  It corresponds
to what you'd expect for the DWARF abstract origin.


Thanks for the correction!  It's just the "innermost" block that
points to the "ultimate" destination into which it's been inlined.



BLOCK_ABSTRACT_ORIGIN can be NULL (in case it isn't an inline instance).


BLOCK_ABSTRACT_ORIGIN: A BLOCK, or FUNCTION_DECL of the function
into which a block has been inlined.  In a BLOCK immediately enclosing
an inlined leaf expression points to the outermost BLOCK into which it
has been inlined (thus bypassing all intermediate BLOCK_SUPERCONTEXTs).

BLOCK_FRAGMENT_ORIGIN: ???
BLOCK_FRAGMENT_CHAIN: ???


that's for scope blocks split by hot/cold partitioning and only temporarily
populated.


Thanks, I now see these documented in detail in tree.h.




bool inlined_function_outer_scope_p(BLOCK)   [tree.h]
Returns true if a BLOCK has a source location.
True for all but the innermost (no SUBBLOCKs?) and outermost blocks
into which an expression has been inlined. (Is this always true?)

tree block_ultimate_origin(BLOCK)   [tree.c]
Returns BLOCK_ABSTRACT_ORIGIN(BLOCK), AO, after asserting that
((DECL_P(AO) && DECL_ORIGIN(AO) == AO) || BLOCK_ORIGIN(AO) == AO).
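
The relationships described in these notes can be sketched with a toy
structure.  This is only a model: ToyBlock and the helper names are
hypothetical, the real fields are reached through the tree.h macros, and the
outermost BLOCK_SUPERCONTEXT in GCC is a FUNCTION_DECL rather than the null
pointer used here for simplicity.

```cpp
// Toy model of a BLOCK node; field names mirror the macros they model.
struct ToyBlock {
  ToyBlock *supercontext = nullptr;     // models BLOCK_SUPERCONTEXT
  ToyBlock *abstract_origin = nullptr;  // models BLOCK_ABSTRACT_ORIGIN
};

// Models BLOCK_ORIGIN: the abstract origin if set, else the block itself,
// so the result is never null.
ToyBlock *block_origin (ToyBlock *b)
{
  return b->abstract_origin ? b->abstract_origin : b;
}

// Counts the enclosing scopes by walking the supercontext chain.
int nesting_depth (const ToyBlock *b)
{
  int depth = 0;
  for (const ToyBlock *p = b->supercontext; p; p = p->supercontext)
    ++depth;
  return depth;
}
```

A block that is not an inline instance has no abstract origin, so
block_origin returns the block itself.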


The attached diff adds the comments above to tree.h.

I looked for a good place in the manual to add the same text but I'm
not sure.  Would the Blocks @subsection in generic.texi be appropriate?

Martin
diff --git a/gcc/tree.h b/gcc/tree.h
index 02b03d1f68e..0dd2196008b 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1912,18 +1912,29 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_OPERAND(NODE, I)\
 	OMP_CLAUSE_ELT_CHECK (NODE, I)
 
-/* In a BLOCK node.  */
+/* In a BLOCK (scope) node:
+   Variables declared in the scope NODE.  */
 #define BLOCK_VARS(NODE) (BLOCK_CHECK (NODE)->block.vars)
 #define BLOCK_NONLOCALIZED_VARS(NODE) \
   (BLOCK_CHECK (NODE)->block.nonlocalized_vars)
 #define BLOCK_NUM_NONLOCALIZED_VARS(NODE) \
   vec_safe_length (BLOCK_NONLOCALIZED_VARS (NODE))
 #define BLOCK_NONLOCALIZED_VAR(NODE,N) (*BLOCK_NONLOCALIZED_VARS (NODE))[N]
+/* A chain of BLOCKs (scopes) nested within the scope NODE.  */
 #define BLOCK_SUBBLOCKS(NODE) (BLOCK_CHECK (NODE)->block.subblocks)
+/* The scope enclosing the scope NODE, or FUNCTION_DECL for the "outermost"
+   function scope.  Inlined functions are chained by this so that given
+   expression E and its TREE_BLOCK(E) B, BLOCK_SUPERCONTEXT(B) is the scope
+   in which E has been made or into which E has been inlined.   */
 #define BLOCK_SUPERCONTEXT(NODE) (BLOCK_CHECK (NODE)->block.supercontext)
+/* Points to the next scope at the same level of nesting as scope NODE.  */
 #define BLOCK_CHAIN(NODE) (BLOCK_CHECK (NODE)->block.chain)
+/* A BLOCK, or FUNCTION_DECL of the function from which a block has been
+   inlined.  In a scope immediately enclosing an inlined leaf expression,
+   points to the outermost scope into which it has been inlined (thus
+   bypassing all intermediate BLOCK_SUPERCONTEXTs). */
 #define BLOCK_ABSTRACT_ORIGIN(NODE) (BLOCK_CHECK (NODE)->block.abstract_origin)
-#define BLOCK_ORIGIN(NODE) \
+#define BLOCK_ORIGIN(NODE)		\
   (BLOCK_ABSTRACT_ORIGIN(NODE) ? BLOCK_ABSTRACT_ORIGIN(NODE) : (NODE))
 #define BLOCK_DIE(NODE) (BLOCK_CHECK (NODE)->block.die)
 
@@ -5078,7 +5089,10 @@ function_args_iter_next (function_args_iterator *i)
   i->next = TREE_CHAIN (i->next);
 }
 
-/* We set BLOCK_SOURCE_LOCATION only to inlined function entry points.  */
+/* Returns true if a BLOCK has a source location.
+   BLOCK_SOURCE_LOCATION is set only to inlined function entry points,
+   so the function returns true for all but the innermost and outermost
+   blocks into which an expression has been inlined.  */
 
 static inline bool
 inlined_function_outer_scope_p (const_tree block)

Re: [PATCH 1/4] unroll: Add middle-end unroll factor estimation

2021-01-21 Thread Segher Boessenkool

Hi!

What is holding up this patch still?  Ke Wen has pinged it every month
since May, and there has still not been a review.


Segher


On Thu, May 28, 2020 at 08:19:59PM +0800, Kewen.Lin wrote:
> 
> gcc/ChangeLog
> 
> 2020-MM-DD  Kewen Lin  
> 
>   * cfgloop.h (struct loop): New field estimated_unroll.
>   * tree-ssa-loop-manip.c (decide_unroll_const_iter): New function.
>   (decide_unroll_runtime_iter): Likewise.
>   (decide_unroll_stupid): Likewise.
>   (estimate_unroll_factor): Likewise.
>   * tree-ssa-loop-manip.h (estimate_unroll_factor): New declaration.
>   * tree-ssa-loop.c (tree_average_num_loop_insns): New function.
>   * tree-ssa-loop.h (tree_average_num_loop_insns): New declaration.
> 
> 

> ---
>  gcc/cfgloop.h |   3 +
>  gcc/tree-ssa-loop-manip.c | 253 ++
>  gcc/tree-ssa-loop-manip.h |   3 +-
>  gcc/tree-ssa-loop.c   |  33 ++
>  gcc/tree-ssa-loop.h   |   2 +
>  5 files changed, 292 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index 11378ca..c5bcca7 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -232,6 +232,9 @@ public:
>   Other values means unroll with the given unrolling factor.  */
>unsigned short unroll;
>  
> +  /* Like unroll field above, but it's estimated in middle-end.  */
> +  unsigned short estimated_unroll;
> +
>/* If this loop was inlined the main clique of the callee which does
>   not need remapping when copying the loop body.  */
>unsigned short owned_clique;
> diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
> index 120b35b..8a5a1a9 100644
> --- a/gcc/tree-ssa-loop-manip.c
> +++ b/gcc/tree-ssa-loop-manip.c
> @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "system.h"
>  #include "coretypes.h"
>  #include "backend.h"
> +#include "target.h"
>  #include "tree.h"
>  #include "gimple.h"
>  #include "cfghooks.h"
> @@ -42,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfgloop.h"
>  #include "tree-scalar-evolution.h"
>  #include "tree-inline.h"
> +#include "wide-int.h"
>  
>  /* All bitmaps for rewriting into loop-closed SSA go on this obstack,
> so that we can free them all at once.  */
> @@ -1592,3 +1594,254 @@ canonicalize_loop_ivs (class loop *loop, tree *nit, bool bump_in_latch)
>  
>return var_before;
>  }
> +
> +/* Try to determine estimated unroll factor for given LOOP with constant number
> +   of iterations, mainly refer to decide_unroll_constant_iterations.
> +- NITER_DESC holds number of iteration description if it isn't NULL.
> +- NUNROLL holds a unroll factor value computed with instruction numbers.
> +- ITER holds estimated or likely max loop iterations.
> +   Return true if it succeeds, also update estimated_unroll.  */
> +
> +static bool
> +decide_unroll_const_iter (class loop *loop, const tree_niter_desc *niter_desc,
> +   unsigned nunroll, const widest_int *iter)
> +{
> +  /* Skip big loops.  */
> +  if (nunroll <= 1)
> +return false;
> +
> +  gcc_assert (niter_desc && niter_desc->assumptions);
> +
> +  /* Check number of iterations is constant, return false if no.  */
> +  if ((niter_desc->may_be_zero && !integer_zerop (niter_desc->may_be_zero))
> +  || !tree_fits_uhwi_p (niter_desc->niter))
> +return false;
> +
> +  unsigned HOST_WIDE_INT const_niter = tree_to_uhwi (niter_desc->niter);
> +
> +  /* If unroll factor is set explicitly, use it as estimated_unroll.  */
> +  if (loop->unroll > 0 && loop->unroll < USHRT_MAX)
> +{
> +  /* It should have been peeled instead.  */
> +  if (const_niter == 0 || (unsigned) loop->unroll > const_niter - 1)
> + loop->estimated_unroll = 1;
> +  else
> + loop->estimated_unroll = loop->unroll;
> +  return true;
> +}
> +
> +  /* Check whether the loop rolls enough to consider.  */
> +  if (const_niter < 2 * nunroll || wi::ltu_p (*iter, 2 * nunroll))
> +return false;
> +
> +  /* Success; now compute number of iterations to unroll.  */
> +  unsigned best_unroll = 0, n_copies = 0;
> +  unsigned best_copies = 2 * nunroll + 10;
> +  unsigned i = 2 * nunroll + 2;
> +
> +  if (i > const_niter - 2)
> +i = const_niter - 2;
> +
> +  for (; i >= nunroll - 1; i--)
> +{
> +  unsigned exit_mod = const_niter % (i + 1);
> +
> +  if (!empty_block_p (loop->latch))
> + n_copies = exit_mod + i + 1;
> +  else if (exit_mod != i)
> + n_copies = exit_mod + i + 2;
> +  else
> + n_copies = i + 1;
> +
> +  if (n_copies < best_copies)
> + {
> +   best_copies = n_copies;
> +   best_unroll = i;
> + }
> +}
> +
> +  loop->estimated_unroll = best_unroll + 1;
> +  return true;
> +}
> +
> +/* Try to determine estimated unroll factor for given LOOP with countable but
> +   non-constant number of iterations, mainly refer to
> +   decide_unroll_runtime_iterations.
> +- N
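
The search loop in the quoted decide_unroll_const_iter can be tried out in
isolation.  The following is a standalone sketch with plain integers in place
of GCC's loop structures.  The name pick_unroll_factor, the latch_empty
parameter (standing in for empty_block_p (loop->latch)) and the convention of
returning 0 for "do not unroll" are assumptions made for illustration; the
explicit-unroll branch and the estimated-iteration check are left out.

```cpp
// Standalone model of the factor search; not the GCC implementation.
unsigned long
pick_unroll_factor (unsigned long const_niter, unsigned nunroll,
                    bool latch_empty)
{
  // Skip big loops and loops that do not roll enough.
  if (nunroll <= 1 || const_niter < 2UL * nunroll)
    return 0;

  unsigned long best_unroll = 0;
  unsigned long best_copies = 2UL * nunroll + 10;
  unsigned long i = 2UL * nunroll + 2;

  if (i > const_niter - 2)
    i = const_niter - 2;

  // Try each candidate i (meaning i + 1 copies of the body) and keep the
  // one that needs the fewest copies once the remainder is accounted for.
  for (; i >= nunroll - 1; i--)
    {
      unsigned long exit_mod = const_niter % (i + 1);
      unsigned long n_copies;

      if (!latch_empty)
        n_copies = exit_mod + i + 1;
      else if (exit_mod != i)
        n_copies = exit_mod + i + 2;
      else
        n_copies = i + 1;

      if (n_copies < best_copies)
        {
          best_copies = n_copies;
          best_unroll = i;
        }
    }

  return best_unroll + 1;
}
```

For a loop of 9 iterations with nunroll equal to 2 and a non-empty latch,
this model picks a factor of 3, since 9 divides evenly into three copies.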

Re: [BACKPORT] Apply fix for PR libgcc/97643 to gcc 10 branch

2021-01-21 Thread Segher Boessenkool

On Wed, Jan 20, 2021 at 08:28:57PM -0500, Michael Meissner wrote:
> On Wed, Jan 20, 2021 at 06:46:14PM -0600, Segher Boessenkool wrote:
> > Is there a reason we do not have that testcase in the testsuite, btw?
> 
> In order to test it you need to build a compiler + toolchain where the default
> long double is 64-bits.  So it is kind of hard to put in a test for it, since
> it was a bug in how libgcc was built with a compiler that defaults to 64-bit
> long double.

Ah, libgcc, that's what I was missing.  Thanks!

(Btw, in that testcase, it should use %zd for printing sizeof, not %ld.
Well maybe that is the same for all configs this test is run on, but
heh.)
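
For reference, a minimal sketch of printing a sizeof result portably.  The
helper name is made up; it uses %zu, the unsigned counterpart of the %zd
mentioned above, because sizeof yields a size_t.

```cpp
#include <cstdio>
#include <string>

// Formats sizeof (long double) using the 'z' length modifier, which
// matches size_t regardless of how wide long is on the target.
std::string long_double_size_text ()
{
  char buf[32];
  std::snprintf (buf, sizeof buf, "%zu", sizeof (long double));
  return buf;
}
```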


Segher

skip asan-poisoning of discarded vars

2021-01-21 Thread Alexandre Oliva



GNAT may create temporaries to hold return values of function calls.
If such a temporary is created as part of a dynamic initializer of a
variable in a unit other than the one being compiled, the initializer
is dropped, including the temporary and its binding block.

Don't issue asan mark calls for such variables, they are gone.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* gimplify.c (gimplify_decl_expr): Skip asan marking calls for
temporaries not seen in binding block, and not about to be
added as gimple variables.

for  gcc/testsuite/ChangeLog

* gnat.dg/asan1.adb: New test.
* gnat.dg/asan1_pkg.ads: New additional source.
---
 gcc/gimplify.c  |8 +++-
 gcc/testsuite/gnat.dg/asan1.adb |   15 +++
 gcc/testsuite/gnat.dg/asan1_pkg.ads |9 +
 3 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gnat.dg/asan1.adb
 create mode 100644 gcc/testsuite/gnat.dg/asan1_pkg.ads

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d2ac5f913593f..95d55bb8ba4c7 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1795,7 +1795,13 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
  && !DECL_HAS_VALUE_EXPR_P (decl)
  && DECL_ALIGN (decl) <= MAX_SUPPORTED_STACK_ALIGNMENT
  && dbg_cnt (asan_use_after_scope)
- && !gimplify_omp_ctxp)
+ && !gimplify_omp_ctxp
+ /* GNAT introduces temporaries to hold return values of calls in
+initializers of variables defined in other units, so the
+declaration of the variable is discarded completely.  We do not
+want to issue poison calls for such dropped variables.  */
+ && (DECL_SEEN_IN_BIND_EXPR_P (decl)
+ || (DECL_ARTIFICIAL (decl) && DECL_NAME (decl) == NULL_TREE)))
{
  asan_poisoned_variables->add (decl);
  asan_poison_variable (decl, false, seq_p);
diff --git a/gcc/testsuite/gnat.dg/asan1.adb b/gcc/testsuite/gnat.dg/asan1.adb
new file mode 100644
index 0..a4bc59a9a2143
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/asan1.adb
@@ -0,0 +1,15 @@
+--  { dg-do compile }
+--  { dg-additional-sources asan1_pkg.ads }
+--  { dg-options "-fsanitize=address" }
+--  { dg-skip-if "" no_fsanitize_address }
+
+with Asan1_Pkg;
+
+procedure Asan1 is
+   use Asan1_Pkg;
+
+   X, Y : E;
+begin
+   X := C (N);
+   Y := V;
+end Asan1;
diff --git a/gcc/testsuite/gnat.dg/asan1_pkg.ads b/gcc/testsuite/gnat.dg/asan1_pkg.ads
new file mode 100644
index 0..fbbc1c5e7f5bd
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/asan1_pkg.ads
@@ -0,0 +1,9 @@
+package Asan1_Pkg is
+   subtype E is Integer;
+   type T is array (1..32) of E;
+
+   function N return T;
+   function C (P : T) return E;
+
+   V : constant E := C (N);
+end Asan1_Pkg;



Re: skip asan-poisoning of discarded vars

2021-01-21 Thread Jakub Jelinek via Gcc-patches

On Thu, Jan 21, 2021 at 06:30:06PM -0300, Alexandre Oliva wrote:
> 
> GNAT may create temporaries to hold return values of function calls.
> If such a temporary is created as part of a dynamic initializer of a
> variable in a unit other than the one being compiled, the initializer
> is dropped, including the temporary and its binding block.
> 
> Don't issue asan mark calls for such variables, they are gone.
> 
> Regstrapped on x86_64-linux-gnu.  Ok to install?
> 
> 
> for  gcc/ChangeLog
> 
>   * gimplify.c (gimplify_decl_expr): Skip asan marking calls for
>   temporaries not seen in binding block, and not about to be
>   added as gimple variables.
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gnat.dg/asan1.adb: New test.
>   * gnat.dg/asan1_pkg.ads: New additional source.

Does that affect only Ada and not other languages?
Could you e.g. try --with-build-config bootstrap-asan and log which
TUs/functions it didn't poison because of this patch?
I usually use hacks like:
{
FILE *f = fopen ("/tmp/whatever", "a");
fprintf (f, "%d %s %s\n", (int) BITS_PER_WORD,
         main_input_filename ? main_input_filename : "-",
         current_function_name ());
fclose (f);
}

Jakub

follow SSA defs for asan base

2021-01-21 Thread Alexandre Oliva



Ada makes extensive use of nested functions, which turn all automatic
variables of the enclosing function that are used in nested ones into
members of an artificial FRAME record type.

The address of a local variable is usually passed to asan marking
functions without using a temporary.  Taking the address of a member
of FRAME within a nested function, however, is not regarded as a
gimple val: while introducing FRAME variables, current_function_decl
is always the outermost function, even while processing a nested
function, so decl_address_invariant_p returns false for such
ADDR_EXPRs.  So, as automatic variables are moved into FRAME, any asan
call that marks such a variable has its ADDR_EXPR replaced with a
SSA_NAME set to the ADDR_EXPR of the FRAME member.

asan_expand_mark_ifn was not prepared to deal with ADDR_EXPRs split
out into SSA_NAMEs.  This patch deals with such cases.

[It does NOT deal with PHI nodes and whatnot.  I'm not even sure it
should.  Maybe we want the ADDR_EXPR to be a gimple val instead, but
this more conservative fix felt more appropriate for this stage.]
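
The idea of looking through SSA names to reach the underlying ADDR_EXPR can
be modelled with a small stand-alone sketch.  ToyValue, Code and
strip_ssa_copies are hypothetical names, and a single def_rhs pointer stands
in for a simple defining assignment; none of this is GCC's gimple API.

```cpp
enum class Code { SsaName, AddrExpr, Other };

struct ToyValue {
  Code code;
  // For an SsaName: the right-hand side of its single defining
  // assignment, or nullptr if it has no such simple definition
  // (for example, when it is defined by a PHI node).
  const ToyValue *def_rhs;
};

// Follows copies and address assignments until something other than an
// SSA name is reached, or until the definition is not of a handled form.
const ToyValue *strip_ssa_copies (const ToyValue *base)
{
  while (base->code == Code::SsaName)
    {
      const ToyValue *rhs = base->def_rhs;
      if (!rhs
          || (rhs->code != Code::AddrExpr && rhs->code != Code::SsaName))
        break;
      base = rhs;
    }
  return base;
}
```

When the chain ends in anything other than an address expression, the walk
stops at the last SSA name, which mirrors the conservative behaviour
described above.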

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* asan.c (asan_expand_mark_ifn): Follow SSA_NAME defs for
an ADDR_EXPR base.

for  gcc/testsuite/ChangeLog

* gcc.dg/asan/nested-1.c: New.
---
 gcc/asan.c   |   21 +
 gcc/testsuite/gcc.dg/asan/nested-1.c |   24 
 2 files changed, 45 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/asan/nested-1.c

diff --git a/gcc/asan.c b/gcc/asan.c
index 89ecd99b18294..2d2fb97098b2f 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -3629,6 +3629,27 @@ asan_expand_mark_ifn (gimple_stmt_iterator *iter)
   bool is_poison = ((asan_mark_flags)flag) == ASAN_MARK_POISON;
 
   tree base = gimple_call_arg (g, 1);
+  while (TREE_CODE (base) == SSA_NAME)
+{
+  gimple *def = SSA_NAME_DEF_STMT (base);
+  if (!def)
+   break;
+
+  if (!is_gimple_assign (def))
+   break;
+
+  if (!SINGLE_SSA_TREE_OPERAND (def, SSA_OP_DEF))
+   break;
+
+  if (gimple_num_ops (def) != 2)
+   break;
+
+  if (gimple_expr_code (def) == ADDR_EXPR
+ || gimple_expr_code (def) == SSA_NAME)
+   base = gimple_assign_rhs1 (def);
+  else
+   break;
+}
   gcc_checking_assert (TREE_CODE (base) == ADDR_EXPR);
   tree decl = TREE_OPERAND (base, 0);
 
diff --git a/gcc/testsuite/gcc.dg/asan/nested-1.c b/gcc/testsuite/gcc.dg/asan/nested-1.c
new file mode 100644
index 0..87e842098077c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/nested-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=address" } */
+
+int f(int i) {
+  auto int h() {
+int r;
+int *p;
+
+{
+  int x[3];
+
+  auto int g() {
+   return x[i];
+  }
+
+  p = &r;
+  *p = g();
+}
+
+return *p;
+  }
+
+  return h();
+}



Re: [PATCH] c++: Suppress this injection for static member functions [PR97399]

2021-01-21 Thread Marek Polacek via Gcc-patches

On Thu, Jan 21, 2021 at 11:22:24AM -0500, Patrick Palka via Gcc-patches wrote:
> Here at parse time finish_qualified_id_expr adds an implicit 'this->' to
> the expression tmp::integral (because it's type-dependent, and also
> current_class_ptr is set) within the trailing return type, and later
> during substitution we can't resolve the 'this' since
> tsubst_function_type does inject_this_parm only for non-static member
> functions which tmp::func is not.
> 
> It seems the root of the problem is that we set current_class_ptr when
> parsing the signature of a static member function.  Since explicit uses
> of 'this' are already not allowed in this context, we probably shouldn't
> be doing inject_this_parm either.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?

This looks fine to me.

> gcc/cp/ChangeLog:
> 
>   PR c++/97399
>   * parser.c (cp_parser_init_declarator): If the storage class
>   specifier is sc_static, pass true for static_p to
>   cp_parser_declarator.
>   (cp_parser_direct_declarator): Don't do inject_this_parm when
>   the member function is static.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/88548
>   PR c++/97399
>   * g++.dg/cpp0x/this2.C: New test.
>   * g++.dg/template/pr97399a.C: New test.
>   * g++.dg/template/pr97399b.C: New test.
> ---
>  gcc/cp/parser.c  |  5 +++--
>  gcc/testsuite/g++.dg/cpp0x/this2.C   |  8 
>  gcc/testsuite/g++.dg/template/pr97399a.C | 11 +++
>  gcc/testsuite/g++.dg/template/pr97399b.C | 11 +++
>  4 files changed, 33 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/this2.C
>  create mode 100644 gcc/testsuite/g++.dg/template/pr97399a.C
>  create mode 100644 gcc/testsuite/g++.dg/template/pr97399b.C
> 
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 48437f23175..18cf9888632 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -21413,6 +21413,7 @@ cp_parser_init_declarator (cp_parser* parser,
>bool is_non_constant_init;
>int ctor_dtor_or_conv_p;
>bool friend_p = cp_parser_friend_p (decl_specifiers);
> +  bool static_p = decl_specifiers->storage_class == sc_static;
>tree pushed_scope = NULL_TREE;
>bool range_for_decl_p = false;
>bool saved_default_arg_ok_p = parser->default_arg_ok_p;
> @@ -21446,7 +21447,7 @@ cp_parser_init_declarator (cp_parser* parser,
>  = cp_parser_declarator (parser, CP_PARSER_DECLARATOR_NAMED,
>   flags, &ctor_dtor_or_conv_p,
>   /*parenthesized_p=*/NULL,
> - member_p, friend_p, /*static_p=*/false);
> + member_p, friend_p, static_p);
>/* Gather up the deferred checks.  */
>stop_deferring_access_checks ();
>  
> @@ -22122,7 +22123,7 @@ cp_parser_direct_declarator (cp_parser* parser,
>  
> tree save_ccp = current_class_ptr;
> tree save_ccr = current_class_ref;
> -   if (memfn)
> +   if (memfn && !static_p)
>   /* DR 1207: 'this' is in scope after the cv-quals.  */
>   inject_this_parameter (current_class_type, cv_quals);
>  
> diff --git a/gcc/testsuite/g++.dg/cpp0x/this2.C b/gcc/testsuite/g++.dg/cpp0x/this2.C
> new file mode 100644
> index 000..3781bc5efec
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/this2.C
> @@ -0,0 +1,8 @@
> +// PR c++/88548
> +// { dg-do compile { target c++11 } }
> +
> +struct S {
> +  int a;
> +  template  static auto m1 ()
> +-> decltype(this->a) { return 0; }; // { dg-error "'this'" }
> +};
> diff --git a/gcc/testsuite/g++.dg/template/pr97399a.C b/gcc/testsuite/g++.dg/template/pr97399a.C
> new file mode 100644
> index 000..3713dbde6e0
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/pr97399a.C
> @@ -0,0 +1,11 @@
> +// PR c++/97399
> +// { dg-do compile { target c++11 } }
> +
> +template  struct enable_if_t {};
> +struct tmp {
> +  template  static constexpr bool is_integral();
> +  template  static auto func()
> +-> enable_if_t()>;
> +};
> +template  constexpr bool tmp::is_integral() { return true; }
> +int main() { tmp::func(); }
> diff --git a/gcc/testsuite/g++.dg/template/pr97399b.C b/gcc/testsuite/g++.dg/template/pr97399b.C
> new file mode 100644
> index 000..9196c985834
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/pr97399b.C
> @@ -0,0 +1,11 @@
> +// PR c++/97399
> +// { dg-do compile { target c++11 } }
> +
> +template  struct enable_if_t {};
> +struct tmp {
> +  template  constexpr bool is_integral(); // non-static
> +  template  static auto func()
> +-> enable_if_t()>; // { dg-error "without object" }
> +};
> +template  constexpr bool tmp::is_integral() { return true; }
> +int main() { tmp::func(); } // { dg-error "no match" }
> -- 
> 2.30.0.155.g66e871b664
> 

Marek

Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0

2021-01-21 Thread Daniel Engel

Hi Christophe,

On Thu, Jan 21, 2021, at 2:29 AM, Christophe Lyon wrote:
> On Sat, 16 Jan 2021 at 17:13, Daniel Engel  wrote:
> >
> > Hi Christophe,
> >
> > On Fri, Jan 15, 2021, at 4:30 AM, Christophe Lyon wrote:
> > > On Fri, 15 Jan 2021 at 12:39, Daniel Engel  wrote:
> > > >
> > > > Hi Christophe,
> > > >
> > > > On Mon, Jan 11, 2021, at 8:39 AM, Christophe Lyon wrote:
> > > > > On Mon, 11 Jan 2021 at 17:18, Daniel Engel  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Jan 11, 2021, at 8:07 AM, Christophe Lyon wrote:
> > > > > > > On Sat, 9 Jan 2021 at 14:09, Christophe Lyon 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Sat, 9 Jan 2021 at 13:27, Daniel Engel 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Jan 7, 2021, at 4:56 AM, Richard Earnshaw wrote:
> > > > > > > > > > On 07/01/2021 00:59, Daniel Engel wrote:
> > > > > > > > > > > --snip--
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote:
> > > > > > > > > > > --snip--
> > > > > > > > > > >
> > > > > > > > > > >> - finally, your popcount implementations have data in 
> > > > > > > > > > >> the code segment.
> > > > > > > > > > >>  That's going to cause problems when we have compilation 
> > > > > > > > > > >> options such as
> > > > > > > > > > >> -mpure-code.
> > > > > > > > > > >
> > > > > > > > > > > I am just following the precedent of existing lib1funcs 
> > > > > > > > > > > (e.g. __clz2si).
> > > > > > > > > > > If this matters, you'll need to point in the right 
> > > > > > > > > > > direction for the
> > > > > > > > > > > fix.  I'm not sure it does matter, since these functions 
> > > > > > > > > > > are PIC anyway.
> > > > > > > > > >
> > > > > > > > > > That might be a bug in the clz implementations - 
> > > > > > > > > > Christophe: Any thoughts?
> > > > > > > > >
> > > > > > > > > __clzsi2() has test coverage in 
> > > > > > > > > "gcc.c-torture/execute/builtin-bitops-1.c"
> > > > > > > > Thanks, I'll have a closer look at why I didn't see problems.
> > > > > > > >
> > > > > > >
> > > > > > > So, that's because the code goes to the .text section (as opposed 
> > > > > > > to
> > > > > > > .text.noread)
> > > > > > > and does not have the PURECODE flag. The compiler takes care of 
> > > > > > > this
> > > > > > > when generating code with -mpure-code.
> > > > > > > And the simulator does not complain because it only checks loads 
> > > > > > > from
> > > > > > > the segment with the PURECODE flag set.
> > > > > > >
> > > > > > This is far out of my depth, but can something like:
> > > > > >
> > > > > > ifeq (,$(findstring __symbian__,$(shell $(gcc_compile_bare) -dM -E 
> > > > > > -  > > > > >
> > > > > > be adapted to:
> > > > > >
> > > > > > a) detect the state of the -mpure-code switch, and
> > > > > > b) pass that flag to the preprocessor?
> > > > > >
> > > > > > If so, I can probably fix both the target section and the data 
> > > > > > usage.
> > > > > > Just have to add a few instructions to finish unrolling the loop.
> > > > >
> > > > > I must confess I never checked libgcc's Makefile deeply before,
> > > > > but it looks like you can probably detect whether -mpure-code is
> > > > > part of $CFLAGS.
> > > > >
> > > > > However, it might be better to write pure-code-safe code
> > > > > unconditionally because the toolchain will probably not
> > > > > be rebuilt with -mpure-code as discussed before.
> > > > > Or that could mean adding a -mpure-code multilib
> > > >
> > > > I have learned a few things since the last update.  I think I know how
> > > > to get -mpure-code out of CFLAGS and into a macro.  However, I have hit
> > > > something of a wall with testing.  I can't seem to compile any flavor of
> > > > libgcc with CFLAGS_FOR_TARGET="-mpure-code".
> > > >
> > > > 1.  Configuring --with-multilib-list=rmprofile results in build failure:
> > > >
> > > > checking for suffix of object files... configure: error: in 
> > > > `/home/mirdan/gcc-obj/arm-none-eabi/libgcc':
> > > > configure: error: cannot compute suffix of object files: cannot 
> > > > compile
> > > > See `config.log' for more details
> > > >
> > > >cc1: error: -mpure-code only supports non-pic code on M-profile 
> > > > targets
> > > >
> > >
> > > Yes, I did hit that wall too :-)
> > >
> > > Hence what we discussed earlier: the toolchain is not rebuilt with 
> > > -mpure-code.
> > >
> > > Note that there are problems in newlib too, but users of -mpure-code seem
> > > to be able to work around that (eg. using their own startup code and no 
> > > stdlib)
> >
> > Is there a current side project to solve the makefile problems?
> None that I know of.
> 
> 
> > I think I'm back to my original question: If libgcc can't be built
> > with -mpure-code, and users bypass it completely with -nostdlib, then
> > why this conversation about pure-code compatibility of __clzsi2() etc?
> I think Richard noticed this pre-existing problem as part of the review
> of your patches. I don't

Re: [PATCH] improve warning suppression for inlined functions (PR 98465, 98512)

2021-01-21 Thread Martin Sebor via Gcc-patches


On 1/21/21 12:01 PM, Florian Weimer wrote:

* Martin Sebor:


On 1/21/21 10:34 AM, Florian Weimer wrote:

* Martin Sebor via Gcc-patches:


This patch depends on the fix for PR 98664 (already approved but
not yet checked in).  I've tested it on x86_64-linux.

To avoid fallout I tried to keep the changes to a minimum, and
so the design isn't as robust as I'd like it ultimately to be.
I plan to enhance it in stage 1.

I've tested this patch on top of 43705f3fa343e08b2fb030460f (so with
the
PR98664 fix, I think), and the reproducer from PR98512 now ICEs:


Thanks for giving it a try!  I saw a similar ICE during my testing
-- it's caused by a couple of uninitialized variables.  I fixed
it in my tree (see below) but the fix didn't make it into the patch.

Please give this a try and let me know if it doesn't help:

index abcd991b829..d82a7eb67e5 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -1426,7 +1426,7 @@ diagnostic_impl (rich_location *richloc, const
diagnostic_metadata *metadata,
  int opt, const char *gmsgid,
  va_list *ap, diagnostic_t kind)
  {
-  diagnostic_info diagnostic;
+  diagnostic_info diagnostic{ };
if (kind == DK_PERMERROR)
  {
diagnostic_set_info (&diagnostic, gmsgid, ap, richloc,
@@ -1452,7 +1452,7 @@ diagnostic_n_impl (rich_location *richloc, const
diagnostic_metadata *metadata,
const char *plural_gmsgid,
va_list *ap, diagnostic_t kind)
  {
-  diagnostic_info diagnostic;
+  diagnostic_info diagnostic{ };
unsigned long gtn;

if (sizeof n <= sizeof gtn)
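
The `{ }` initializer in the hunks above value-initializes the object, so members that have no constructor start out zeroed instead of holding indeterminate values. A minimal sketch of that effect, assuming a hypothetical aggregate `diag_info_sketch` that only stands in for `diagnostic_info` (the real structure has different members):

```cpp
// Hypothetical stand-in for diagnostic_info; the member names are
// illustrative only and do not match GCC's real structure.
struct diag_info_sketch
{
  int option_index;
  const char *url;
  bool override_column;
};

// Value-initialization with an empty brace list: every member of this
// aggregate is zero-initialized before the object is used.
diag_info_sketch
make_zeroed ()
{
  diag_info_sketch d{ };
  return d;
}

// Returns true if all members hold their zero value.
bool
all_zero (const diag_info_sketch &d)
{
  return d.option_index == 0 && d.url == nullptr && !d.override_column;
}
```

With a plain `diag_info_sketch d;` at block scope the members would be left uninitialized, which is the kind of state the ICE was traced to.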


This fixes the crash for me, and the warning is gone as well.


Great, thanks for the confirmation!  I'll post the updated patch
shortly.

Martin



Thanks,
Florian

Re: driver: do not check input file existence here [PR 98452]

2021-01-21 Thread Joseph Myers

On Thu, 21 Jan 2021, Nathan Sidwell wrote:

> Do you want expandargv altered along the lines you mention?  Or a bug filed?
> [in order for my patch to be acceptable]

The patch is OK as-is.  Filing a bug for expandargv handling of missing 
files might be a good idea; it seems very arbitrary that @directory 
produces an error, but @file for a nonexistent file, or a file without 
read access, or a file with an I/O error on reading, gets passed through.
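
The pass-through behaviour described above can be modelled roughly as follows. This is a simplified sketch and not libiberty's expandargv: the function name `expand_response_args` is made up for illustration, tokens are split on whitespace only, and the @directory case is not modelled.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Simplified model of response-file expansion (hypothetical helper, not
// the libiberty implementation).  An argument "@name" is replaced by the
// whitespace-separated words of file "name" if that file can be opened;
// otherwise the argument is passed through unchanged, which is the
// behaviour the thread calls arbitrary.
std::vector<std::string>
expand_response_args (const std::vector<std::string> &args)
{
  std::vector<std::string> out;
  for (const std::string &arg : args)
    {
      if (arg.size () > 1 && arg[0] == '@')
        {
          std::ifstream in (arg.substr (1));
          if (in)
            {
              std::string word;
              while (in >> word)
                out.push_back (word);
              continue;
            }
        }
      // Not a response file, or the file could not be opened.
      out.push_back (arg);
    }
  return out;
}
```

A source file whose name really starts with `@` can be spelled `./@file.cc`, which this sketch leaves untouched because the argument no longer begins with `@`.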

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: driver: do not check input file existence here [PR 98452]

2021-01-21 Thread Nathan Sidwell


On 1/21/21 12:59 PM, Joseph Myers wrote:

On Wed, 20 Jan 2021, Nathan Sidwell wrote:




Unspecified non-existent response file:
OLD:
(1)devvm293:282>g++ -c  @nothing
g++: error: nothing: No such file or directory
g++: fatal error: no input files
compilation terminated.

NEW:
(1)devvm293:284>./xg++ -B./ -c @nothing
xg++: warning: @nothing: linker input file unused because linking not done


This is less clear, in that one might suppose "nothing" is meant to be a
file listing C++ sources to compile, so should be processed and should
result in an error for its absence.  However, this particular place in the
driver handling OPT_SPECIAL_input_file doesn't seem like the right place
for such an error.  Either the correct handling of response files is that
the @file argument is silently kept as-is and so the warning after your
patch is correct, or the correct handling of response files is that there
should be an error if such a file cannot be found (and so users with
filenames starting @ must reference them e.g. as ./@file.cc) and
expandargv should have some way to report that error.  Leaving it to
OPT_SPECIAL_input_file as in the driver at present would mean such an
error doesn't occur when @file, unexpanded, appears to be an option
argument rather than an input file.



Agreed.

I wonder who cares about naming sources '@file.cc' or '@linkerscript' or 
whatever?


Do you want expandargv altered along the lines you mention?  Or a bug 
filed? [in order for my patch to be acceptable]


nathan

--
Nathan Sidwell

Re: [PATCH v2] c++: ICE when mangling operator name [PR98545]

2021-01-21 Thread Marek Polacek via Gcc-patches

On Tue, Jan 19, 2021 at 05:38:20PM -0500, Marek Polacek via Gcc-patches wrote:
> On Tue, Jan 19, 2021 at 03:47:47PM -0500, Jason Merrill via Gcc-patches wrote:
> > On 1/13/21 6:39 PM, Marek Polacek wrote:
> > > r11-6301 added some asserts in mangle.c, and now we trip over one of
> > > them.  In particular, it's the one asserting that we didn't get
> > > IDENTIFIER_ANY_OP_P when mangling an expression with a dependent name.
> > > 
> > > As this testcase shows, it's possible to get that, so turn the assert
> > > into an if and write "on".  That changes the mangling in the following
> > > way:
> > > 
> > > With this patch:
> > > 
> > > $ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTonclspcvT__EEEDpS2_
> > > decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h<a, double, a>(a, double, a)
> > > 
> > > G++10:
> > > $ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTclspcvT__EEEDpS2_
> > > decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h<a, double, a>(a, double, a)
> > > 
> > > clang++/icc:
> > > $ c++filt _ZN1i1hIJ1adS1_EEEDTclonclspcvT__EEEDpS2_
> > > decltype ((operator())((a)(), (double)(), (a)())) i::h<a, double, a>(a, double, a)
> > > 
> > > I'm not sure why we differ in the "(*this)." part
> > 
> > Is there a PR for that?
> 
> I just opened 98756, because I didn't find any.  I can investigate where that
> (*this) comes from, though it's not readily clear to me if this is a bug or 
> not.
> 
> > > but at least the
> > > suffix "onclspcvT__EEEDpS2_" is the same for all three compilers.  So
> > > I hope the following fix makes sense.
> > > 
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   PR c++/98545
> > >   * mangle.c (write_expression): When the expression is a dependent name
> > >   and an operator name, write "on" before writing its name.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   PR c++/98545
> > >   * g++.dg/abi/mangle76.C: New test.
> > > ---
> > >   gcc/cp/mangle.c |  3 ++-
> > >   gcc/testsuite/g++.dg/abi/mangle76.C | 39 +
> > >   2 files changed, 41 insertions(+), 1 deletion(-)
> > >   create mode 100644 gcc/testsuite/g++.dg/abi/mangle76.C
> > > 
> > > diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
> > > index 11eb8962d28..bb3c4b76d33 100644
> > > --- a/gcc/cp/mangle.c
> > > +++ b/gcc/cp/mangle.c
> > > @@ -3349,7 +3349,8 @@ write_expression (tree expr)
> > > else if (dependent_name (expr))
> > >   {
> > > tree name = dependent_name (expr);
> > > -  gcc_assert (!IDENTIFIER_ANY_OP_P (name));
> > > +  if (IDENTIFIER_ANY_OP_P (name))
> > > + write_string ("on");
> > 
> > Any mangling change needs to handle different -fabi-versions; see the
> > similar code in write_member_name.
> 
> Ah, I only looked at the unguarded IDENTIFIER_ANY_OP_P checks.  But now
> I have a possibly stupid question: what version should I check?  We have
> Version 11 for which the manual already says "corrects the mangling of
> sizeof... expressions and *operator names*", so perhaps I could tag along
> and check abi_version_at_least (11).  Or should I check Version 15 and
> update the manual?

The latter seems to be true, therefore a new patch is attached.
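
The ABI gating discussed here might look roughly like the sketch below. It is only an illustration: `mangle_dependent_operator` is a hypothetical helper, not a function in mangle.c, and treating version 0 as "latest" is an assumption of this sketch. The version number 15 is the one named in the question above.

```cpp
#include <string>

// Hypothetical helper: returns the mangled fragment for a dependent
// operator name.  OP_CODE is the operator's two-letter code, e.g. "cl"
// for operator().  ABI_VERSION models -fabi-version; 0 is treated as
// "latest" in this sketch.
std::string
mangle_dependent_operator (const std::string &op_code, int abi_version)
{
  const int first_fixed_version = 15;  // assumption taken from the thread
  bool at_least = abi_version == 0 || abi_version >= first_fixed_version;
  // Older ABI versions keep the previous mangling without the "on" prefix.
  return at_least ? "on" + op_code : op_code;
}
```

For example, `mangle_dependent_operator ("cl", 15)` yields "oncl", while an older version such as 14 yields the previous "cl".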
  
> > And why doesn't this go through write_member_name?
> 
> We go through write_member_name:
> 
> #0  fancy_abort (file=0x2b98ef8 "/home/mpolacek/src/gcc/gcc/cp/mangle.c", 
> line=3352, 
> function=0x2b99751 "write_expression") at 
> /home/mpolacek/src/gcc/gcc/diagnostic.c:1884
> #1  0x00bee91b in write_expression (expr=)
> at /home/mpolacek/src/gcc/gcc/cp/mangle.c:3352
> #2  0x00beb3e2 in write_member_name (member=)
> at /home/mpolacek/src/gcc/gcc/cp/mangle.c:2892
> #3  0x00beee70 in write_expression (expr= 0x7fffea043b40>)
> at /home/mpolacek/src/gcc/gcc/cp/mangle.c:3405
> #4  0x00bef1be in write_expression (expr=)
> at /home/mpolacek/src/gcc/gcc/cp/mangle.c:3455
> #5  0x00be858a in write_type (type=)
> at /home/mpolacek/src/gcc/gcc/cp/mangle.c:2343
> 
> so in write_member_name MEMBER is a BASELINK so we don't enter the
> identifier_p block.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
r11-6301 added some asserts in mangle.c, and now we trip over one of
them.  In particular, it's the one asserting that we didn't get
IDENTIFIER_ANY_OP_P when mangling an expression with a dependent name.

As this testcase shows, it's possible to get that, so turn the assert
into an if and write "on".  That changes the mangling in the following
way:

With this patch:

$ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTonclspcvT__EEEDpS2_
decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h<a, double, a>(a, double, a)

G++10:
$ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTclspcvT__EEEDpS2_
decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h<a, double, a>(a, double, a)

clang++/icc:
$ c++filt _ZN1i1hIJ1adS1_EEEDTclonclspcvT__EEEDpS2_
decltype ((operator())((a)(), (double)(), (a)())

c++: Fix null this pointer [PR 98624]

2021-01-21 Thread Nathan Sidwell


One may not use a null this pointer to invoke a static member
function.  This fixes the remaining ubsan errors found with an
ubsan bootstrap.

PR c++/98624
gcc/cp/
* module.cc (depset::hash::find_dependencies): Add
module arg.
(trees_out::core_vals): Check state before calling
write_location.
(sort_cluster, module_state::write): Adjust
find_dependencies call.

--
Nathan Sidwell
diff --git i/gcc/cp/module.cc w/gcc/cp/module.cc
index 8f9c7940ef8..6741ae03ee7 100644
--- i/gcc/cp/module.cc
+++ w/gcc/cp/module.cc
@@ -2567,7 +2567,7 @@ public:
 void add_class_entities (vec *);
 
   public:
-void find_dependencies ();
+void find_dependencies (module_state *);
 bool finalize_dependencies ();
 vec connect ();
   };
@@ -5898,7 +5898,8 @@ trees_out::core_vals (tree t)
   if (!DECL_TEMPLATE_PARM_P (t))
 	WT (t->decl_minimal.context);
 
-  state->write_location (*this, t->decl_minimal.locus);
+  if (state)
+	state->write_location (*this, t->decl_minimal.locus);
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_TYPE_COMMON))
@@ -6001,7 +6002,8 @@ trees_out::core_vals (tree t)
 
   if (CODE_CONTAINS_STRUCT (code, TS_EXP))
 {
-  state->write_location (*this, t->exp.locus);
+  if (state)
+	state->write_location (*this, t->exp.locus);
 
   /* Walk in forward order, as (for instance) REQUIRES_EXPR has a
  bunch of unscoped parms on its first operand.  It's safer to
@@ -6140,9 +6142,12 @@ trees_out::core_vals (tree t)
 
   /* Miscellaneous common nodes.  */
 case BLOCK:
-  state->write_location (*this, t->block.locus);
-  state->write_location (*this, t->block.end_locus);
-  
+  if (state)
+	{
+	  state->write_location (*this, t->block.locus);
+	  state->write_location (*this, t->block.end_locus);
+	}
+
   /* DECL_LOCAL_DECL_P decls are first encountered here and
  streamed by value.  */
   chained_decls (t->block.vars);
@@ -6183,7 +6188,8 @@ trees_out::core_vals (tree t)
 	/* The ompcode is serialized in start.  */
 	if (streaming_p ())
 	  WU (t->omp_clause.subcode.map_kind);
-	state->write_location (*this, t->omp_clause.locus);
+	if (state)
+	  state->write_location (*this, t->omp_clause.locus);
 
 	unsigned len = omp_clause_num_ops[OMP_CLAUSE_CODE (t)];
 	for (unsigned ix = 0; ix != len; ix++)
@@ -6270,8 +6276,9 @@ trees_out::core_vals (tree t)
   WT (((lang_tree_node *)t)->lambda_expression.extra_scope);
   /* pending_proxies is a parse-time thing.  */
   gcc_assert (!((lang_tree_node *)t)->lambda_expression.pending_proxies);
-  state->write_location
-	(*this, ((lang_tree_node *)t)->lambda_expression.locus);
+  if (state)
+	state->write_location
+	  (*this, ((lang_tree_node *)t)->lambda_expression.locus);
   if (streaming_p ())
 	{
 	  WU (((lang_tree_node *)t)->lambda_expression.default_capture_mode);
@@ -6291,8 +6298,9 @@ trees_out::core_vals (tree t)
 case STATIC_ASSERT:
   WT (((lang_tree_node *)t)->static_assertion.condition);
   WT (((lang_tree_node *)t)->static_assertion.message);
-  state->write_location
-	(*this, ((lang_tree_node *)t)->static_assertion.location);
+  if (state)
+	state->write_location
+	  (*this, ((lang_tree_node *)t)->static_assertion.location);
   break;
 
 case TEMPLATE_DECL:
@@ -6324,7 +6332,8 @@ trees_out::core_vals (tree t)
 		WT (m.binfo);
 		WT (m.decl);
 		WT (m.diag_decl);
-		state->write_location (*this, m.loc);
+		if (state)
+		  state->write_location (*this, m.loc);
 	  }
 	  }
   }
@@ -13159,9 +13168,9 @@ depset::hash::add_mergeable (depset *mergeable)
entries on the same binding that need walking.  */
 
 void
-depset::hash::find_dependencies ()
+depset::hash::find_dependencies (module_state *module)
 {
-  trees_out walker (NULL, NULL, *this);
+  trees_out walker (NULL, module, *this);
   vec unreached;
   unreached.create (worklist.length ());
 
@@ -13547,7 +13556,7 @@ sort_cluster (depset::hash *original, depset *scc[], unsigned size)
   gcc_checking_assert (use_lwm <= bind_lwm);
   dump (dumper::MERGE) && dump ("Ordering %u/%u depsets", use_lwm, size);
 
-  table.find_dependencies ();
+  table.find_dependencies (nullptr);
 
   vec order = table.connect ();
   gcc_checking_assert (order.length () == use_lwm);
@@ -17571,7 +17580,7 @@ module_state::write (elf_out *to, cpp_reader *reader)
 }
 
   /* Now join everything up.  */
-  table.find_dependencies ();
+  table.find_dependencies (this);
 
   if (!table.finalize_dependencies ())
 {
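
The guard pattern used throughout the diff above can be sketched in isolation. The types below are hypothetical stand-ins for `module_state` and `trees_out`, kept just large enough to show the null check; they are not the real classes.

```cpp
#include <vector>

// Hypothetical stand-in for module_state.
struct state_sketch
{
  std::vector<unsigned> written;
  void write_location (unsigned loc) { written.push_back (loc); }
};

// Hypothetical stand-in for trees_out.
struct walker_sketch
{
  state_sketch *state;  // may be null when only walking dependencies

  // Returns true if the location was written, false if there was no
  // state to write it to.
  bool emit_location (unsigned loc)
  {
    if (!state)
      return false;  // never call a member through a null pointer
    state->write_location (loc);
    return true;
  }
};
```

A walker built with a null state, as `sort_cluster` now requests by passing `nullptr`, simply skips the location writes.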

Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]

2021-01-21 Thread David Malcolm via Gcc-patches

On Thu, 2021-01-21 at 20:09 +0100, Jan Hubicka wrote:
> > On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > > > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> > > >  wrote:
> > > > > gimple.h has this comment for gimple's uid field:
> > > > > 
> > > > >   /* UID of this statement.  This is used by passes that want
> > > > > to
> > > > >  assign IDs to statements.  It must be assigned and used
> > > > > by
> > > > > each
> > > > >  pass.  By default it should be assumed to contain
> > > > > garbage.  */
> > > > >   unsigned uid;
> > > > > 
> > > > > and gimple_set_uid has:
> > > > > 
> > > > >Please note that this UID property is supposed to be
> > > > > undefined
> > > > > at
> > > > >pass boundaries.  This means that a given pass should not
> > > > > assume it
> > > > >contains any useful value when the pass starts and thus
> > > > > can
> > > > > set it
> > > > >to any value it sees fit.
> > > > > 
> > > > > which suggests that any pass can use the uid field as an
> > > > > arbitrary
> > > > > scratch space.
> > > > > 
> > > > > PR analyzer/98599 reports a case where this error occurs in
> > > > > LTO
> > > > > mode:
> > > > >   fatal error: Cgraph edge statement index out of range
> > > > > on certain inputs with -fanalyzer.
> > > > > 
> > > > > The error occurs in the LTRANS phase after -fanalyzer runs in
> > > > > the
> > > > > WPA phase.  The analyzer pass writes to the uid fields of all
> > > > > stmts.
> > > > > 
> > > > > The error occurs when LTRANS is streaming callgraph edges
> > > > > back
> > > > > in.
> > > > > If I'm reading things correctly, the LTO format uses stmt
> > > > > uids to
> > > > > associate call stmts with callgraph edges between WPA and
> > > > > LTRANS.
> > > > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > > > gimple_uid, and input_edge reads it back in.
> > > > > 
> > > > > Hence IPA passes that touch the uids in WPA need to restore
> > > > > them,
> > > > > or the stream-in at LTRANS will fail.
> > > > > 
> > > > > Is it intended that the LTO machinery relies on the value of
> > > > > the
> > > > > uid
> > > > > field being preserved during WPA (or, at least, needs to be
> > > > > saved
> > > > > and
> > > > > restored by passes that touch it)?
> > > > 
> > > > I believe this is solely at the cgraph stream out to stream in
> > > > boundary but
> > > > this may be a blurred area since while we materialize the whole
> > > > cgraph
> > > > at once the function bodies are streamed in on demand.
> > > > 
> > > > Honza can probably clarify things.
> > > 
> > > Well, the uids are used to associate cgraph edges with
> > > statements.  At
> > > WPA stage you do not have function bodies and thus uids serves
> > > role
> > > of
> > > pointers to the statement.  If you load the body in (via
> > > get_body)
> > > the
> > > uids are replaced by pointers and when you stream out uids are
> > > recomputed again.
> > > 
> > > When do you touch the uids? At WPA time or from small IPA pass in
> > > ltrans?
> > 
> > The analyzer is here in passes.def:
> >   INSERT_PASSES_AFTER (all_regular_ipa_passes)
> >   NEXT_PASS (pass_analyzer);
> > 
> > and so in LTO runs as the first regular IPA pass at WPA time,
> > when do_whole_program_analysis calls:
> >   execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);
> > 
> > FWIW I hope to eventually have a way to summarize function bodies
> > in
> > the analyzer, but I don't yet, so I'm currently brute forcing
> > things by
> > loading all function bodies at the start of the analyzer (when
> > -fanalyzer is enabled).
> > 
> > I wonder if that's messing things up somehow?
> 
> Actually I think it should work.  If you do get_body or
> get_untransformed_body (that will be equal at that time) you ought to
> get ids in symtab datastructure rewritten to pointers and at stream
> out
> time we should assign new ids...
> > Does the stream-out from WPA make any assumptions about the stmt
> > uids?
> > For example, 
> >   #define STMT_UID_NOT_IN_RANGE(uid) \
> > (gimple_stmt_max_uid (fn) < uid || uid == 0)
> > seems to assume that the UIDs are per-function ranges from
> >   [0-gimple_stmt_max_uid (fn)]
> > which isn't the case for the uids set by the analyzer.  Maybe
> > that's
> > the issue here?
> > 
> > Sorry for not being more familiar with the IPA/LTO code
> 
> There is lto_prepare_function_for_streaming which assigns uids to be
> incremental.   So I guess problem is that it is not called at WPA
> time
> if function is in memory (since at moment we do not really modify
> bodies
> at WPA time, however we do stream them in sometimes to icf compare
> them
> or to update profile).
> 
> So i guess fix would be to arrange lto_prepare_function_for_streaming
> to
> be called on all functions with body defined before WPA stream-out?

Thanks; I'll have a go at implementing that.
Dave
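
A rough sketch of the renumbering idea, assuming that a pass has left arbitrary values in the uid fields and that they must be dense again before stream-out. The names `stmt_sketch`, `renumber_uids` and `uid_not_in_range` are hypothetical; only the range check mirrors the STMT_UID_NOT_IN_RANGE macro quoted in this thread, and starting the numbering at 1 is an assumption chosen so the uids pass that check.

```cpp
#include <vector>

// Hypothetical stand-in for a gimple statement; only the uid matters here.
struct stmt_sketch
{
  unsigned uid;
};

// Assign dense, increasing uids starting at 1 and return the new maximum.
unsigned
renumber_uids (std::vector<stmt_sketch> &body)
{
  unsigned next = 0;
  for (stmt_sketch &s : body)
    s.uid = ++next;
  return next;
}

// Same shape as the quoted macro: a uid is rejected if it is zero or
// larger than the function's maximum uid.
bool
uid_not_in_range (unsigned uid, unsigned max_uid)
{
  return max_uid < uid || uid == 0;
}
```

After renumbering, every statement's uid passes the range check regardless of what the analyzer stored there earlier.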

> Honza
> > Dave
> > 
> > 
> > > hozna
> > > > Note LTO uses this exactly because of this comment to

[PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]

2021-01-21 Thread Kwok Cheung Yeung


Hello

This patch addresses the intermittent hanging seen in the 
libgomp.c-c++-common/task-detach-6.f90 test.


The main problem is due to the 'omp taskwait' in the test. GOMP_taskwait can run 
tasks, so for correct semantics it needs to be able to place finished tasks that 
have unfulfilled completion events into the detach queue, rather than just 
finishing them immediately (in effect ignoring the detach clause).


Unfinished tasks in the detach queue are still children of their parent task, so 
they can appear in next_task in the main GOMP_taskwait loop. If next_task is 
fulfilled then it can be finished immediately, otherwise it will wait on 
taskwait_sem.


omp_fulfill_event needs to be able to post the taskwait_sem semaphore as well as 
wake the team barrier. Since the semaphore is located on the parent of the task 
whose completion event is being fulfilled, I have changed the event handle to 
be a pointer to the task instead of just the completion semaphore in order to 
access the parent field.


This type of code is currently used to wake the threads for the team barrier:

  if (team->nthreads > team->task_running_count)
gomp_team_barrier_wake (&team->barrier, 1);

This issues a gomp_team_barrier_wake if any of the threads are not running a 
task (and so might be sleeping). However, detach tasks that are queued waiting 
for a completion event are currently included in task_running_count (because the 
finish_cancelled code executed later decrements it). Since 
gomp_barrier_handle_tasks does not block if there are unfinished detached tasks 
remaining (since during development I found that doing so could cause deadlocks 
in single-threaded code), threads could be sleeping even if team->nthreads == 
team->task_running_count, and this code would fail to wake them. I fixed this by 
decrementing task_running_count when queuing an unfinished detach task, and 
skipping the decrement in finish_cancelled if the task was a queued detach task. 
I added a new gomp_task_kind GOMP_TASK_DETACHED to mark this type of task.
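
The counting problem can be illustrated with a small model. This is only a sketch: `team_sketch`, `should_wake` and `queue_detached` are hypothetical names, and the struct keeps just the two fields used by the wake condition quoted above.

```cpp
#include <cassert>

// Hypothetical, simplified view of the two team fields used by the check.
struct team_sketch
{
  unsigned nthreads;
  unsigned task_running_count;
};

// The condition quoted above: wake only if some thread is not counted
// as running a task.
bool
should_wake (const team_sketch &team)
{
  return team.nthreads > team.task_running_count;
}

// Model of queuing an unfinished detach task: it stops counting as a
// running task, so its thread is seen as possibly sleeping.
void
queue_detached (team_sketch &team)
{
  assert (team.task_running_count > 0);
  --team.task_running_count;
}
```

With two threads and two counted tasks, `should_wake` is false even if one of those tasks is merely queued awaiting its event; after `queue_detached` the count drops to one and the wake is issued.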


I have tried running the task-detach-6 testcase (C and Fortran) 10,000 
iterations at a time using 32 threads, on a x86_64 Linux machine with GCC built 
with --disable-linux-futex, and no hangs. I have checked that it bootstraps, and 
noticed no regressions in the libgomp testsuite when run without offloading.


With Nvidia and GCN offloading though, task-detach-6 hangs... I _think_ the 
reason why it 'worked' before was because the taskwait allowed tasks with detach 
clauses to always complete immediately after execution. Since that backdoor has 
been closed, task-detach-6 hangs with or without the taskwait.


I think GOMP_taskgroup_end and maybe gomp_task_maybe_wait_for_dependencies also 
need the same type of TLC as they can also run tasks, but there are currently no 
tests that exercise it.


The detach support clearly needs more work, but is this particular patch okay 
for trunk?


Thanks

Kwok
From 12cc24c937e9294d5616dd0cd9a754c02ffb26fa Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Thu, 21 Jan 2021 05:38:47 -0800
Subject: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp
 tests [PR98738]

This adds support for the task detach clause to taskwait, and fixes a
number of problems related to semaphores that may lead to a hang in
some circumstances.

2021-01-21  Kwok Cheung Yeung  

libgomp/

PR libgomp/98738
* libgomp.h (enum gomp_task_kind): Add GOMP_TASK_DETACHED.
* task.c (task_fulfilled_p): Check detach field as well.
(GOMP_task): Use address of task as the event handle.
(gomp_barrier_handle_tasks): Fix indentation.  Use address of task
as event handle. Set kind of suspended detach task to
GOMP_TASK_DETACHED and decrement task_running_count.  Move
finish_cancelled block out of else branch.  Skip decrement of
task_running_count if task kind is GOMP_TASK_DETACHED.
(GOMP_taskwait): Finish fulfilled detach tasks.  Update comment.
Queue detach tasks that have not been fulfilled.
(omp_fulfill_event): Use address of task as event handle.  Post
to taskwait_sem and taskgroup_sem if necessary.  Check
task_running_count before calling gomp_team_barrier_wake.
* testsuite/libgomp.c-c++-common/task-detach-5.c (main): Change
data-sharing of detach events on enclosing parallel to private.
* testsuite/libgomp.c-c++-common/task-detach-6.c (main): Likewise.
* testsuite/libgomp.fortran/task-detach-5.f90 (task_detach_5):
Likewise.
* testsuite/libgomp.fortran/task-detach-6.f90 (task_detach_6):
Likewise.
---
 libgomp/libgomp.h  |   5 +-
 libgomp/task.c | 155 ++---
 .../testsuite/libgomp.c-c++-common/task-detach-5.c |   2 +-
 .../testsuite/libgomp.c-c++-common/task-detach-6.c |   2 +-
 .../testsuite/libgomp.fortran/task-

Re: [PATCH] c++: private inheritance access diagnostics fix [PR17314]

2021-01-21 Thread Anthony Sharp via Gcc-patches

Hi Jason,

I've finally completed my copyright assignment form. I've attached it
to this email for reference.

> You don't need write access to the main repository to use these commands
> on your local copy.  One nice thing about git compared to svn is that
> you don't need to touch the server for anything but push and pull.
>
> Incidentally, how are you producing your patch?  Maybe try git
> format-patch instead.

The method I am using at the moment is the one Ranjit Mathew talks
about here: http://rmathew.com/articles/gcj/crpatch.html. Actually,
having just re-read it, it says: 'NOTE: This is not the “proper” or
“official” way of creating and submitting patches - that process has
been explained in detail elsewhere. That process requires one to use
Subversion (SVN). The process described here is meant for “one-off
hackers” or people who cannot use SVN for some reason or the other.'
... oops!

It's my fault kind of - the official GCC webpage
(https://gcc.gnu.org/gitwrite.html) explaining how to do it is called
'Read-write Git access' so I assumed it was only relevant for people
who have access to the repo, but I see that is not the case.

I've tried the git way of doing it and I'm attaching a new patch file
that (hopefully) is better this time. Basically what I did was what
you suggested:

git pull
contrib/gcc-git-customization.sh
(make changes)
git add *
git gcc-commit-mklog
git gcc-commit-mklog --amend
git format-patch -1 master

I also re-built the source just to make sure I hadn't messed anything
up. I re-ran the C++ regression tests using make check-c and make
check-c++. Whilst I did not do a before/after comparison of the
results, I checked the FAILs in gcc.sum and g++.sum and they all
looked like they had nothing to do with my code. All the code is the
same as before, so I'm thinking it should be fine (I just wanted to be
safe). Also checked against check_GNU_style.sh.

Assuming that's all fine, as for the code itself, there might well be
some tweaks that could make it better, and so if that is the case then
please let me know.

Kind regards,
Anthony Sharp


Sharp.1672871.GCC.pdf
Description: Adobe PDF document
From 9f7fa0b4a892f717974d79a6a56a5f8a8a8d9943 Mon Sep 17 00:00:00 2001
From: Anthony Sharp 
Date: Thu, 21 Jan 2021 15:26:25 +
Subject: [PATCH] gcc/cp/ChangeLog:

	2021-1-21  Anthony Sharp  
	Fixes PR17314
	* call.c (complain_about_access): Altered function.
	* cp-tree.h (complain_about_access): Changed parameters of function.
	* search.c (access_in_type): Made function non-static so it can be
	used in semantics.c.
	* semantics.c (access_in_type): Added as extern function.
	(get_parent_with_private_access): Added function.
	(enforce_access): Modified function.
	* typeck.c (complain_about_unrecognized_member): Updated function
	arguments in complain_about_access.

gcc/testsuite/ChangeLog:

	2021-1-21  Anthony Sharp  
	Changes required due to PR17314 fix
	* g++.dg/lookup/scoped1.C: Modified testcase to run successfully with
	changes.
	* g++.dg/tc1/dr142.C: Same as above.
	* g++.dg/tc1/dr52.C: Same as above.
	* g++.old-deja/g++.brendan/visibility6.C: Same as above.
	* g++.old-deja/g++.brendan/visibility8.C: Same as above.
	* g++.old-deja/g++.jason/access8.C: Same as above.
	* g++.old-deja/g++.law/access4.C: Same as above.
	* g++.old-deja/g++.law/visibility12.C: Same as above.
	* g++.old-deja/g++.law/visibility4.C: Same as above.
	* g++.old-deja/g++.law/visibility8.C: Same as above.
	* g++.old-deja/g++.other/access4.C: Same as above.
---
 gcc/cp/call.c | 64 ++---
 gcc/cp/cp-tree.h  |  3 +-
 gcc/cp/search.c   |  4 +-
 gcc/cp/semantics.c| 68 ++-
 gcc/cp/typeck.c   |  3 +-
 gcc/testsuite/g++.dg/lookup/scoped1.C |  4 +-
 gcc/testsuite/g++.dg/tc1/dr142.C  |  8 +--
 gcc/testsuite/g++.dg/tc1/dr52.C   |  6 +-
 .../g++.old-deja/g++.brendan/visibility6.C|  4 +-
 .../g++.old-deja/g++.brendan/visibility8.C|  4 +-
 .../g++.old-deja/g++.jason/access8.C  |  5 +-
 gcc/testsuite/g++.old-deja/g++.law/access4.C  |  5 +-
 .../g++.old-deja/g++.law/visibility12.C   |  7 +-
 .../g++.old-deja/g++.law/visibility4.C|  5 +-
 .../g++.old-deja/g++.law/visibility8.C|  5 +-
 .../g++.old-deja/g++.other/access4.C  |  4 +-
 16 files changed, 145 insertions(+), 54 deletions(-)

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index b6e9f125aeb..fc2f1c6226c 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -7142,33 +7142,51 @@ build_op_delete_call (enum tree_code code, tree addr, tree size,
 /* Issue diagnostics about a disallowed access of DECL, using DIAG_DECL
in the diagnostics.
 
-   If ISSUE_ERROR is true, then issue an error about the
-   access, followed by a note showing the declaration.
-   Otherwise, just show the note.  */
+   If ISSUE_ERROR is true, then issue an error about

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka

> On 1/21/21 2:46 PM, Jan Hubicka wrote:
> > I think easy way to get users of this option is to make profile not
> > reproducible by default and modify packages to use right reproducibility
> > option when reproducible builds are intended.  It is not feature that
> > comes for free and I think most users of PGO do not care, so I think
> > it should be opt in.
> 
> I agree that most users really don't care.
> 
> > 
> > In general getting profile reproducible one needs to make train
> > reproducible that is hard when you look at details (such as /tmp/ file
> > name generation issue in gcc) and may lead to need for user to annotate
> > such code.
> 
> Yes, right now I'm testing both patches and I still see difference in GCC PGO
> bootstrap (with -fprofile-reproducible=parallel-runs):
> 
> $ objfolderdiff.py /dev/shm/objdir2/gcc /dev/shm/objdir3/gcc
>138/   649: cgraphunit.o: different
>230/   649: dwarf2out.o: different
>356/   649: ipa-cp.o: different
>357/   649: ipa-devirt.o: different
>360/   649: ipa-icf.o: different
>533/   649: tree-affine.o: different
>574/   649: tree-ssa-loop-im.o: different
>632/   649: var-tracking.o: different
> 
> Most of the changes are in known contexts:
> 
> ;; Function hash_table::hash_entry, false, 
> xcallocator>::find_with_hash 
> (_ZN10hash_tableIN8hash_mapIP10im_mem_refP6sm_aux21simple_hashmap_traitsI19default_hash_traitsIS2_ES4_EE10hash_entryELb0E11xcallocatorE14find_with_hashERKS2_j,
>  funcdef_no=3431, decl_uid=106116, cgraph_uid=2554, symbol_order=2712)
> 
> ;; Function hash_table, false, 
> xcallocator>::find_empty_slot_for_expand 
> (_ZN10hash_tableI19default_hash_traitsIPvELb0E11xcallocatorE26find_empty_slot_for_expandEj,
>  funcdef_no=4875, decl_uid=134656, cgraph_uid=3901, symbol_order=4074)
> 
> ;; Function hash_table_mod2 (_Z15hash_table_mod2jj, funcdef_no=1047, 
> decl_uid=32691, cgraph_uid=345, symbol_order=356)
> 
> ;; Function hash_table::hash_entry, 
> false, xcallocator>::find_empty_slot_for_expand 
> (_ZN10hash_tableIN8hash_mapIP9tree_nodeP14name_expansion21simple_hashmap_traitsI19default_hash_traitsIS2_ES4_EE10hash_entryELb0E11xcallocatorE26find_empty_slot_for_expandEj,
>  funcdef_no=3639, decl_uid=102478, cgraph_uid=2760, symbol_order=2916)
> 
> So likely hashing related functions where we hash some pointers :/ 
> Unfortunately, that's enough for final binary
> divergence.

Yep, this is really annoying (and it shows how hard full reproducibility
is). I think we could try to disable profiling for the hash functions
and see if we can get reproducibility right for GCC...
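
One plausible way pointer hashing leaks into the profile is sketched below. This is an illustration under assumptions, not GCC's hash_table code: `node_sketch` and both hash functions are hypothetical, and the point is only that a bucket derived from an address can change between runs while one derived from a stable id cannot.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical node with an id that is the same in every run.
struct node_sketch
{
  unsigned stable_id;
};

// Bucket depends on the address, which may differ from run to run
// (allocation order, address-space layout), so collision counts and
// probe lengths recorded in a profile may differ too.
std::size_t
hash_by_address (const node_sketch *n, std::size_t buckets)
{
  return reinterpret_cast<std::uintptr_t> (n) % buckets;
}

// Bucket depends only on the stable id, so it is identical in every run.
std::size_t
hash_by_id (const node_sketch *n, std::size_t buckets)
{
  return n->stable_id % buckets;
}
```

Two nodes with the same id land in the same bucket under `hash_by_id` wherever they are allocated, which is not guaranteed for `hash_by_address`.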

Honza
> 
> > 
> > This will become more common problem for multithreaded profiles where
> > one needs to annotate locking and busy waiting loops in them for example
> > (or the scheduler responsible for executing parallel tasks).
> > 
> > I can see this to be practically achievable but we probably want to
> > produce some guidelines for doing that and probably teach gcov-tool to
> > compare profiles and say to which degree they match (i.e. which
> > functions match for each of levels of reproducibility).
> > 
> > The problem is that profiles are continuous and the errors too, but
> > optimizations look for certain thresholds, so small errors may lead to
> > code changes, so I think our current method of looking at relatively few
> > packages and patching errors when they appear is not very good long term
> > strategy... Especially if it makes us to drop useful transformations by
> > default with -fprofile-use and no additional option.
> 
> To be honest we have very few packages that use PGO in openSUSE:Factory.
> 
> Anyway, are you fine with the suggested?
> 
> Thanks for discussion,
> Martin

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka

> On 1/21/21 8:03 PM, Jan Hubicka wrote:
> > What exactly is suggested?
> 
> This one.
> 
> Martin

> From 22bbf5342f2b73fad6c0286541bba6699c617380 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Thu, 21 Jan 2021 09:02:31 +0100
> Subject: [PATCH 1/2] Restore -fprofile-reproducibility flag.
> 
> gcc/ChangeLog:
> 
>   PR gcov-profile/98739
>   * common.opt: Add missing equal symbol.
>   * value-prof.c (get_nth_most_common_value): Fix comment
>   and skip TOP N counter properly when -fprofile-reproducibility
>   is not serial.
> ---
>  gcc/common.opt   |  2 +-
>  gcc/value-prof.c | 18 --
>  2 files changed, 9 insertions(+), 11 deletions(-)
> 
>  bool
>  get_nth_most_common_value (gimple *stmt, const char *counter_type,
> @@ -765,15 +762,16 @@ get_nth_most_common_value (gimple *stmt, const char 
> *counter_type,
>*count = 0;
>*value = 0;
>  
> -  gcov_type read_all = abs_hwi (hist->hvalue.counters[0]);
> +  gcov_type read_all = hist->hvalue.counters[0];
> +  gcov_type covered = 0;
> +  for (unsigned i = 0; i < counters; ++i)
> +covered += hist->hvalue.counters[2 * i + 3];
>  
>gcov_type v = hist->hvalue.counters[2 * n + 2];
>gcov_type c = hist->hvalue.counters[2 * n + 3];
>  
> -  if (hist->hvalue.counters[0] < 0
> -  && (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS
> -   || (flag_profile_reproducible
> -   == PROFILE_REPRODUCIBILITY_MULTITHREADED)))
> +  if (read_all != covered
> +  && flag_profile_reproducible != PROFILE_REPRODUCIBILITY_SERIAL)

This should be right for REPRODUCIBILITY_MULTITHREADED but is too strict
for PARALLEL_RUNS (and I think we now have data that this difference
matters).  If you
 1) re-add logic that avoids streaming targets with no more than 1/32 of
 overall execution counts from each run
 (we may want to have a way to tweak the threshold, but I guess we may
 want to first see if it is necessary since it is easy to add and we
 already have a bit too many environment variables)
 2) re-add logic tracking if any value was lost during merging
 using the sign of the first counter
 3) make PARALLEL_RUNS disregard the profile if the first counter is
 negative
we should be able to track most of the cases where the number of values
exceeds 32 but there is one or two really dominating.

Also I think it makes sense to default to =serial and use the new flag
in the few packages where we do profile feedback and care about
reproducibility.
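
To make the three points above concrete, here is a minimal sketch of the
decision they describe.  It is not the value-prof.c code: the enum, the
function names and the use of std::vector are made up for illustration,
and the counter layout is only assumed from the comment in the patch
(counters[0] is the execution count, pairs start at counters[2]).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the three -fprofile-reproducible levels.
enum class Repro { Serial, ParallelRuns, Multithreaded };

// Point 1: keep a target only if it has more than 1/32 of the executions.
bool worth_streaming (int64_t count, int64_t total)
{
  return count * 32 > total;
}

// Assumed layout: counters[0] is the number of executions, made negative
// when values were lost during merging (point 2); counters[1] is not
// interpreted by this sketch; pair i lives at counters[2 * i + 2] (target)
// and counters[2 * i + 3] (hit count).
bool topn_profile_usable (const std::vector<int64_t> &counters,
                          unsigned n_pairs, Repro mode)
{
  if (mode == Repro::Serial)
    return true;

  bool lost_values = counters[0] < 0;
  if (mode == Repro::ParallelRuns)
    return !lost_values;	/* Point 3.  */

  /* Multithreaded: also require the tracked targets to cover every
     execution, as the read_all != covered test in the patch does.  */
  int64_t covered = 0;
  for (unsigned i = 0; i < n_pairs; ++i)
    covered += counters[2 * i + 3];
  return !lost_values && covered == counters[0];
}
```

With this split, a histogram that merely misses a few rare targets stays
usable for parallel-runs, and only a counter that overflowed during
merging is thrown away.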

Thanks a lot for looking into this!
Honza
>  return false;
>  
>/* Indirect calls can't be verified.  */
> -- 
> 2.30.0
>

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Martin Liška


On 1/21/21 8:02 PM, Jan Hubicka wrote:

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index b401f0817a3..042c03d819e 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1961,8 +1961,9 @@ expand_all_functions (void)
}
  
/* First output functions with time profile in specified order.  */

-  qsort (tp_first_run_order, tp_first_run_order_pos,
-sizeof (cgraph_node *), tp_first_run_node_cmp);
+  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
+qsort (tp_first_run_order, tp_first_run_order_pos,
+  sizeof (cgraph_node *), tp_first_run_node_cmp);




You are right, it should be fine for a safe parallel-run profiling training.


This you need to check earlier in

   for (i = 0; i < order_pos; i++)
 if (order[i]->process)
   {
 if (order[i]->tp_first_run
 && opt_for_fn (order[i]->decl, flag_profile_reorder_functions))
^ here
and check only for REPRODUCIBILITY_MULTITHREADED.  We probably also want
to document this.

However, an easier fix is to simply clear tp_first_run at profile read time


Yes, it will be a better place!

Martin


if we do multithreaded reproducibility instead of attaching it and
ignoring later.  This will make both places you modified do the right
thing.

Honza



for (i = 0; i < tp_first_run_order_pos; i++)
  {
node = tp_first_run_order[i];
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 15761ac9eb5..f9e632776e6 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -509,7 +509,8 @@ lto_balanced_map (int n_lto_partitions, int 
max_partition_size)
   unit tends to import a lot of global trees defined there.  We should
   get better about minimizing the function bounday, but until that
   things works smoother if we order in source order.  */
-  order.qsort (tp_first_run_node_cmp);
+  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
+order.qsort (tp_first_run_node_cmp);
noreorder.qsort (node_cmp);
  
if (dump_file)

--
2.30.0

Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]

2021-01-21 Thread Jan Hubicka

> On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> > >  wrote:
> > > > gimple.h has this comment for gimple's uid field:
> > > > 
> > > >   /* UID of this statement.  This is used by passes that want to
> > > >  assign IDs to statements.  It must be assigned and used by
> > > > each
> > > >  pass.  By default it should be assumed to contain
> > > > garbage.  */
> > > >   unsigned uid;
> > > > 
> > > > and gimple_set_uid has:
> > > > 
> > > >Please note that this UID property is supposed to be undefined
> > > > at
> > > >pass boundaries.  This means that a given pass should not
> > > > assume it
> > > >contains any useful value when the pass starts and thus can
> > > > set it
> > > >to any value it sees fit.
> > > > 
> > > > which suggests that any pass can use the uid field as an
> > > > arbitrary
> > > > scratch space.
> > > > 
> > > > PR analyzer/98599 reports a case where this error occurs in LTO
> > > > mode:
> > > >   fatal error: Cgraph edge statement index out of range
> > > > on certain inputs with -fanalyzer.
> > > > 
> > > > The error occurs in the LTRANS phase after -fanalyzer runs in the
> > > > WPA phase.  The analyzer pass writes to the uid fields of all
> > > > stmts.
> > > > 
> > > > The error occurs when LTRANS is streaming callgraph edges back
> > > > in.
> > > > If I'm reading things correctly, the LTO format uses stmt uids to
> > > > associate call stmts with callgraph edges between WPA and LTRANS.
> > > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > > gimple_uid, and input_edge reads it back in.
> > > > 
> > > > Hence IPA passes that touch the uids in WPA need to restore them,
> > > > or the stream-in at LTRANS will fail.
> > > > 
> > > > Is it intended that the LTO machinery relies on the value of the
> > > > uid
> > > > field being preserved during WPA (or, at least, needs to be saved
> > > > and
> > > > restored by passes that touch it)?
> > > 
> > > I believe this is solely at the cgraph stream out to stream in
> > > boundary but
> > > this may be a blurred area since while we materialize the whole
> > > cgraph
> > > at once the function bodies are streamed in on demand.
> > > 
> > > Honza can probably clarify things.
> > 
> > Well, the uids are used to associate cgraph edges with
> > statements.  At
> > WPA stage you do not have function bodies and thus uids serves role
> > of
> > pointers to the statement.  If you load the body in (via get_body)
> > the
> > uids are replaced by pointers and when you stream out uids are
> > recomputed again.
> > 
> > When do you touch the uids? At WPA time or from small IPA pass in
> > ltrans?
> 
> The analyzer is here in passes.def:
>   INSERT_PASSES_AFTER (all_regular_ipa_passes)
>   NEXT_PASS (pass_analyzer);
> 
> and so in LTO runs as the first regular IPA pass at WPA time,
> when do_whole_program_analysis calls:
>   execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);
> 
> FWIW I hope to eventually have a way to summarize function bodies in
> the analyzer, but I don't yet, so I'm currently brute forcing things by
> loading all function bodies at the start of the analyzer (when
> -fanalyzer is enabled).
> 
> I wonder if that's messing things up somehow?

Actually I think it should work.  If you do get_body or
get_untransformed_body (that will be equal at that time) you ought to
get ids in symtab datastructure rewritten to pointers and at stream out
time we should assign new ids...
> 
> Does the stream-out from WPA make any assumptions about the stmt uids?
> For example, 
>   #define STMT_UID_NOT_IN_RANGE(uid) \
> (gimple_stmt_max_uid (fn) < uid || uid == 0)
> seems to assume that the UIDs are per-function ranges from
>   [0-gimple_stmt_max_uid (fn)]
> which isn't the case for the uids set by the analyzer.  Maybe that's
> the issue here?
> 
> Sorry for not being more familiar with the IPA/LTO code

There is lto_prepare_function_for_streaming which assigns uids to be
incremental.   So I guess the problem is that it is not called at WPA time
if the function is in memory (since at the moment we do not really modify
bodies at WPA time, however we do stream them in sometimes to icf compare
them or to update profile).

So I guess the fix would be to arrange for
lto_prepare_function_for_streaming to be called on all functions with
body defined before WPA stream-out?
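
A toy model of what I mean, not the LTO streamer itself: Stmt, Function
and the three helpers below are invented names, and the real code works
on gimple statements of a function body.  It only shows why renumbering
right before stream-out makes it harmless for a pass to have used the
uid as scratch space.

```cpp
#include <vector>

struct Stmt { unsigned uid = 0; };

struct Function
{
  std::vector<Stmt> stmts;
  unsigned max_uid = 0;
};

// A pass that treats uid as scratch space, as the comment in gimple.h
// permits.
void scribble_uids (Function &fn)
{
  unsigned scratch = 1000;
  for (Stmt &s : fn.stmts)
    s.uid = (scratch += 7);
}

// What the streamer relies on: uids are dense and below max_uid.
bool uids_in_range (const Function &fn)
{
  for (const Stmt &s : fn.stmts)
    if (s.uid >= fn.max_uid)
      return false;
  return true;
}

// Renumber incrementally just before stream-out.
void renumber_for_streaming (Function &fn)
{
  fn.max_uid = 0;
  for (Stmt &s : fn.stmts)
    s.uid = fn.max_uid++;
}
```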

Honza
> Dave
> 
> 
> > > Honza
> > > Note LTO uses this exactly because of this comment to avoid
> > > allocating
> > > extra memory for an 'index' but it could of course leave gimple_uid
> > > alone
> > > at some extra expense (eventually paid for in generic cgraph data
> > > structures
> > > and thus for not only the streaming time).
> > > 
> > > > On the assumption that this is the case, this patch updates the
> > > > comments
> > > > in gimple.h referring to passes being able to set uid to any
> > > > value to
> > > > note the caveat for IPA passes, and it updates the an

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Martin Liška


On 1/21/21 8:03 PM, Jan Hubicka wrote:

What exactly is suggested?


This one.

Martin
From 22bbf5342f2b73fad6c0286541bba6699c617380 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 21 Jan 2021 09:02:31 +0100
Subject: [PATCH 1/2] Restore -fprofile-reproducibility flag.

gcc/ChangeLog:

	PR gcov-profile/98739
	* common.opt: Add missing equal symbol.
	* value-prof.c (get_nth_most_common_value): Fix comment
	and skip TOP N counter properly when -fprofile-reproducibility
	is not serial.
---
 gcc/common.opt   |  2 +-
 gcc/value-prof.c | 18 --
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index bde1711870d..a8a2b67a99d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2248,7 +2248,7 @@ Enum(profile_reproducibility) String(parallel-runs) Value(PROFILE_REPRODUCIBILIT
 EnumValue
 Enum(profile_reproducibility) String(multithreaded) Value(PROFILE_REPRODUCIBILITY_MULTITHREADED)
 
-fprofile-reproducible
+fprofile-reproducible=
 Common Joined RejectNegative Var(flag_profile_reproducible) Enum(profile_reproducibility) Init(PROFILE_REPRODUCIBILITY_SERIAL)
 -fprofile-reproducible=[serial|parallel-runs|multithreaded]	Control level of reproducibility of profile gathered by -fprofile-generate.
 
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 4c916f4994f..fafe9d8d0f1 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -747,11 +747,8 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, profile_probability prob,
 
abs (counters[0]) is the number of executions
for i in 0 ... TOPN-1
- counters[2 * i + 1] is target
- abs (counters[2 * i + 2]) is corresponding hitrate counter.
-
-   Value of counters[0] negative when counter became
-   full during merging and some values are lost.  */
+ counters[2 * i + 2] is target
+ counters[2 * i + 3] is corresponding hitrate counter.  */
 
 bool
 get_nth_most_common_value (gimple *stmt, const char *counter_type,
@@ -765,15 +762,16 @@ get_nth_most_common_value (gimple *stmt, const char *counter_type,
   *count = 0;
   *value = 0;
 
-  gcov_type read_all = abs_hwi (hist->hvalue.counters[0]);
+  gcov_type read_all = hist->hvalue.counters[0];
+  gcov_type covered = 0;
+  for (unsigned i = 0; i < counters; ++i)
+covered += hist->hvalue.counters[2 * i + 3];
 
   gcov_type v = hist->hvalue.counters[2 * n + 2];
   gcov_type c = hist->hvalue.counters[2 * n + 3];
 
-  if (hist->hvalue.counters[0] < 0
-  && (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS
-	  || (flag_profile_reproducible
-	  == PROFILE_REPRODUCIBILITY_MULTITHREADED)))
+  if (read_all != covered
+  && flag_profile_reproducible != PROFILE_REPRODUCIBILITY_SERIAL)
 return false;
 
   /* Indirect calls can't be verified.  */
-- 
2.30.0

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka

> > This will become more common problem for multithreaded profiles where
> > one needs to annotate locking and busy waiting loops in them for example
> > (or the scheduler responsible for executing parallel tasks).
> > 
> > I can see this to be practically achievable but we probably want to
> > produce some guidelines for doing that and probably teach gcov-tool to
> > compare profiles and say to which degree they match (i.e. which
> > functions match for each of levels of reproducibility).
> > 
> > The problem is that profiles are continuous and the errors too, but
> > optimizations look for certain thresholds, so small errors may lead to
> > code changes, so I think our current method of looking at relatively few
> > packages and patching errors when they appear is not a very good long-term
> > strategy... Especially if it makes us drop useful transformations by
> > default with -fprofile-use and no additional option.
> 
> To be honest we have very few packages that use PGO in openSUSE:Factory.

We should aim to have more :)
> 
> Anyway, are you fine with the suggested?

What exactly is suggested?
Honza
> 
> Thanks for discussion,
> Martin

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka

> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index b401f0817a3..042c03d819e 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -1961,8 +1961,9 @@ expand_all_functions (void)
>}
>  
>/* First output functions with time profile in specified order.  */
> -  qsort (tp_first_run_order, tp_first_run_order_pos,
> -  sizeof (cgraph_node *), tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +qsort (tp_first_run_order, tp_first_run_order_pos,
> +sizeof (cgraph_node *), tp_first_run_node_cmp);

This you need to check earlier in

  for (i = 0; i < order_pos; i++)   
if (order[i]->process)  
  { 
if (order[i]->tp_first_run  
&& opt_for_fn (order[i]->decl, flag_profile_reorder_functions)) 
^ here
and check only for REPRODUCIBILITY_MULTITHREADED.  We probably also want
to document this.

However, an easier fix is to simply clear tp_first_run at profile read time
if we do multithreaded reproducibility instead of attaching it and
ignoring later.  This will make both places you modified do the right
thing.
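
Roughly like this, as a sketch only: Node, Repro and both helper names
are placeholders, and the real change would go wherever the time
profile counter is attached to the cgraph node.

```cpp
// Hypothetical stand-in for the three -fprofile-reproducible levels.
enum class Repro { Serial, ParallelRuns, Multithreaded };

// Placeholder for the part of a cgraph node we care about; zero means
// "no time profile", as with tp_first_run.
struct Node { int tp_first_run = 0; };

// At profile read time, drop the first-run time when multithreaded
// reproducibility is requested instead of storing it and ignoring it
// later.
void read_time_profile (Node &node, int recorded_first_run, Repro mode)
{
  node.tp_first_run
    = (mode == Repro::Multithreaded) ? 0 : recorded_first_run;
}

// Consumers need no extra flag check; they keep testing the field.
bool reorder_by_first_run (const Node &node)
{
  return node.tp_first_run != 0;
}
```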

Honza


>for (i = 0; i < tp_first_run_order_pos; i++)
>  {
>node = tp_first_run_order[i];
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 15761ac9eb5..f9e632776e6 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -509,7 +509,8 @@ lto_balanced_map (int n_lto_partitions, int 
> max_partition_size)
>   unit tends to import a lot of global trees defined there.  We should
>   get better about minimizing the function bounday, but until that
>   things works smoother if we order in source order.  */
> -  order.qsort (tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +order.qsort (tp_first_run_node_cmp);
>noreorder.qsort (node_cmp);
>  
>if (dump_file)
> -- 
> 2.30.0
>

Re: [PATCH] improve warning suppression for inlined functions (PR 98465, 98512)

2021-01-21 Thread Florian Weimer via Gcc-patches

* Martin Sebor:

> On 1/21/21 10:34 AM, Florian Weimer wrote:
>> * Martin Sebor via Gcc-patches:
>> 
>>> This patch depends on the fix for PR 98664 (already approved but
>>> not yet checked in).  I've tested it on x86_64-linux.
>>>
>>> To avoid fallout I tried to keep the changes to a minimum, and
>>> so the design isn't as robust as I'd like it ultimately to be.
>>> I plan to enhance it in stage 1.
>> I've tested this patch on top of 43705f3fa343e08b2fb030460f (so with
>> the
>> PR98664 fix, I think), and the reproducer from PR98512 now ICEs:
>
> Thanks for giving it a try!  I saw a similar ICE during my testing
> -- it's caused by a couple of uninitialized variables.  I fixed
> it in my tree (see below) but the fix didn't make it into the patch.
>
> Please give this a try and let me know if it doesn't help:
>
> index abcd991b829..d82a7eb67e5 100644
> --- a/gcc/diagnostic.c
> +++ b/gcc/diagnostic.c
> @@ -1426,7 +1426,7 @@ diagnostic_impl (rich_location *richloc, const
> diagnostic_metadata *metadata,
>  int opt, const char *gmsgid,
>  va_list *ap, diagnostic_t kind)
>  {
> -  diagnostic_info diagnostic;
> +  diagnostic_info diagnostic{ };
>if (kind == DK_PERMERROR)
>  {
>diagnostic_set_info (&diagnostic, gmsgid, ap, richloc,
> @@ -1452,7 +1452,7 @@ diagnostic_n_impl (rich_location *richloc, const
> diagnostic_metadata *metadata,
>const char *plural_gmsgid,
>va_list *ap, diagnostic_t kind)
>  {
> -  diagnostic_info diagnostic;
> +  diagnostic_info diagnostic{ };
>unsigned long gtn;
>
>if (sizeof n <= sizeof gtn)

This fixes the crash for me, and the warning is gone as well.
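
For anyone reading along, the effect of the `{ }` in the quoted hunks
can be shown with a small stand-alone example.  Info below is an
invented aggregate, not diagnostic_info, and the sketch assumes a type
without a user-provided constructor: empty braces value-initialize it,
so every member starts out zero instead of indeterminate.

```cpp
// Invented stand-in for a plain aggregate with several members.
struct Info
{
  const char *message;
  int option_index;
  bool suppressed;
};

// With empty braces all members are zero-initialized; a plain
// "Info info;" here would leave them indeterminate.
Info make_value_initialized ()
{
  Info info{ };
  return info;
}
```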

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

[PATCH]Arm: Add NEON and MVE complex mul, mla and mls patterns.

2021-01-21 Thread Tamar Christina via Gcc-patches

Hi All,

This adds implementation for the optabs for complex operations.  With this the
following C code:

  void g (float complex a[restrict N], float complex b[restrict N],
          float complex c[restrict N])
  {
    for (int i=0; i < N; i++)
      c[i] =  a[i] * b[i];
  }

generates


NEON:

g:
        vmov.f32        q11, #0.0  @ v4sf
        add     r3, r2, #1600
.L2:
        vmov    q8, q11  @ v4sf
        vld1.32 {q10}, [r1]!
        vld1.32 {q9}, [r0]!
        vcmla.f32       q8, q9, q10, #0
        vcmla.f32       q8, q9, q10, #90
        vst1.32 {q8}, [r2]!
        cmp     r3, r2
        bne     .L2
        bx      lr

MVE:

g:
        push    {lr}
        mov     lr, #100
        dls     lr, lr
.L2:
        vldrw.32        q1, [r1], #16
        vldrw.32        q2, [r0], #16
        vcmul.f32       q3, q2, q1, #0
        vcmla.f32       q3, q2, q1, #90
        vstrw.32        q3, [r2], #16
        le      lr, .L2
        ldr     pc, [sp], #4

instead of

g:
        add     r3, r2, #1600
.L2:
        vld2.32 {d20-d23}, [r0]!
        vld2.32 {d16-d19}, [r1]!
        vmul.f32        q14, q11, q9
        vmul.f32        q15, q11, q8
        vneg.f32        q14, q14
        vfma.f32        q15, q10, q9
        vfma.f32        q14, q10, q8
        vmov    q13, q15  @ v4sf
        vmov    q12, q14  @ v4sf
        vst2.32 {d24-d27}, [r2]!
        cmp     r3, r2
        bne     .L2
        bx      lr

and

g:
        add     r3, r2, #1600
.L2:
        vld2.32 {d20-d23}, [r0]!
        vld2.32 {d16-d19}, [r1]!
        vmul.f32        q15, q10, q8
        vmul.f32        q14, q10, q9
        vmls.f32        q15, q11, q9
        vmla.f32        q14, q11, q8
        vmov    q12, q15  @ v4sf
        vmov    q13, q14  @ v4sf
        vst2.32 {d24-d27}, [r2]!
        cmp     r3, r2
        bne     .L2
        bx      lr

respectively.
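
As a scalar illustration of why the new sequences need two instructions
per multiply, the sketch below models one complex lane.  The helper
names are made up, and the per-rotation arithmetic is my reading of the
#0 and #90 forms of the complex multiply-accumulate, so treat it as an
assumption rather than a statement of the architecture manual.

```cpp
#include <complex>

using cf = std::complex<float>;

// Rotation #0: accumulate the products that use the real part of a.
cf cmla_rot0 (cf acc, cf a, cf b)
{
  return cf (acc.real () + a.real () * b.real (),
             acc.imag () + a.real () * b.imag ());
}

// Rotation #90: accumulate the products that use the imaginary part of a.
cf cmla_rot90 (cf acc, cf a, cf b)
{
  return cf (acc.real () - a.imag () * b.imag (),
             acc.imag () + a.imag () * b.real ());
}

// A full complex multiply is the #0 step on a zero accumulator followed
// by the #90 step, matching the zero-then-two-vcmla NEON loop above.
cf complex_mul_split (cf a, cf b)
{
  return cmla_rot90 (cmla_rot0 (cf (0.0f, 0.0f), a, b), a, b);
}
```

For a = 1 + 2i and b = 3 + 4i the first step leaves 3 + 4i in the
accumulator and the second turns it into -5 + 10i, which is a * b.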

Bootstrapped and regtested on arm-none-linux-gnueabihf with no issues.

Execution tests verified with QEMU.

Generic tests for these are in the mid-end and I will enable them with a
different patch.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/iterators.md (rotsplit1, rotsplit2, conj_op, fcmac1,
VCMLA_OP, VCMUL_OP): New.
* config/arm/mve.md (mve_vcmlaq): Support vec_dup 0.
* config/arm/neon.md (cmul3): New.
* config/arm/unspecs.md (UNSPEC_VCMLA_CONJ, UNSPEC_VCMLA180_CONJ,
UNSPEC_VCMUL_CONJ): New.
* config/arm/vec-common.md (cmul3, arm_vcmla,
cml4): New.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 
2e0aacbd3f742538073e441b53fcffc45e37c790..b9027905307fe19d60d164cef23dac6ab119cd9b
 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -1186,6 +1186,33 @@ (define_int_attr rot [(UNSPEC_VCADD90 "90")
  (UNSPEC_VCMLA180 "180")
  (UNSPEC_VCMLA270 "270")])
 
+;; The complex operations when performed on a real complex number require two
+;; instructions to perform the operation. e.g. complex multiplication requires
+;; two VCMUL with a particular rotation value.
+;;
+;; These values can be looked up in rotsplit1 and rotsplit2.  as an example
+;; VCMUL needs the first instruction to use #0 and the second #90.
+(define_int_attr rotsplit1 [(UNSPEC_VCMLA "0")
+   (UNSPEC_VCMLA_CONJ "0")
+   (UNSPEC_VCMUL "0")
+   (UNSPEC_VCMUL_CONJ "0")
+   (UNSPEC_VCMLA180 "180")
+   (UNSPEC_VCMLA180_CONJ "180")])
+
+(define_int_attr rotsplit2 [(UNSPEC_VCMLA "90")
+   (UNSPEC_VCMLA_CONJ "270")
+   (UNSPEC_VCMUL "90")
+   (UNSPEC_VCMUL_CONJ "270")
+   (UNSPEC_VCMLA180 "270")
+   (UNSPEC_VCMLA180_CONJ "90")])
+
+(define_int_attr conj_op [(UNSPEC_VCMLA180 "")
+ (UNSPEC_VCMLA180_CONJ "_conj")
+ (UNSPEC_VCMLA "")
+ (UNSPEC_VCMLA_CONJ "_conj")
+ (UNSPEC_VCMUL "")
+ (UNSPEC_VCMUL_CONJ "_conj")])
+
 (define_int_attr mve_rot [(UNSPEC_VCADD90 "_rot90")
  (UNSPEC_VCADD270 "_rot270")
  (UNSPEC_VCMLA "")
@@ -1200,6 +1227,9 @@ (define_int_attr mve_rot [(UNSPEC_VCADD90 "_rot90")
 (define_int_iterator VCMUL [UNSPEC_VCMUL UNSPEC_VCMUL90
UNSPEC_VCMUL180 UNSPEC_VCMUL270])
 
+(define_int_attr fcmac1 [(UNSPEC_VCMLA "a") (UNSPEC_VCMLA_CONJ "a")
+(UNSPEC_VCMLA180 "s") (UNSPEC_VCMLA180_CONJ "s")])
+
 (define_int_attr simd32_op [(UNSPEC_QADD8 "qadd8") (UNSPEC_QSUB8 "qsub8")
(UNSPEC_SHADD8 "shadd8") (UNSPEC_SHSUB8 "shsub8")
(UNSPEC_UHADD8 "uhadd8") (UNSPEC_UHSUB8 "uhsub8")
@@ -1723,3 +1753,13 @@ (define_int_iterator VADCQ_M [VADC

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka

> On 1/21/21 7:45 PM, Jan Hubicka wrote:
> > For this reason we merge by computing average, which is stable over
> > reordering the indices
> 
> Looking at the implementation, we merge by using minimum value:
> 
> /* Time profiles are merged so that minimum from all valid (greater than zero)
>is stored. There could be a fork that creates new counters. To have
>the profile stable, we chosen to pick the smallest function visit time.  */

Yep, sorry for the confusion.  I just noticed that as well.
Minimum should still be safe for parallel-run profiling (not for
multithreaded, where we probably really want to disable it, but we can do
that on a per-function basis using opt_for_fn so it works with LTO).
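
To spell out why minimum is safe for parallel runs, here is a small
sketch that mirrors the quoted merge condition on plain vectors.  The
function name and the vector-based interface are illustrative only;
libgcov reads the incoming values from the profile file instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Counters = std::vector<int64_t>;

// Merge one run into the accumulated profile.  Zero means "never
// executed"; otherwise the smallest first-visit time wins.
void merge_time_profile_min (Counters &merged, const Counters &incoming)
{
  for (std::size_t i = 0; i < merged.size (); i++)
    {
      int64_t value = incoming[i];
      if (value && (!merged[i] || value < merged[i]))
        merged[i] = value;
    }
}
```

Taking runA as {foo = 1, bar = 2} and runB as {foo = 2, bar = 1},
merging A then B and merging B then A both end with {1, 1}, so the
order in which the parallel runs write their data does not matter.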

Honza
> void
> __gcov_merge_time_profile (gcov_type *counters, unsigned n_counters)
> {
>   unsigned int i;
>   gcov_type value;
> 
>   for (i = 0; i < n_counters; i++)
> {
>   value = gcov_get_counter_target ();
> 
>   if (value && (!counters[i] || value < counters[i]))
> counters[i] = value;
> }
> }
> #endif /* L_gcov_merge_time_profile */
> 
> Martin

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Martin Liška


On 1/21/21 7:45 PM, Jan Hubicka wrote:

For this reason we merge by computing average, which is stable over
reordering the indices


Looking at the implementation, we merge by using minimum value:

/* Time profiles are merged so that minimum from all valid (greater than zero)
   is stored. There could be a fork that creates new counters. To have
   the profile stable, we chosen to pick the smallest function visit time.  */
void
__gcov_merge_time_profile (gcov_type *counters, unsigned n_counters)
{
  unsigned int i;
  gcov_type value;

  for (i = 0; i < n_counters; i++)
    {
      value = gcov_get_counter_target ();

      if (value && (!counters[i] || value < counters[i]))
        counters[i] = value;
    }
}
#endif /* L_gcov_merge_time_profile */

Martin

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka

> On 1/21/21 3:01 PM, Jan Hubicka wrote:
> > > 
> > > Plus I'm planning to send one more patch that will ignore time profile 
> > > when -fprofile-reproducible != serial.
> > 
> > Why do you need to disable time profiling?
> 
> Because you can have 2 training runs (running in parallel) when order is:
> runA: foo -> bar
> runB: bar -> foo
> 
> Then based on order of profile merging you get a final output.

For this reason we merge by computing average, which is stable over
reordering the indices

Honza
> 
> I would like to address it with the attached patch.
> 
> Martin
> 
> > 
> > Honza
> > 
> 

> From fb4bc6f4b4b106d38fbf710f87e128d26fc1b988 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Thu, 21 Jan 2021 09:22:45 +0100
> Subject: [PATCH 2/2] Consider time profilers only when
>  -fprofile-reproducible=serial.
> 
> gcc/ChangeLog:
> 
>   PR gcov-profile/98739
>   * cgraphunit.c (expand_all_functions): Consider tp_first_run
>   only when -fprofile-reproducible=serial.
> 
> gcc/lto/ChangeLog:
> 
>   PR gcov-profile/98739
>   * lto-partition.c (lto_balanced_map): Consider tp_first_run
>   only when -fprofile-reproducible=serial.
> ---
>  gcc/cgraphunit.c| 5 +++--
>  gcc/lto/lto-partition.c | 3 ++-
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index b401f0817a3..042c03d819e 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -1961,8 +1961,9 @@ expand_all_functions (void)
>}
>  
>/* First output functions with time profile in specified order.  */
> -  qsort (tp_first_run_order, tp_first_run_order_pos,
> -  sizeof (cgraph_node *), tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +qsort (tp_first_run_order, tp_first_run_order_pos,
> +sizeof (cgraph_node *), tp_first_run_node_cmp);
>for (i = 0; i < tp_first_run_order_pos; i++)
>  {
>node = tp_first_run_order[i];
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 15761ac9eb5..f9e632776e6 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -509,7 +509,8 @@ lto_balanced_map (int n_lto_partitions, int 
> max_partition_size)
>   unit tends to import a lot of global trees defined there.  We should
>   get better about minimizing the function bounday, but until that
>   things works smoother if we order in source order.  */
> -  order.qsort (tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +order.qsort (tp_first_run_node_cmp);
>noreorder.qsort (node_cmp);
>  
>if (dump_file)
> -- 
> 2.30.0
>

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Martin Liška


On 1/21/21 2:46 PM, Jan Hubicka wrote:

I think an easy way to get users of this option is to make the profile not
reproducible by default and modify packages to use the right reproducibility
option when reproducible builds are intended.  It is not a feature that
comes for free and I think most users of PGO do not care, so I think
it should be opt-in.


I agree that most users really don't care.



In general, to get the profile reproducible one needs to make the train run
reproducible, which is hard when you look at the details (such as the /tmp/
file name generation issue in gcc) and may lead to the need for the user to
annotate such code.


Yes, right now I'm testing both patches and I still see difference in GCC PGO
bootstrap (with -fprofile-reproducible=parallel-runs):

$ objfolderdiff.py /dev/shm/objdir2/gcc /dev/shm/objdir3/gcc
   138/   649: cgraphunit.o: different
   230/   649: dwarf2out.o: different
   356/   649: ipa-cp.o: different
   357/   649: ipa-devirt.o: different
   360/   649: ipa-icf.o: different
   533/   649: tree-affine.o: different
   574/   649: tree-ssa-loop-im.o: different
   632/   649: var-tracking.o: different

Most of the changes are in known contexts:

;; Function hash_table::hash_entry, false, 
xcallocator>::find_with_hash 
(_ZN10hash_tableIN8hash_mapIP10im_mem_refP6sm_aux21simple_hashmap_traitsI19default_hash_traitsIS2_ES4_EE10hash_entryELb0E11xcallocatorE14find_with_hashERKS2_j,
 funcdef_no=3431, decl_uid=106116, cgraph_uid=2554, symbol_order=2712)

;; Function hash_table, false, 
xcallocator>::find_empty_slot_for_expand 
(_ZN10hash_tableI19default_hash_traitsIPvELb0E11xcallocatorE26find_empty_slot_for_expandEj, 
funcdef_no=4875, decl_uid=134656, cgraph_uid=3901, symbol_order=4074)

;; Function hash_table_mod2 (_Z15hash_table_mod2jj, funcdef_no=1047, 
decl_uid=32691, cgraph_uid=345, symbol_order=356)

;; Function hash_table::hash_entry, false, 
xcallocator>::find_empty_slot_for_expand 
(_ZN10hash_tableIN8hash_mapIP9tree_nodeP14name_expansion21simple_hashmap_traitsI19default_hash_traitsIS2_ES4_EE10hash_entryELb0E11xcallocatorE26find_empty_slot_for_expandEj,
 funcdef_no=3639, decl_uid=102478, cgraph_uid=2760, symbol_order=2916)

So likely hashing related functions where we hash some pointers :/ 
Unfortunately, that's enough for final binary
divergence.



This will become a more common problem for multithreaded profiles where
one needs to annotate locking and busy waiting loops in them for example
(or the scheduler responsible for executing parallel tasks).

I can see this to be practically achievable but we probably want to
produce some guidelines for doing that and probably teach gcov-tool to
compare profiles and say to which degree they match (i.e. which
functions match for each of levels of reproducibility).

The problem is that profiles are continuous and the errors too, but
optimizations look for certain thresholds, so small errors may lead to
code changes, so I think our current method of looking at relatively few
packages and patching errors when they appear is not a very good long-term
strategy... Especially if it makes us drop useful transformations by
default with -fprofile-use and no additional option.
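
A tiny worked example of the threshold effect described above.  The
helper and the 3/4 ratio are made up for illustration and do not
correspond to any particular GCC parameter.

```cpp
#include <cstdint>

// Hypothetical gate: apply a transformation when the hot case accounts
// for at least num/den of all executions.
bool crosses_threshold (int64_t hits, int64_t total, int64_t num,
                        int64_t den)
{
  return hits * den >= total * num;
}
```

With a 3/4 gate and 1000 executions, 750 hits pass while 749 do not, so
a single differing count between two training runs is enough to change
the generated code.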


To be honest we have very few packages that use PGO in openSUSE:Factory.

Anyway, are you fine with the suggested?

Thanks for discussion,
Martin

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Martin Liška


On 1/21/21 3:01 PM, Jan Hubicka wrote:


Plus I'm planning to send one more patch that will ignore time profile when 
-fprofile-reproducible != serial.


Why do you need to disable time profiling?


Because you can have 2 training runs (running in parallel) when order is:
runA: foo -> bar
runB: bar -> foo

Then based on order of profile merging you get a final output.

I would like to address it with the attached patch.

Martin



Honza



From fb4bc6f4b4b106d38fbf710f87e128d26fc1b988 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 21 Jan 2021 09:22:45 +0100
Subject: [PATCH 2/2] Consider time profilers only when
 -fprofile-reproducible=serial.

gcc/ChangeLog:

	PR gcov-profile/98739
	* cgraphunit.c (expand_all_functions): Consider tp_first_run
	only when -fprofile-reproducible=serial.

gcc/lto/ChangeLog:

	PR gcov-profile/98739
	* lto-partition.c (lto_balanced_map): Consider tp_first_run
	only when -fprofile-reproducible=serial.
---
 gcc/cgraphunit.c| 5 +++--
 gcc/lto/lto-partition.c | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index b401f0817a3..042c03d819e 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1961,8 +1961,9 @@ expand_all_functions (void)
   }
 
   /* First output functions with time profile in specified order.  */
-  qsort (tp_first_run_order, tp_first_run_order_pos,
-	 sizeof (cgraph_node *), tp_first_run_node_cmp);
+  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
+qsort (tp_first_run_order, tp_first_run_order_pos,
+	   sizeof (cgraph_node *), tp_first_run_node_cmp);
   for (i = 0; i < tp_first_run_order_pos; i++)
 {
   node = tp_first_run_order[i];
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 15761ac9eb5..f9e632776e6 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -509,7 +509,8 @@ lto_balanced_map (int n_lto_partitions, int max_partition_size)
  unit tends to import a lot of global trees defined there.  We should
  get better about minimizing the function bounday, but until that
  things works smoother if we order in source order.  */
-  order.qsort (tp_first_run_node_cmp);
+  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
+order.qsort (tp_first_run_node_cmp);
   noreorder.qsort (node_cmp);
 
   if (dump_file)
-- 
2.30.0
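
For readers who want the idea of the patch in isolation, here is a minimal
C++ sketch.  It is a toy model, not GCC code: the Node struct, the
Reproducibility enum and emission_order are made-up names, and the real
tp_first_run_node_cmp has extra rules that are left out here.  The only point
is that the time-profile sort is applied in serial mode and skipped otherwise,
so that merge-order-dependent data cannot change the output order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins; these are not GCC's real types.
enum class Reproducibility { Serial, ParallelRuns, Multithreaded };

struct Node {
  std::string name;
  int tp_first_run;  // position of the first call in the merged time profile
};

// Return the order in which the functions would be emitted.
// Only in Serial mode is the merged time profile trusted for ordering.
std::vector<std::string> emission_order(std::vector<Node> nodes,
                                        Reproducibility mode) {
  if (mode == Reproducibility::Serial) {
    std::stable_sort(nodes.begin(), nodes.end(),
                     [](const Node &a, const Node &b) {
                       return a.tp_first_run < b.tp_first_run;
                     });
  }
  std::vector<std::string> names;
  for (const Node &n : nodes)
    names.push_back(n.name);
  return names;
}
```

With nodes foo (tp_first_run 2) and bar (tp_first_run 1), serial mode emits
bar before foo, while the other modes keep the incoming order.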

Re: [PATCH] improve warning suppression for inlined functions (PR 98465, 98512)

2021-01-21 Thread Martin Sebor via Gcc-patches


On 1/21/21 10:34 AM, Florian Weimer wrote:

* Martin Sebor via Gcc-patches:


This patch depends on the fix for PR 98664 (already approved but
not yet checked in).  I've tested it on x86_64-linux.

To avoid fallout I tried to keep the changes to a minimum, and
so the design isn't as robust as I'd like it ultimately to be.
I plan to enhance it in stage 1.


I've tested this patch on top of 43705f3fa343e08b2fb030460f (so with the
PR98664 fix, I think), and the reproducer from PR98512 now ICEs:


Thanks for giving it a try!  I saw a similar ICE during my testing
-- it's caused by a couple of uninitialized variables.  I fixed
it in my tree (see below) but the fix didn't make it into the patch.

Please give this a try and let me know if it doesn't help:

index abcd991b829..d82a7eb67e5 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -1426,7 +1426,7 @@ diagnostic_impl (rich_location *richloc, const diagnostic_metadata *metadata,
 int opt, const char *gmsgid,
 va_list *ap, diagnostic_t kind)
 {
-  diagnostic_info diagnostic;
+  diagnostic_info diagnostic{ };
   if (kind == DK_PERMERROR)
 {
   diagnostic_set_info (&diagnostic, gmsgid, ap, richloc,
@@ -1452,7 +1452,7 @@ diagnostic_n_impl (rich_location *richloc, const diagnostic_metadata *metadata,
   const char *plural_gmsgid,
   va_list *ap, diagnostic_t kind)
 {
-  diagnostic_info diagnostic;
+  diagnostic_info diagnostic{ };
   unsigned long gtn;

   if (sizeof n <= sizeof gtn)
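
To spell out why the empty braces matter, here is a small standalone sketch.
The struct below is a hypothetical stand-in, assumed to be a plain aggregate
with no user-provided constructor; it is not the real diagnostic_info.  With
"info d;" the members of a local object are left indeterminate, whereas
"info d{ };" zero-initializes all of them.

```cpp
#include <cassert>

// Hypothetical stand-in for a plain aggregate; not the real diagnostic_info.
struct info {
  const char *message;
  int option_index;
  void *x_data;
};

// Empty braces initialize every member of the aggregate to zero / null.
info make_zeroed() {
  info d{ };
  return d;
}

// Small self-check of the claim above.
void check_zeroed() {
  info d = make_zeroed();
  assert(d.message == nullptr);
  assert(d.option_index == 0);
  assert(d.x_data == nullptr);
}
```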

Martin



void *
__rawmemchr_ppc (const void *s, int c)
{
#pragma GCC diagnostics push
#pragma GCC diagnostic ignored "-Wstringop-overflow="
#pragma GCC diagnostic ignored "-Wstringop-overread"
   if (c != 0)
 return __builtin_memchr (s, c, (unsigned long)-1);
#pragma GCC diagnostics pop
   return (char *)s + __builtin_strlen (s);
}
extern __typeof (__rawmemchr_ppc) __EI___rawmemchr_ppc
   __attribute__((alias ("__rawmemchr_ppc")));

during RTL pass: expand
t.c: In function ‘__rawmemchr_ppc’:
t.c:8:12: internal compiler error: Segmentation fault
 8 | return __builtin_memchr (s, c, (unsigned long)-1);
   |^~
0xde134f crash_signal
 /home/bmg/src/gcc/gcc/toplev.c:327
0x9181bd diag_inlining_context::set_locations(vec*, diagnostic_info*)
 /home/bmg/src/gcc/gcc/builtins.c:835
0x17ce2da update_effective_level_from_pragmas
 /home/bmg/src/gcc/gcc/diagnostic.c:1028
0x17ce2da diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*)
 /home/bmg/src/gcc/gcc/diagnostic.c:1218
0x17ceb1e diagnostic_impl
 /home/bmg/src/gcc/gcc/diagnostic.c:1443
0x17cf144 warning(diagnostic_metadata::location_context&, int, char const*, ...)
 /home/bmg/src/gcc/gcc/diagnostic.c:1669
0x917ab0 maybe_warn_for_bound
 /home/bmg/src/gcc/gcc/builtins.c:4077
0x927eee maybe_warn_for_bound
 /home/bmg/src/gcc/gcc/builtins.c:4920
0x927eee check_access(tree_node*, tree_node*, tree_node*, tree_node*, 
tree_node*, access_mode, access_data const*)
 /home/bmg/src/gcc/gcc/builtins.c:4918
0x928b52 check_read_access
 /home/bmg/src/gcc/gcc/builtins.c:4996
0x92e19b check_read_access
 /home/bmg/src/gcc/gcc/builtins.c:9992
0x92e19b expand_builtin_memchr
 /home/bmg/src/gcc/gcc/builtins.c:5926
0x92e19b expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
 /home/bmg/src/gcc/gcc/builtins.c:9992
0xa6d714 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
 /home/bmg/src/gcc/gcc/expr.c:11275
0xa796fb store_expr(tree_node*, rtx_def*, int, bool, bool)
 /home/bmg/src/gcc/gcc/expr.c:5885
0xa7ab71 expand_assignment(tree_node*, tree_node*, bool)
 /home/bmg/src/gcc/gcc/expr.c:5621
0x9569bb expand_call_stmt
 /home/bmg/src/gcc/gcc/cfgexpand.c:2837
0x9569bb expand_gimple_stmt_1
 /home/bmg/src/gcc/gcc/cfgexpand.c:3843
0x9569bb expand_gimple_stmt
 /home/bmg/src/gcc/gcc/cfgexpand.c:4007
0x95c60f expand_gimple_basic_block
 /home/bmg/src/gcc/gcc/cfgexpand.c:6044

Thanks,
Florian

ping [PATCH] c++: fix string literal member initializer bug [PR90926]

2021-01-21 Thread Tom Greenslade (thomgree) via Gcc-patches

Ping for this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562259.html

-----Original Message-----
From: Thomas Greenslade (thomgree) 
Sent: 17 December 2020 22:12
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] c++: fix string literal member initializer bug [PR90926]

build_aggr_conv did not correctly handle string literal member initializers. 
Extended can_convert_array to handle this case. The additional checks of 
compatibility of character types, and whether string literal will fit, would be 
quite complicated, so are deferred until the actual conversion takes place.

Testcase added for this.

Bootstrapped/regtested on x86_64-pc-linux-gnu.

gcc/cp/ChangeLog:

PR c++/90926
* call.c (can_convert_array): Extend to handle all valid aggregate
initializers of an array; including by string literals, not just by
brace-init-list.
(build_aggr_conv): Call can_convert_array more often, not just in
brace-init-list case.
* g++.dg/cpp1y/nsdmi-aggr12.C: New test.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index c2d62e582bf..e4ba31f3f2b 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -887,28 +887,41 @@ strip_standard_conversion (conversion *conv)
   return conv;
 }
 
-/* Subroutine of build_aggr_conv: check whether CTOR, a braced-init-list,
-   is a valid aggregate initializer for array type ATYPE.  */
+/* Subroutine of build_aggr_conv: check whether FROM is a valid aggregate
+   initializer for array type ATYPE.  */
 
 static bool
-can_convert_array (tree atype, tree ctor, int flags, tsubst_flags_t complain)
+can_convert_array (tree atype, tree from, int flags, tsubst_flags_t complain)
 {
-  unsigned i;
   tree elttype = TREE_TYPE (atype);
-  for (i = 0; i < CONSTRUCTOR_NELTS (ctor); ++i)
+  unsigned i;
+
+  if (TREE_CODE (from) == CONSTRUCTOR)
 {
-  tree val = CONSTRUCTOR_ELT (ctor, i)->value;
-  bool ok;
-  if (TREE_CODE (elttype) == ARRAY_TYPE
- && TREE_CODE (val) == CONSTRUCTOR)
-   ok = can_convert_array (elttype, val, flags, complain);
-  else
-   ok = can_convert_arg (elttype, TREE_TYPE (val), val, flags,
- complain);
-  if (!ok)
-   return false;
+  for (i = 0; i < CONSTRUCTOR_NELTS (from); ++i)
+   {
+ tree val = CONSTRUCTOR_ELT (from, i)->value;
+ bool ok;
+ if (TREE_CODE (elttype) == ARRAY_TYPE)
+   ok = can_convert_array (elttype, val, flags, complain);
+ else
+   ok = can_convert_arg (elttype, TREE_TYPE (val), val, flags,
+ complain);
+ if (!ok)
+   return false;
+   }
+  return true;
 }
-  return true;
+
+  if (   char_type_p (TYPE_MAIN_VARIANT (elttype))
+  && TREE_CODE (tree_strip_any_location_wrapper (from)) == STRING_CST)
+/* Defer the other necessary checks (compatibility of character types and
+   whether string literal will fit) until the conversion actually takes
+   place.  */
+return true;
+
+  /* No other valid way to aggregate initialize an array.  */
+  return false;
 }
 
 /* Helper for build_aggr_conv.  Return true if FIELD is in PSET, or if
@@ -965,8 +978,7 @@ build_aggr_conv (tree type, tree ctor, int flags, tsubst_flags_t complain)
  tree ftype = TREE_TYPE (idx);
  bool ok;
 
- if (TREE_CODE (ftype) == ARRAY_TYPE
- && TREE_CODE (val) == CONSTRUCTOR)
+ if (TREE_CODE (ftype) == ARRAY_TYPE)
ok = can_convert_array (ftype, val, flags, complain);
  else
ok = can_convert_arg (ftype, TREE_TYPE (val), val, flags,
@@ -1013,9 +1025,8 @@ build_aggr_conv (tree type, tree ctor, int flags, tsubst_flags_t complain)
  val = empty_ctor;
}
 
-  if (TREE_CODE (ftype) == ARRAY_TYPE
- && TREE_CODE (val) == CONSTRUCTOR)
-   ok = can_convert_array (ftype, val, flags, complain);
+  if (TREE_CODE (ftype) == ARRAY_TYPE)
+   ok = can_convert_array (ftype, val, flags, complain);
   else
ok = can_convert_arg (ftype, TREE_TYPE (val), val, flags,
  complain);
diff --git a/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr12.C b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr12.C
new file mode 100644
index 000..ce8c95e8aca
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr12.C
@@ -0,0 +1,21 @@
+// PR c++/90926
+// { dg-do run { target c++14 } }
+
+struct A
+{
+  char str[4] = "foo";
+  char str_array[2][4] = {"bar", "baz"};
+};
+
+int
+main ()
+{
+  A a;
+  a.str[0] = 'g';
+  a.str_array[0][0] = 'g';
+  a = {};
+  if (__builtin_strcmp (a.str, "foo") != 0)
+__builtin_abort();
+  if (__builtin_strcmp (a.str_array[0], "bar") != 0)
+__builtin_abort();
+}

Re: driver: do not check input file existence here [PR 98452]

2021-01-21 Thread Joseph Myers

On Wed, 20 Jan 2021, Nathan Sidwell wrote:

> On 1/19/21 6:27 PM, Joseph Myers wrote:
> > On Tue, 19 Jan 2021, Nathan Sidwell wrote:
> > 
> > > Joseph,
> > > I was relying on this patch on the modules branch, but didn't realize the
> > > implications when merging and thought it was just a cleanup.  I'm not sure
> > > why
> > > the driver wants to check here, rather than leave it to the compiler.
> > > Seems
> > > optimizing for failure? The only difference I can think is that the
> > > diagnostic
> > > might mention the driver name, rather than say (cc1plus), but that's a
> > > different problem that I've also reported.
> > 
> > What do the error messages look like, before and after this patch, for the
> > various cases?  (Response file missing; file handled by e.g. cc1plus
> > missing; file handled by the linker missing.)
> 
> here are some experiments:

Thanks.

Mentioning cc1plus is a pre-existing issue that applies in general to 
diagnostics coming from cc1plus and not associated with a given source 
file (as I said in 
, I 
think such diagnostics ought to mention the program the user called, i.e. 
the driver, and not mention cc1plus at all unless the user really called 
it manually without the driver).

Given that, the interesting cases are the ones where an error becomes a 
warning.

> Unspecified non-existent file:
> OLD:
> 1)devvm293:292>g++  -c nothing
> g++: error: nothing: No such file or directory
> g++: fatal error: no input files
> compilation terminated.
> 
> NEW:
> (1)devvm293:293>./xg++  -B./ -c nothing
> xg++: warning: nothing: linker input file unused because linking not done

This seems analogous to e.g. not detecting syntax errors when using -E, or 
not detecting that a file #included inside #if 0 doesn't exist: the 
requested processing doesn't involve doing anything with the file 
"nothing" so it seems appropriate for its absence not to be treated as an 
error.
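
A concrete instance of the "#if 0" analogy, as a minimal sketch: the header
name below is made up and is assumed not to exist anywhere on the include
path.  The translation unit still compiles, because directives inside a
skipped group are not processed.

```cpp
#include <cassert>

// The header named here is deliberately nonexistent (a made-up name).
// The preprocessor skips this group, so the missing file is never looked up.
#if 0
#include "this-header-does-not-exist.h"
#endif

int answer() { return 42; }

// Small self-check that the rest of the file is compiled normally.
void check_answer() { assert(answer() == 42); }
```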

> Unspecified non-existent response file:
> OLD:
> (1)devvm293:282>g++ -c  @nothing
> g++: error: nothing: No such file or directory
> g++: fatal error: no input files
> compilation terminated.
> 
> NEW:
> (1)devvm293:284>./xg++ -B./ -c @nothing
> xg++: warning: @nothing: linker input file unused because linking not done

This is less clear, in that one might suppose "nothing" is meant to be a 
file listing C++ sources to compile, so should be processed and should 
result in an error for its absence.  However, this particular place in the 
driver handling OPT_SPECIAL_input_file doesn't seem like the right place 
for such an error.  Either the correct handling of response files is that 
the @file argument is silently kept as-is and so the warning after your 
patch is correct, or the correct handling of response files is that there 
should be an error if such a file cannot be found (and so users with 
filenames starting @ must reference them e.g. as ./@file.cc) and 
expandargv should have some way to report that error.  Leaving it to 
OPT_SPECIAL_input_file as in the driver at present would mean such an 
error doesn't occur when @file, unexpanded, appears to be an option 
argument rather than an input file.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [GCC8 backport] AArch64: Fix symbol offset limit (PR 98618)

2021-01-21 Thread Richard Sandiford via Gcc-patches

Wilco Dijkstra  writes:
> In aarch64_classify_symbol symbols are allowed large offsets on relocations.
> This means the offset can use all of the +/-4GB offset, leaving no offset
> available for the symbol itself.  This results in relocation overflow and
> link-time errors for simple expressions like &global_array + 0xff00.
>
> To avoid this, unless the offset_within_block_p is true, limit the offset
> to +/-1MB so that the symbol needs to be within a 3.9GB offset from its
> references.  For the tiny code model use a 64KB offset, allowing most of
> the 1MB range for code/data between the symbol and its references.
>
> gcc/
> PR target/98618
> * config/aarch64/aarch64.c (aarch64_classify_symbol):
> Apply reasonable limit to symbol offsets.
>
> gcc/testsuite/
> PR target/98618
> * gcc.target/aarch64/symbol-range.c: Improve testcase.
> * gcc.target/aarch64/symbol-range-tiny.c: Likewise.
>
> (cherry picked from commit 7d3b27ff12610fde9d6c4b56abc70c6ee9b6b3db)

OK on the same basis as GCC9.

Thanks,
Richard

>
> ---
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index e8e73b8ea92b0dd3b9de661652c30c26c07bec86..7c4cf75b5a5e2394dc0b3f69b68d93df8f88111f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -12011,26 +12011,31 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
>   the offset does not cause overflow of the final address.  But
>   we have no way of knowing the address of symbol at compile time
>   so we can't accurately say if the distance between the PC and
> - symbol + offset is outside the addressible range of +/-1M in the
> - TINY code model.  So we rely on images not being greater than
> - 1M and cap the offset at 1M and anything beyond 1M will have to
> - be loaded using an alternative mechanism.  Furthermore if the
> - symbol is a weak reference to something that isn't known to
> - resolve to a symbol in this module, then force to memory.  */
> -  if ((SYMBOL_REF_WEAK (x)
> -   && !aarch64_symbol_binds_local_p (x))
> -  || !IN_RANGE (offset, -1048575, 1048575))
> + symbol + offset is outside the addressible range of +/-1MB in the
> + TINY code model.  So we limit the maximum offset to +/-64KB and
> + assume the offset to the symbol is not larger than +/-(1MB - 64KB).
> + If offset_within_block_p is true we allow larger offsets.
> + Furthermore force to memory if the symbol is a weak reference to
> + something that doesn't resolve to a symbol in this module.  */
> +
> +  if (SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x))
>  return SYMBOL_FORCE_TO_MEM;
> +  if (!(IN_RANGE (offset, -0x10000, 0x10000)
> +|| offset_within_block_p (x, offset)))
> +return SYMBOL_FORCE_TO_MEM;
> +
>return SYMBOL_TINY_ABSOLUTE;
>
>  case AARCH64_CMODEL_SMALL:
>/* Same reasoning as the tiny code model, but the offset cap here is
> - 4G.  */
> -  if ((SYMBOL_REF_WEAK (x)
> -   && !aarch64_symbol_binds_local_p (x))
> -  || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
> -HOST_WIDE_INT_C (4294967264)))
> + 1MB, allowing +/-3.9GB for the offset to the symbol.  */
> +
> +  if (SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x))
>  return SYMBOL_FORCE_TO_MEM;
> +  if (!(IN_RANGE (offset, -0x100000, 0x100000)
> +|| offset_within_block_p (x, offset)))
> +return SYMBOL_FORCE_TO_MEM;
> +
>return SYMBOL_SMALL_ABSOLUTE;
>
>  case AARCH64_CMODEL_TINY_PIC:
> diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
> index d7e46b059e41f2672b3a1da5506fa8944e752e01..fc6a4f3ec780d9fa86de1c8e1a42a55992ee8b2d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
> +++ b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
> @@ -1,12 +1,12 @@
> -/* { dg-do compile } */
> +/* { dg-do link } */
>  /* { dg-options "-O3 -save-temps -mcmodel=tiny" } */
>
> -int fixed_regs[0x0020];
> +char fixed_regs[0x0008];
>
>  int
> -foo()
> +main ()
>  {
> -  return fixed_regs[0x0008];
> +  return fixed_regs[0x000ff000];
>  }
>
>  /* { dg-final { scan-assembler-not "adr\tx\[0-9\]+, fixed_regs\\\+" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range.c b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
> index 6574cf4310430b847e77ea56bf8f20ef312d53e4..d8e82fa1b2829fd300b6ccf7f80241e5573e7e17 100644
> --- a/gcc/testsuite/gcc.target/aarch64/symbol-range.c
> +++ b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
> @@ -1,12 +1,12 @@
> -/* { dg-do compile } */
> +/* { dg-do link } */
>  /* { dg-options "-O3 -save-temps -mcmodel=small" } */
>
> -int fixed_regs[0x2ULL];
> +char fixed_regs[0x8000];
>
>  int
> -foo()
> +main ()
>  {
> -  return fixed_regs[0x1ULL];
> +  return fixed_regs[0xf000];
>  }
>
>  /* { dg-final { scan-assembler-not "adrp\tx\[0-9\]+, fixed_regs\\\+" } } */
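
The decision the quoted patch makes for the small code model can be modelled
roughly as follows.  This is only a sketch under stated assumptions:
SymbolClass, classify_small and within_block are hypothetical names, the 1MB
cap is taken from the patch comment, and within_block is a simplification of
offset_within_block_p that only checks 0 <= offset < object size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type; GCC uses its own symbol classification enum.
enum class SymbolClass { ForceToMem, SmallAbsolute };

// 1MB cap on the offset, as described in the patch comment.
constexpr std::int64_t kSmallModelCap = std::int64_t{1} << 20;

// Simplified assumption: "within the block" means 0 <= offset < object_size.
bool within_block(std::int64_t offset, std::int64_t object_size) {
  return offset >= 0 && offset < object_size;
}

// Model of the small-code-model decision: weak non-local symbols always go
// to memory; otherwise the offset must be small or stay inside the object.
SymbolClass classify_small(std::int64_t offset, std::int64_t object_size,
                           bool weak_nonlocal) {
  if (weak_nonlocal)
    return SymbolClass::ForceToMem;
  bool in_range = offset >= -kSmallModelCap && offset <= kSmallModelCap;
  if (!(in_range || within_block(offset, object_size)))
    return SymbolClass::ForceToMem;
  return SymbolClass::SmallAbsolute;
}
```

So a 2MB offset into a 4MB array is still addressed directly, while the same
offset from a 16-byte object is forced to memory.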

Re: [PATCH v3] fwprop: Allow (subreg (mem)) simplifications

2021-01-21 Thread Richard Sandiford via Gcc-patches

Ilya Leoshkevich  writes:
> On Thu, 2021-01-21 at 12:29 +, Richard Sandiford wrote:
>> Given what you said in the other message about combine, I agree this
>> is a reasonable workaround.  I don't know whether it's suitable for
>> stage 4 or whether it would need to wait for stage 1.
>
> Thanks for reviewing!  I've implemented your suggestions in the patch
> below.
>
> Regarding stage 4, this can be seen as a part of IBM Z
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html
>
> regression fix - before moving long doubles to vector registers and
> fixing up "f" constraints on RTL level, code generation for small
> glibc functions like __ieee754_sqrtl has been fairly efficient.  Not
> sure if that issue is big enough to justify this common code change at
> this point, but still..

Ah, I'd missed that that patch was a regression fix.  So yeah,
agree it should go in now.

Patch is OK, thanks.

Richard

Re: [PATCH] improve warning suppression for inlined functions (PR 98465, 98512)

2021-01-21 Thread Florian Weimer via Gcc-patches

* Martin Sebor via Gcc-patches:

> This patch depends on the fix for PR 98664 (already approved but
> not yet checked in).  I've tested it on x86_64-linux.
>
> To avoid fallout I tried to keep the changes to a minimum, and
> so the design isn't as robust as I'd like it ultimately to be.
> I plan to enhance it in stage 1.

I've tested this patch on top of 43705f3fa343e08b2fb030460f (so with the
PR98664 fix, I think), and the reproducer from PR98512 now ICEs:

void *
__rawmemchr_ppc (const void *s, int c)
{
#pragma GCC diagnostics push
#pragma GCC diagnostic ignored "-Wstringop-overflow="
#pragma GCC diagnostic ignored "-Wstringop-overread"
  if (c != 0)
return __builtin_memchr (s, c, (unsigned long)-1);
#pragma GCC diagnostics pop
  return (char *)s + __builtin_strlen (s);
}
extern __typeof (__rawmemchr_ppc) __EI___rawmemchr_ppc
  __attribute__((alias ("__rawmemchr_ppc")));

during RTL pass: expand
t.c: In function ‘__rawmemchr_ppc’:
t.c:8:12: internal compiler error: Segmentation fault
8 | return __builtin_memchr (s, c, (unsigned long)-1);
  |^~
0xde134f crash_signal
/home/bmg/src/gcc/gcc/toplev.c:327
0x9181bd diag_inlining_context::set_locations(vec*, diagnostic_info*)
/home/bmg/src/gcc/gcc/builtins.c:835
0x17ce2da update_effective_level_from_pragmas
/home/bmg/src/gcc/gcc/diagnostic.c:1028
0x17ce2da diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*)
/home/bmg/src/gcc/gcc/diagnostic.c:1218
0x17ceb1e diagnostic_impl
/home/bmg/src/gcc/gcc/diagnostic.c:1443
0x17cf144 warning(diagnostic_metadata::location_context&, int, char const*, ...)
/home/bmg/src/gcc/gcc/diagnostic.c:1669
0x917ab0 maybe_warn_for_bound
/home/bmg/src/gcc/gcc/builtins.c:4077
0x927eee maybe_warn_for_bound
/home/bmg/src/gcc/gcc/builtins.c:4920
0x927eee check_access(tree_node*, tree_node*, tree_node*, tree_node*, 
tree_node*, access_mode, access_data const*)
/home/bmg/src/gcc/gcc/builtins.c:4918
0x928b52 check_read_access
/home/bmg/src/gcc/gcc/builtins.c:4996
0x92e19b check_read_access
/home/bmg/src/gcc/gcc/builtins.c:9992
0x92e19b expand_builtin_memchr
/home/bmg/src/gcc/gcc/builtins.c:5926
0x92e19b expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
/home/bmg/src/gcc/gcc/builtins.c:9992
0xa6d714 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/home/bmg/src/gcc/gcc/expr.c:11275
0xa796fb store_expr(tree_node*, rtx_def*, int, bool, bool)
/home/bmg/src/gcc/gcc/expr.c:5885
0xa7ab71 expand_assignment(tree_node*, tree_node*, bool)
/home/bmg/src/gcc/gcc/expr.c:5621
0x9569bb expand_call_stmt
/home/bmg/src/gcc/gcc/cfgexpand.c:2837
0x9569bb expand_gimple_stmt_1
/home/bmg/src/gcc/gcc/cfgexpand.c:3843
0x9569bb expand_gimple_stmt
/home/bmg/src/gcc/gcc/cfgexpand.c:4007
0x95c60f expand_gimple_basic_block
/home/bmg/src/gcc/gcc/cfgexpand.c:6044

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

Re: [PATCH] c++: Suppress this injection for static member functions [PR97399]

2021-01-21 Thread Patrick Palka via Gcc-patches

On Thu, 21 Jan 2021, Patrick Palka wrote:

> Here at parse time finish_qualified_id_expr adds an implicit 'this->' to
> the expression tmp::integral (because it's type-dependent, and also
> current_class_ptr is set) within the trailing return type, and later
> during substitution we can't resolve the 'this' since
> tsubst_function_type does inject_this_parm only for non-static member
> functions which tmp::func is not.
> 
> It seems the root of the problem is that we set current_class_ptr when
> parsing the signature of a static member function.  Since explicit uses
> of 'this' are already not allowed in this context, we probably shouldn't
> be doing inject_this_parm either.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/97399
>   * parser.c (cp_parser_init_declarator): If the storage class
>   specifier is sc_static, pass true for static_p to
>   cp_parser_declarator.
>   (cp_parser_direct_declarator): Don't do inject_this_parm when
>   the member function is static.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/88548
>   PR c++/97399
>   * g++.dg/cpp0x/this2.C: New test.
>   * g++.dg/template/pr97399a.C: New test.
>   * g++.dg/template/pr97399b.C: New test.
> ---
>  gcc/cp/parser.c  |  5 +++--
>  gcc/testsuite/g++.dg/cpp0x/this2.C   |  8 
>  gcc/testsuite/g++.dg/template/pr97399a.C | 11 +++
>  gcc/testsuite/g++.dg/template/pr97399b.C | 11 +++
>  4 files changed, 33 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/this2.C
>  create mode 100644 gcc/testsuite/g++.dg/template/pr97399a.C
>  create mode 100644 gcc/testsuite/g++.dg/template/pr97399b.C
> 
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 48437f23175..18cf9888632 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -21413,6 +21413,7 @@ cp_parser_init_declarator (cp_parser* parser,
>bool is_non_constant_init;
>int ctor_dtor_or_conv_p;
>bool friend_p = cp_parser_friend_p (decl_specifiers);
> +  bool static_p = decl_specifiers->storage_class == sc_static;
>tree pushed_scope = NULL_TREE;
>bool range_for_decl_p = false;
>bool saved_default_arg_ok_p = parser->default_arg_ok_p;
> @@ -21446,7 +21447,7 @@ cp_parser_init_declarator (cp_parser* parser,
>  = cp_parser_declarator (parser, CP_PARSER_DECLARATOR_NAMED,
>   flags, &ctor_dtor_or_conv_p,
>   /*parenthesized_p=*/NULL,
> - member_p, friend_p, /*static_p=*/false);
> + member_p, friend_p, static_p);
>/* Gather up the deferred checks.  */
>stop_deferring_access_checks ();

I should note that the above parser change is needed so that we properly
communicate static-ness to cp_parser_direct_declarator when parsing a
member function template.

>  
> @@ -22122,7 +22123,7 @@ cp_parser_direct_declarator (cp_parser* parser,
>  
> tree save_ccp = current_class_ptr;
> tree save_ccr = current_class_ref;
> -   if (memfn)
> +   if (memfn && !static_p)
>   /* DR 1207: 'this' is in scope after the cv-quals.  */
>   inject_this_parameter (current_class_type, cv_quals);
>  
> diff --git a/gcc/testsuite/g++.dg/cpp0x/this2.C 
> b/gcc/testsuite/g++.dg/cpp0x/this2.C
> new file mode 100644
> index 000..3781bc5efec
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/this2.C
> @@ -0,0 +1,8 @@
> +// PR c++/88548
> +// { dg-do compile { target c++11 } }
> +
> +struct S {
> +  int a;
> +  template  static auto m1 ()
> +-> decltype(this->a) { return 0; }; // { dg-error "'this'" }
> +};
> diff --git a/gcc/testsuite/g++.dg/template/pr97399a.C 
> b/gcc/testsuite/g++.dg/template/pr97399a.C
> new file mode 100644
> index 000..3713dbde6e0
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/pr97399a.C
> @@ -0,0 +1,11 @@
> +// PR c++/97399
> +// { dg-do compile { target c++11 } }
> +
> +template  struct enable_if_t {};
> +struct tmp {
> +  template  static constexpr bool is_integral();
> +  template  static auto func()
> +-> enable_if_t()>;
> +};
> +template  constexpr bool tmp::is_integral() { return true; }
> +int main() { tmp::func(); }
> diff --git a/gcc/testsuite/g++.dg/template/pr97399b.C 
> b/gcc/testsuite/g++.dg/template/pr97399b.C
> new file mode 100644
> index 000..9196c985834
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/pr97399b.C
> @@ -0,0 +1,11 @@
> +// PR c++/97399
> +// { dg-do compile { target c++11 } }
> +
> +template  struct enable_if_t {};
> +struct tmp {
> +  template  constexpr bool is_integral(); // non-static
> +  template  static auto func()
> +-> enable_if_t()>; // { dg-error "without object" }
> +};
> +template  constexpr bool tmp::is_integral() { return true; }
> +int main() { tmp::func(); } // { dg-error "no match" }
> -- 
> 2.30.0.155.g66

[PATCH v3] fwprop: Allow (subreg (mem)) simplifications

2021-01-21 Thread Ilya Leoshkevich via Gcc-patches

On Thu, 2021-01-21 at 12:29 +, Richard Sandiford wrote:
> Given what you said in the other message about combine, I agree this
> is a reasonable workaround.  I don't know whether it's suitable for
> stage 4 or whether it would need to wait for stage 1.

Thanks for reviewing!  I've implemented your suggestions in the patch
below.

Regarding stage 4, this can be seen as a part of IBM Z

https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

regression fix - before moving long doubles to vector registers and
fixing up "f" constraints on RTL level, code generation for small
glibc functions like __ieee754_sqrtl has been fairly efficient.  Not
sure if that issue is big enough to justify this common code change at
this point, but still..



v2 -> v3: Added single_ebb_p, added paradoxical subreg check, fixed
formatting.  Bootstrapped and regtested on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux.




Suppose we have:

(set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
(set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

(set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result is
worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich  

* fwprop.c (fwprop_propagation::classify_result): Allow
(subreg (mem)) simplifications.
---
 gcc/fwprop.c | 33 -
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index eff8f7cc141..123cc228630 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -176,7 +176,7 @@ namespace
 static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
 static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-fwprop_propagation (rtx_insn *, rtx, rtx);
+fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
 
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
@@ -185,13 +185,20 @@ namespace
 bool check_mem (int, rtx) final override;
 void note_simplification (int, uint16_t, rtx, rtx) final override;
 uint16_t classify_result (rtx, rtx);
+
+  private:
+const bool single_use_p;
+const bool single_ebb_p;
   };
 }
 
 /* Prepare to replace FROM with TO in INSN.  */
 
-fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to)
-  : insn_propagation (insn, from, to)
+fwprop_propagation::fwprop_propagation (insn_info *use_insn,
+   insn_info *def_insn, rtx from, rtx to)
+  : insn_propagation (use_insn->rtl (), from, to),
+single_use_p (def_insn->num_uses () == 1),
+single_ebb_p (use_insn->ebb () == def_insn->ebb ())
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -262,6 +269,22 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
   && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
 return PROFITABLE;
 
+  /* Allow (subreg (mem)) -> (mem) simplifications with the following
+ exceptions:
+ 1) Propagating (mem)s into multiple uses is not profitable.
+ 2) Propagating (mem)s across EBBs may not be profitable if the source EBB
+   runs less frequently.
+ 3) Propagating (mem)s into paradoxical (subreg)s is not profitable.
+ 4) Creating new (mem/v)s is not correct, since DCE will not remove the old
+   ones.  */
+  if (single_use_p
+  && single_ebb_p
+  && SUBREG_P (old_rtx)
+  && !paradoxical_subreg_p (old_rtx)
+  && MEM_P (new_rtx)
+  && !MEM_VOLATILE_P (new_rtx))
+return PROFITABLE;
+
   return 0;
 }
 
@@ -363,7 +386,7 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_insn, def_insn, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -426,7 +449,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_insn, def_insn, dest, src);
   if (!prop.apply_to_pattern (loc))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
-- 
2.26.2
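
The new profitability test in classify_result can be read as a plain
conjunction of six conditions.  The sketch below restates it on a toy struct;
Candidate and subreg_to_mem_profitable are made-up names for illustration,
and the boolean fields stand in for the RTL predicates used in the patch
(SUBREG_P, paradoxical_subreg_p, MEM_P, MEM_VOLATILE_P).

```cpp
#include <cassert>

// Toy description of one propagation attempt; not an fwprop data structure.
struct Candidate {
  bool single_use;          // the definition has exactly one use
  bool single_ebb;          // def and use share an extended basic block
  bool old_is_subreg;       // the replaced expression is a (subreg ...)
  bool old_is_paradoxical;  // ... and that subreg is paradoxical
  bool new_is_mem;          // the simplified result is a (mem ...)
  bool new_is_volatile;     // ... and that mem is volatile
};

// Mirrors the condition added by the patch: all exceptions must be absent.
bool subreg_to_mem_profitable(const Candidate &c) {
  return c.single_use
         && c.single_ebb
         && c.old_is_subreg
         && !c.old_is_paradoxical
         && c.new_is_mem
         && !c.new_is_volatile;
}
```

The TF/FPRX2 example from the commit message corresponds to a candidate with
single_use, single_ebb, old_is_subreg and new_is_mem set and the other two
fields clear.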

Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]

2021-01-21 Thread David Malcolm via Gcc-patches

On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> >  wrote:
> > > gimple.h has this comment for gimple's uid field:
> > > 
> > >   /* UID of this statement.  This is used by passes that want to
> > >  assign IDs to statements.  It must be assigned and used by
> > > each
> > >  pass.  By default it should be assumed to contain
> > > garbage.  */
> > >   unsigned uid;
> > > 
> > > and gimple_set_uid has:
> > > 
> > >Please note that this UID property is supposed to be undefined
> > > at
> > >pass boundaries.  This means that a given pass should not
> > > assume it
> > >contains any useful value when the pass starts and thus can
> > > set it
> > >to any value it sees fit.
> > > 
> > > which suggests that any pass can use the uid field as an
> > > arbitrary
> > > scratch space.
> > > 
> > > PR analyzer/98599 reports a case where this error occurs in LTO
> > > mode:
> > >   fatal error: Cgraph edge statement index out of range
> > > on certain inputs with -fanalyzer.
> > > 
> > > The error occurs in the LTRANS phase after -fanalyzer runs in the
> > > WPA phase.  The analyzer pass writes to the uid fields of all
> > > stmts.
> > > 
> > > The error occurs when LTRANS is streaming callgraph edges back
> > > in.
> > > If I'm reading things correctly, the LTO format uses stmt uids to
> > > associate call stmts with callgraph edges between WPA and LTRANS.
> > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > gimple_uid, and input_edge reads it back in.
> > > 
> > > Hence IPA passes that touch the uids in WPA need to restore them,
> > > or the stream-in at LTRANS will fail.
> > > 
> > > Is it intended that the LTO machinery relies on the value of the
> > > uid
> > > field being preserved during WPA (or, at least, needs to be saved
> > > and
> > > restored by passes that touch it)?
> > 
> > I believe this is solely at the cgraph stream out to stream in
> > boundary but
> > this may be a blurred area since while we materialize the whole
> > cgraph
> > at once the function bodies are streamed in on demand.
> > 
> > Honza can probably clarify things.
> 
> Well, the uids are used to associate cgraph edges with statements.  At
> WPA stage you do not have function bodies and thus the uids serve the
> role of pointers to the statements.  If you load the body in (via
> get_body) the uids are replaced by pointers, and when you stream out
> the uids are recomputed again.
> 
> When do you touch the uids? At WPA time or from small IPA pass in
> ltrans?

The analyzer is here in passes.def:
  INSERT_PASSES_AFTER (all_regular_ipa_passes)
  NEXT_PASS (pass_analyzer);

and so in LTO runs as the first regular IPA pass at WPA time,
when do_whole_program_analysis calls:
  execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);

FWIW I hope to eventually have a way to summarize function bodies in
the analyzer, but I don't yet, so I'm currently brute forcing things by
loading all function bodies at the start of the analyzer (when
-fanalyzer is enabled).

I wonder if that's messing things up somehow?

Does the stream-out from WPA make any assumptions about the stmt uids?
For example, 
  #define STMT_UID_NOT_IN_RANGE(uid) \
(gimple_stmt_max_uid (fn) < uid || uid == 0)
seems to assume that the UIDs are per-function ranges from
  [0-gimple_stmt_max_uid (fn)]
which isn't the case for the uids set by the analyzer.  Maybe that's
the issue here?
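
To make the save/restore idea concrete, here is a minimal sketch, assuming a stand-in `stmt` struct instead of GCC's real gimple type.  The class and method names echo the ChangeLog entry quoted further down, but the bodies are illustrative guesses and not the actual supergraph.cc code; `uid_not_in_range` is a hypothetical function that only mirrors the shape of the macro quoted above.

```cpp
#include <utility>
#include <vector>

// Stand-in for a gimple statement: only the uid field matters here.
struct stmt
{
  unsigned uid;
};

// Hypothetical helper: remember each statement's original uid before
// overwriting it, so that the original values can be put back later.
class saved_uids
{
public:
  // Give S a uid that is unique across every statement seen so far,
  // recording the old value first.
  void make_uid_unique (stmt *s)
  {
    m_old.push_back (std::make_pair (s, s->uid));
    s->uid = m_next++;
  }

  // Restore every recorded statement to the uid it had originally.
  void restore_uids () const
  {
    for (const auto &p : m_old)
      p.first->uid = p.second;
  }

private:
  std::vector<std::pair<stmt *, unsigned> > m_old;
  unsigned m_next = 0;
};

// Same shape as the STMT_UID_NOT_IN_RANGE check, with the per-function
// maximum passed in explicitly instead of being read from FN.
inline bool uid_not_in_range (unsigned uid, unsigned max_uid)
{
  return max_uid < uid || uid == 0;
}
```

With uids renumbered globally, a statement in a small function can end up with a uid above that function's maximum, which is the kind of value the range check rejects; restoring the saved values brings it back in range.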

Sorry for not being more familiar with the IPA/LTO code
Dave


> Honza
> > Note LTO uses this exactly because of this comment to avoid
> > allocating
> > extra memory for an 'index' but it could of course leave gimple_uid
> > alone
> > at some extra expense (eventually paid for in generic cgraph data
> > structures
> > and thus for not only the streaming time).
> > 
> > > On the assumption that this is the case, this patch updates the
> > > comments
> > > in gimple.h referring to passes being able to set uid to any
> > > value to
> > > note the caveat for IPA passes, and it updates the analyzer to
> > > save
> > > and restore the UIDs, fixing the error.
> > > 
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > OK for master?
> > 
> > The analyzer bits are OK, let's see how Honza can clarify the
> > situation.
> > 
> > Thanks,
> > Richard.
> > 
> > > gcc/analyzer/ChangeLog:
> > > PR analyzer/98599
> > > * supergraph.cc (saved_uids::make_uid_unique): New.
> > > (saved_uids::restore_uids): New.
> > > (supergraph::supergraph): Replace assignments to stmt-
> > > >uid with
> > > calls to m_stmt_uids.make_uid_unique.
> > > (supergraph::~supergraph): New.
> > > * supergraph.h (class saved_uids): New.
> > > (supergraph::~supergraph): New decl.
> > > (supergraph::m_stmt_uids): New field.
> > > 
> > > gcc/ChangeLog:
> > > PR analyzer/98599
> > > * doc/gimple.texi: Document that UIDs must

Re: [PATCH] libstdc++: implement locale support for AIX

2021-01-21 Thread Rainer Orth

Hi Clement,

> Here is a new version of the patch. I've tested on Linux and AIX.
> There are still some tests failing but it is starting to take shape!
> However, I have a few questions:
>
> 1) locale.name and syscalls

just a terminology nit: none of those are syscalls.

> 3) POSIX 2017 and non-POSIX functions
> Many of the *_l functions being used in the GNU or dragonfly models aren't
> POSIX 2008, but mainly POSIX 2017 or, like strtof_l, not POSIX at all.
> However, they are really useful in the code, so I've made a double
> implementation based on "#ifdef HAVE_". Is that OK with you? It's not really
> POSIX 2008 but more POSIX 2008 with 2017 compatibility.
> For configure, I didn't find any better way to check each syscall, as
> they all depend on different includes. Tell me if you have a better idea.

First a general observation: there are two groups of functions you're
testing for:

* Pure BSD additions, not available in either POSIX.1, ISO C, or glibc:

  localeconv_l
  mbstowcs_l
  strtod_l
  strtof_l
  strtold_l
  wcsftime_l

* Part of XPG7:

  iswctype_l
  strcoll_l
  strftime_l
  strxfrm_l
  towlower_l
  towupper_l
  wcscoll_l
  wcsxfrm_l
  wctype_l

My suggestion would be not to have configure tests _GLIBCXX_HAVE_
for any of the second group at all: this is ieee_1003.1-2008, after all,
so if some OS selects that clocale variant, it better implement all of
those.  If really need be, one could add a configure check for those and
error out if any is missing.  This makes the code way more readable than
trying to handle some hypothetical partial implementation.

As for the BSD group, I suggest to have one representative configure
test (for localeconv_l perhaps) and then use an appropriate name for the
group as a whole.  Again, this will most likely be an all-or-nothing
thing.
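
As a rough illustration of the "one macro for the whole group" idea, here is a sketch of how a caller might look.  HAVE_BSD_LOCALE_EXT is a made-up name for such a group macro (it is not something the patch or libstdc++ defines), and `parse_double_in_locale` is a hypothetical helper; the fallback path uses only uselocale and strtod.

```cpp
#include <locale.h>
#include <stdlib.h>
#ifdef HAVE_XLOCALE_H
# include <xlocale.h>
#endif

// Parse S as a double using the conventions of locale LOC.
// HAVE_BSD_LOCALE_EXT is a hypothetical macro standing for the whole
// group of BSD-style *_l extensions (strtod_l, strtof_l, ...).
double
parse_double_in_locale (const char *s, locale_t loc)
{
#ifdef HAVE_BSD_LOCALE_EXT
  // Extension available: pass the locale explicitly.
  return strtod_l (s, 0, loc);
#else
  // Fallback: switch the calling thread's locale around the call.
  locale_t old = uselocale (loc);
  double d = strtod (s, 0);
  uselocale (old);
  return d;
#endif
}
```

The point of the single macro is that the #ifdef appears once per call site and reads as "do we have the BSD extensions", instead of one macro per function.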

Besides, your configure tests are way too complicated: just use
AC_CHECK_FUNCS doing a link test and be done with it.

In a similar vein, configure.ac already has
AC_CHECK_HEADERS([xlocale.h]).  Rather than hardcoding the existence of
the header based on the configure triple, just use the existing
HAVE_XLOCALE_H.  This way, things will simply fall into place for
e.g. NetBSD, OpenBSD and possibly others.

> 4) ctype_configure_char.cc 
> I have some trouble knowing what is supposed to be implemented in this file.
> I don't really understand the part with setlocale which appears in many
> OSes. When I add it, some tests start failing, some start working...
> Moreover, on Linux, if I understand correctly, there are some optimizations
> based on classic_table(), _M_toupper and _M_tolower. Could you confirm
> that it's only useful on Linux?

I don't know myself.  However, when trying the first version of your
patch (augmented to compile on Solaris), the corresponding change to the
solaris file made no difference in test results.

> Feel free to try it on other OSes. But I've made modifications only for AIX and
> Linux, as I can't test the other ones.

While reading through the patch, I saw that in two places you still use
__DragonFly__ || __FreeBSD__ tests.  For one, it's hard to tell what
feature they are really about; besides, they will require fiddling with,
e.g. for other BSDs.  Please use a descriptive macro which says which
difference this is about.

That said, I gave the new patch a try on Solaris 11.4.  To get it to
compile, I had to apply two changes that I'd mentioned (without an actual
patch) when commenting on the first patch:

* The C99 fields of struct lconv need _LCONV_C99 to be visible for
  C++11.

* Some ctype macros need __bitmasksize = 15, as the generic clocale
  implementation uses.

With the attached patch, the code compiled using
--enable-clocale=ieee_1003.1-2008.

Compared to the augmented first patch, there are a few differences: a
couple of failures went away and I've now

+XPASS: 22_locale/ctype/is/wchar_t/2.cc execution test

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


diff --git a/libstdc++-v3/config/locale/ieee_1003.1-2008/ctype_members.cc b/libstdc++-v3/config/locale/ieee_1003.1-2008/ctype_members.cc
--- a/libstdc++-v3/config/locale/ieee_1003.1-2008/ctype_members.cc
+++ b/libstdc++-v3/config/locale/ieee_1003.1-2008/ctype_members.cc
@@ -196,7 +196,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   do_is(mask __m, char_type __c) const
   {
 bool __ret = false;
-const size_t __bitmasksize = 11;
+const size_t __bitmasksize = 15;
 #ifndef _GLIBCXX_HAVE_ISWCTYPE_L
 __c_locale __old = uselocale((locale_t)_M_c_locale_ctype);
 #endif
@@ -227,7 +227,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
 #endif
 for (;__lo < __hi; ++__vec, ++__lo)
   {
-	const size_t __bitmasksize = 11;
+	const size_t __bitmasksize = 15;
 	mask __m = 0;
 	for (size_t __bitcur = 0; __bitcur <= __bitmasksize; ++__bitcur)
 #ifdef _GLIBCXX_HAVE_ISWCTYPE_L
@@ -345,7 +345,7 @@ namespace std _GLIBCXX_VIS

[PATCH] c++: Suppress this injection for static member functions [PR97399]

2021-01-21 Thread Patrick Palka via Gcc-patches

Here at parse time finish_qualified_id_expr adds an implicit 'this->' to
the expression tmp::integral (because it's type-dependent, and also
current_class_ptr is set) within the trailing return type, and later
during substitution we can't resolve the 'this' since
tsubst_function_type does inject_this_parm only for non-static member
functions which tmp::func is not.

It seems the root of the problem is that we set current_class_ptr when
parsing the signature of a static member function.  Since explicit uses
of 'this' are already not allowed in this context, we probably shouldn't
be doing inject_this_parm either.
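
For readers following along, here is a small self-contained example of the shape of code involved.  It is modelled on the pr97399a.C test below, but the template parameter lists are my own guesses, since they are not legible in this copy of the patch; compilers without the fix may reject it.

```cpp
#include <type_traits>

template <bool B> struct enable_if_t {};

struct tmp
{
  template <typename T> static constexpr bool is_integral ();

  // Static member function: there is no 'this' here, so the qualified
  // call in the trailing return type must not get an implicit 'this->'.
  template <typename T> static auto func ()
    -> enable_if_t<tmp::is_integral<T> ()>;
};

template <typename T> constexpr bool tmp::is_integral () { return true; }

// Only the type of the call is needed, so func never has to be defined.
static_assert (std::is_same<decltype (tmp::func<int> ()),
			    enable_if_t<true> >::value,
	       "trailing return type resolves without an object");
```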

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

PR c++/97399
* parser.c (cp_parser_init_declarator): If the storage class
specifier is sc_static, pass true for static_p to
cp_parser_declarator.
(cp_parser_direct_declarator): Don't do inject_this_parm when
the member function is static.

gcc/testsuite/ChangeLog:

PR c++/88548
PR c++/97399
* g++.dg/cpp0x/this2.C: New test.
* g++.dg/template/pr97399a.C: New test.
* g++.dg/template/pr97399b.C: New test.
---
 gcc/cp/parser.c  |  5 +++--
 gcc/testsuite/g++.dg/cpp0x/this2.C   |  8 
 gcc/testsuite/g++.dg/template/pr97399a.C | 11 +++
 gcc/testsuite/g++.dg/template/pr97399b.C | 11 +++
 4 files changed, 33 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/this2.C
 create mode 100644 gcc/testsuite/g++.dg/template/pr97399a.C
 create mode 100644 gcc/testsuite/g++.dg/template/pr97399b.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 48437f23175..18cf9888632 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -21413,6 +21413,7 @@ cp_parser_init_declarator (cp_parser* parser,
   bool is_non_constant_init;
   int ctor_dtor_or_conv_p;
   bool friend_p = cp_parser_friend_p (decl_specifiers);
+  bool static_p = decl_specifiers->storage_class == sc_static;
   tree pushed_scope = NULL_TREE;
   bool range_for_decl_p = false;
   bool saved_default_arg_ok_p = parser->default_arg_ok_p;
@@ -21446,7 +21447,7 @@ cp_parser_init_declarator (cp_parser* parser,
 = cp_parser_declarator (parser, CP_PARSER_DECLARATOR_NAMED,
flags, &ctor_dtor_or_conv_p,
/*parenthesized_p=*/NULL,
-   member_p, friend_p, /*static_p=*/false);
+   member_p, friend_p, static_p);
   /* Gather up the deferred checks.  */
   stop_deferring_access_checks ();
 
@@ -22122,7 +22123,7 @@ cp_parser_direct_declarator (cp_parser* parser,
 
  tree save_ccp = current_class_ptr;
  tree save_ccr = current_class_ref;
- if (memfn)
+ if (memfn && !static_p)
/* DR 1207: 'this' is in scope after the cv-quals.  */
inject_this_parameter (current_class_type, cv_quals);
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/this2.C 
b/gcc/testsuite/g++.dg/cpp0x/this2.C
new file mode 100644
index 000..3781bc5efec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/this2.C
@@ -0,0 +1,8 @@
+// PR c++/88548
+// { dg-do compile { target c++11 } }
+
+struct S {
+  int a;
+  template  static auto m1 ()
+-> decltype(this->a) { return 0; }; // { dg-error "'this'" }
+};
diff --git a/gcc/testsuite/g++.dg/template/pr97399a.C 
b/gcc/testsuite/g++.dg/template/pr97399a.C
new file mode 100644
index 000..3713dbde6e0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/pr97399a.C
@@ -0,0 +1,11 @@
+// PR c++/97399
+// { dg-do compile { target c++11 } }
+
+template  struct enable_if_t {};
+struct tmp {
+  template  static constexpr bool is_integral();
+  template  static auto func()
+-> enable_if_t()>;
+};
+template  constexpr bool tmp::is_integral() { return true; }
+int main() { tmp::func(); }
diff --git a/gcc/testsuite/g++.dg/template/pr97399b.C 
b/gcc/testsuite/g++.dg/template/pr97399b.C
new file mode 100644
index 000..9196c985834
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/pr97399b.C
@@ -0,0 +1,11 @@
+// PR c++/97399
+// { dg-do compile { target c++11 } }
+
+template  struct enable_if_t {};
+struct tmp {
+  template  constexpr bool is_integral(); // non-static
+  template  static auto func()
+-> enable_if_t()>; // { dg-error "without object" }
+};
+template  constexpr bool tmp::is_integral() { return true; }
+int main() { tmp::func(); } // { dg-error "no match" }
-- 
2.30.0.155.g66e871b664

Re: [PATCH v3] c++: ICE with delayed noexcept and attribute used [PR97966]

2021-01-21 Thread Marek Polacek via Gcc-patches

On Thu, Jan 21, 2021 at 01:55:24AM -0500, Jason Merrill wrote:
> > +  /* Now that we've gone through all the members, instantiate those
> > + marked with attribute used.  */
> > +  for (tree &x : used)
> 
> This doesn't need to be a reference.  And I think we want this to happen
> even later, after finish_struct_1.

Fair enough, here's an updated version.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Another ICE with delayed noexcept parsing, but a bit gnarlier.

A function definition marked with __attribute__((used)) ought to be
emitted even when it is not referenced in the TU.  For a member function
template marked with __attribute__((used)) this means that it will
be instantiated: in instantiate_class_template_1 we have

11971   /* Instantiate members marked with attribute used.  */
11972   if (r != error_mark_node && DECL_PRESERVE_P (r))
11973 mark_used (r);

It is not so surprising that this doesn't work well with delayed
noexcept parsing: when we're processing the function template we delay
the parsing, so the member "foo" is found, but then when we're
instantiating it, "foo" hasn't yet been seen, which creates a
discrepancy and a crash ensues.  "foo" hasn't yet been seen because
instantiate_class_template_1 just loops over the class members and
instantiates right away.

To make it work, this patch uses a vector to keep track of members
marked with attribute used and uses it to instantiate such members
only after we're done with the class; in particular, after we have
called finish_member_declaration for each member.  And we ought to
be verifying that we did emit such members, so I've added a bunch
of dg-finals.
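
The ordering problem can be modelled outside the compiler.  The sketch below is only an analogy, with made-up types and function names: `member` stands in for a class member, `preserve` for DECL_PRESERVE_P, and "instantiating" a member is reduced to checking that the sibling it refers to has already been declared.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a class member in the template pattern.
struct member
{
  std::string name;       // declared name
  bool preserve;          // stand-in for DECL_PRESERVE_P (attribute used)
  std::string refers_to;  // sibling needed when this member is instantiated
};

// Eager variant: instantiate a preserved member as soon as it is reached.
// Returns the members whose sibling had not been declared at that point.
std::vector<std::string>
instantiate_eagerly (const std::vector<member> &pattern)
{
  std::set<std::string> declared;
  std::vector<std::string> failures;
  for (const member &m : pattern)
    {
      declared.insert (m.name);
      if (m.preserve && !m.refers_to.empty ()
          && !declared.count (m.refers_to))
        failures.push_back (m.name);
    }
  return failures;
}

// Deferred variant: only record preserved members while walking the
// class, and instantiate them once every member has been declared.
std::vector<std::string>
instantiate_deferred (const std::vector<member> &pattern)
{
  std::set<std::string> declared;
  std::vector<const member *> used;
  for (const member &m : pattern)
    {
      declared.insert (m.name);
      if (m.preserve)
        used.push_back (&m);
    }

  std::vector<std::string> failures;
  for (const member *m : used)
    if (!m->refers_to.empty () && !declared.count (m->refers_to))
      failures.push_back (m->name);
  return failures;
}
```

For a class shaped like the testcase, with a preserved "bar" that refers to a later "foo", the eager walk reports "bar" while the deferred walk reports nothing.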

gcc/cp/ChangeLog:

PR c++/97966
* pt.c (instantiate_class_template_1): Instantiate members
marked with attribute used only after we're done instantiating
the class.

gcc/testsuite/ChangeLog:

PR c++/97966
* g++.dg/cpp0x/noexcept63.C: New test.
---
 gcc/cp/pt.c | 12 -
 gcc/testsuite/g++.dg/cpp0x/noexcept63.C | 63 +
 2 files changed, 73 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept63.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 373f8279604..1f3850d1048 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -11895,6 +11895,9 @@ instantiate_class_template_1 (tree type)
  relative to the scope of the class.  */
   pop_to_parent_deferring_access_checks ();
 
+  /* A vector to hold members marked with attribute used. */
+  auto_vec<tree> used;
+
   /* Now members are processed in the order of declaration.  */
   for (member = CLASSTYPE_DECL_LIST (pattern);
member; member = TREE_CHAIN (member))
@@ -11968,7 +11971,7 @@ instantiate_class_template_1 (tree type)
  finish_member_declaration (r);
  /* Instantiate members marked with attribute used.  */
  if (r != error_mark_node && DECL_PRESERVE_P (r))
-   mark_used (r);
+   used.safe_push (r);
  if (TREE_CODE (r) == FUNCTION_DECL
  && DECL_OMP_DECLARE_REDUCTION_P (r))
cp_check_omp_declare_reduction (r);
@@ -12034,7 +12037,7 @@ instantiate_class_template_1 (tree type)
 /*flags=*/0);
  /* Instantiate members marked with attribute used. */
  if (r != error_mark_node && DECL_PRESERVE_P (r))
-   mark_used (r);
+   used.safe_push (r);
}
  else if (TREE_CODE (r) == FIELD_DECL)
{
@@ -12203,6 +12206,11 @@ instantiate_class_template_1 (tree type)
   finish_struct_1 (type);
   TYPE_BEING_DEFINED (type) = 0;
 
+  /* Now that we've gone through all the members, instantiate those
+ marked with attribute used.  */
+  for (tree x : used)
+mark_used (x);
+
   /* We don't instantiate default arguments for member functions.  14.7.1:
 
  The implicit instantiation of a class template specialization causes
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept63.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept63.C
new file mode 100644
index 000..cf048f56c2a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept63.C
@@ -0,0 +1,63 @@
+// PR c++/97966
+// { dg-do compile { target c++11 } }
+
+template 
+struct S1 {
+  __attribute__((used)) S1() noexcept(noexcept(this->foo())) { }
+  void foo();
+};
+
+template 
+struct S2 {
+  __attribute__((used)) void bar() noexcept(noexcept(this->foo())) { }
+  void foo();
+};
+
+template 
+struct S3 {
+  void __attribute__((used)) bar() noexcept(noexcept(this->foo())) { }
+  void foo();
+};
+
+template 
+struct S4 {
+  [[gnu::used]] void bar() noexcept(noexcept(this->foo())) { }
+  void foo();
+};
+
+template 
+struct S5 {
+  void bar() noexcept(noexcept(this->foo())) __attribute__((used)) { }
+  void foo();
+};
+
+template 
+struct

Re: [PATCH] c++, v2: Fix up potential_constant_expression_1 FOR/WHILE_STMT handling [PR98672]

2021-01-21 Thread Jason Merrill via Gcc-patches


On 1/19/21 4:54 PM, Jakub Jelinek wrote:

On Tue, Jan 19, 2021 at 04:01:49PM -0500, Jason Merrill wrote:

Hmm, IF_STMT probably also needs to check the else clause, if the condition
isn't a known constant.


You're right, I thought it was ok because it recurses with tf_none, but
if the then branch is potentially constant and only else returns, continues
or breaks, then as the enhanced testcase shows we were mishandling it too.
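
As an illustration of the kind of function under discussion (this is not the PR's testcase, which is not shown in this thread), here is a C++14 constexpr function whose loop condition is not a known constant when the body is checked, and where only the else branch of the if leaves the function.  The names are made up for the example.

```cpp
// Loop condition depends on N, so it cannot be folded to "true" when the
// function body is analysed on its own; the body may still return.
constexpr int
first_negative (const int *v, int n)
{
  for (int i = 0; i < n; ++i)
    {
      if (v[i] >= 0)
        continue;   // then branch: stays in the loop
      else
        return i;   // else branch: the only way out from inside the loop
    }
  return -1;        // reached when no element is negative
}

constexpr int sample[] = { 3, 1, -4, 1 };
static_assert (first_negative (sample, 4) == 2, "index of -4");
static_assert (first_negative (sample, 2) == -1, "no negative value");
```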

So like this then if it passes bootstrap/regtest?


OK.


2021-01-19  Jakub Jelinek  

PR c++/98672
* constexpr.c (check_for_return_continue_data): Add break_stmt member.
(check_for_return_continue): Also look for BREAK_STMT.  Handle
SWITCH_STMT by ignoring break_stmt from its body.
(potential_constant_expression_1) <case FOR_STMT>,
<case WHILE_STMT>: If the condition isn't constant true, check if
the loop body can contain a return stmt.
<case SWITCH_STMT>: Adjust check_for_return_continue_data initializer.
<case IF_STMT>: If recursion with tf_none is successful, merge
*jump_target from the branches - returns with highest priority, breaks
or continues lower.  If then branch is potentially constant and
doesn't return, check the else branch if it could return, break or
continue.
* g++.dg/cpp1y/constexpr-98672.C: New test.

--- gcc/cp/constexpr.c.jj   2021-01-14 12:49:50.500644142 +0100
+++ gcc/cp/constexpr.c  2021-01-19 22:44:17.845322567 +0100
@@ -7649,15 +7649,16 @@ check_automatic_or_tls (tree ref)
  struct check_for_return_continue_data {
hash_set<tree> *pset;
tree continue_stmt;
+  tree break_stmt;
  };
  
  /* Helper function for potential_constant_expression_1 SWITCH_STMT handling,

 called through cp_walk_tree.  Return the first RETURN_EXPR found, or note
-   the first CONTINUE_STMT if RETURN_EXPR is not found.  */
+   the first CONTINUE_STMT and/or BREAK_STMT if RETURN_EXPR is not found.  */
  static tree
  check_for_return_continue (tree *tp, int *walk_subtrees, void *data)
  {
-  tree t = *tp, s;
+  tree t = *tp, s, b;
check_for_return_continue_data *d = (check_for_return_continue_data *) data;
switch (TREE_CODE (t))
  {
@@ -7669,6 +7670,11 @@ check_for_return_continue (tree *tp, int
d->continue_stmt = t;
break;
  
+case BREAK_STMT:

+  if (d->break_stmt == NULL_TREE)
+   d->break_stmt = t;
+  break;
+
  #define RECUR(x) \
if (tree r = cp_walk_tree (&x, check_for_return_continue, data, \
 d->pset))   \
@@ -7680,16 +7686,20 @@ check_for_return_continue (tree *tp, int
*walk_subtrees = 0;
RECUR (DO_COND (t));
s = d->continue_stmt;
+  b = d->break_stmt;
RECUR (DO_BODY (t));
d->continue_stmt = s;
+  d->break_stmt = b;
break;
  
  case WHILE_STMT:

*walk_subtrees = 0;
RECUR (WHILE_COND (t));
s = d->continue_stmt;
+  b = d->break_stmt;
RECUR (WHILE_BODY (t));
d->continue_stmt = s;
+  d->break_stmt = b;
break;
  
  case FOR_STMT:

@@ -7698,16 +7708,28 @@ check_for_return_continue (tree *tp, int
RECUR (FOR_COND (t));
RECUR (FOR_EXPR (t));
s = d->continue_stmt;
+  b = d->break_stmt;
RECUR (FOR_BODY (t));
d->continue_stmt = s;
+  d->break_stmt = b;
break;
  
  case RANGE_FOR_STMT:

*walk_subtrees = 0;
RECUR (RANGE_FOR_EXPR (t));
s = d->continue_stmt;
+  b = d->break_stmt;
RECUR (RANGE_FOR_BODY (t));
d->continue_stmt = s;
+  d->break_stmt = b;
+  break;
+
+case SWITCH_STMT:
+  *walk_subtrees = 0;
+  RECUR (SWITCH_STMT_COND (t));
+  b = d->break_stmt;
+  RECUR (SWITCH_STMT_BODY (t));
+  d->break_stmt = b;
break;
  #undef RECUR
  
@@ -8190,7 +8212,18 @@ potential_constant_expression_1 (tree t,

  /* If we couldn't evaluate the condition, it might not ever be
 true.  */
  if (!integer_onep (tmp))
-   return true;
+   {
+ /* Before returning true, check if the for body can contain
+a return.  */
+ hash_set<tree> pset;
+ check_for_return_continue_data data = { &pset, NULL_TREE,
+ NULL_TREE };
+ if (tree ret_expr
+ = cp_walk_tree (&FOR_BODY (t), check_for_return_continue,
+ &data, &pset))
+   *jump_target = ret_expr;
+ return true;
+   }
}
if (!RECUR (FOR_EXPR (t), any))
return false;
@@ -8219,7 +8252,18 @@ potential_constant_expression_1 (tree t,
tmp = cxx_eval_outermost_constant_expr (tmp, true);
/* If we couldn't evaluate the condition, it might not ever be true.  */
if (!integer_onep (tmp))
-   return true;
+   {
+ /* Before returning true, check if the while body ca

Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2021-01-21 Thread Jan Hubicka

> Hi All,
> 
> James and I have been investigating this regression and have tracked it down 
> to register allocation.
> 
> I have created a new PR with our findings
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782 but unfortunately
> we don't know how to proceed.
> 
> This does seem like a genuine bug in RA.  It looks like some magic threshold 
> has been crossed, but we're having
> trouble determining what this magic number is.

Thank you for the analysis - it was on my TODO list for a very long
time, but the function is large.  I will read it carefully and let's see
if we can come up with something useful.

Honza
> 
> Any help is appreciated.
> 
> Thanks,
> Tamar
> 
> > -Original Message-
> > From: Xionghu Luo 
> > Sent: Friday, October 16, 2020 9:47 AM
> > To: Tamar Christina ; Martin Jambor
> > ; Richard Sandiford ;
> > luoxhu via Gcc-patches 
> > Cc: seg...@kernel.crashing.org; wschm...@linux.ibm.com;
> > li...@gcc.gnu.org; Jan Hubicka ; dje@gmail.com
> > Subject: Re: [PATCH] ipa-inline: Improve growth accumulation for recursive
> > calls
> > 
> > 
> > 
> > On 2020/9/12 01:36, Tamar Christina wrote:
> > > Hi Martin,
> > >
> > >>
> > >> can you please confirm that the difference between these two is all
> > >> due to the last option -fno-inline-functions-called-once?  Is LTO
> > >> necessary?  I.e., can you run the benchmark also built with the
> > >> branch compiler and
> > >> -mcpu=native -Ofast -fomit-frame-pointer -fno-inline-functions-called-once?
> > >>
> > >
> > > Done, see below.
> > >
> > >>> +----------+------------------------------------------------+------+--+--+
> > >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24% |  |  |
> > >>> +----------+------------------------------------------------+------+--+--+
> > >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer       | -26% |  |  |
> > >>> +----------+------------------------------------------------+------+--+--+
> > >>
> > >>>
> > >>> (Hopefully the table shows up correct)
> > >>
> > >> it does show OK for me, thanks.
> > >>
> > >>>
> > >>> It looks like your patch definitely does improve the basic cases. So
> > >>> there's not much difference between lto and non-lto anymore and it's
> > >>> much better than GCC 10. However it still contains the regression
> > >>> introduced by Honza's changes.
> > >>
> > >> I assume these are rates, not times, so negative means bad.  But do I
> > >> understand it correctly that you're comparing against GCC 10 with the
> > >> two parameters set to rather special values?  Because your table
> > >> seems to indicate that even for you, the branch is faster than GCC 10
> > >> with just -mcpu=native -Ofast -fomit-frame-pointer.
> > >
> > > Yes these are indeed rates, and indeed I am comparing against the same
> > > options we used to get the fastest rates on before which is the two
> > > parameters and the inline flag.
> > >
> > >>
> > >> So is the problem that the best obtainable run-time, even with
> > >> obscure options, from the branch is slower than the best obtainable
> > >> run-time from GCC 10?
> > >>
> > >
> > > Yeah that's the problem, when we compare the two we're still behind.
> > >
> > > I've done the additional two runs
> > >
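> > > +----------+------------------------------------------------+-------------+
> > > | Compiler | Flags                                          | diff GCC 10 |
> > > +----------+------------------------------------------------+-------------+
> > > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto |             |
> > > |          | --param ipa-cp-eval-threshold=1                |             |
> > > |          | --param ipa-cp-unit-growth=80                  |             |
> > > |          | -fno-inline-functions-called-once              |             |
> > > +----------+------------------------------------------------+-------------+
> > > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer       | -44%        |
> > > +----------+------------------------------------------------+-------------+
> > > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -36%        |
> > > +----------+------------------------------------------------+-------------+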
> >

Re: [PATCH] c++: Fix excessive instantiation inside decltype [PR71879]

2021-01-21 Thread Jason Merrill via Gcc-patches


On 1/20/21 11:27 AM, Patrick Palka wrote:

On Tue, 19 Jan 2021, Jason Merrill wrote:


On 1/18/21 12:31 AM, Patrick Palka wrote:

Here after resolving the address of a template-id inside decltype, we
end up instantiating the chosen specialization from the call to
mark_used in resolve_nondeduced_context, even though only its type is
needed.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

PR c++/71879
* semantics.c (finish_decltype_type): Temporarily increment
cp_unevaluated_operand during call to resolve_nondeduced_context.

gcc/testsuite/ChangeLog:

PR c++/71879
* g++.dg/cpp0x/decltype-71879.C: New test.
---
   gcc/cp/semantics.c  | 2 ++
   gcc/testsuite/g++.dg/cpp0x/decltype-71879.C | 5 +
   2 files changed, 7 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-71879.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index c8a6283b120..cad55665ce8 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -10098,7 +10098,9 @@ finish_decltype_type (tree expr, bool
id_expression_or_member_access_p,
   /* The type denoted by decltype(e) is defined as follows:  */
   +  ++cp_unevaluated_operand;
 expr = resolve_nondeduced_context (expr, complain);
+  --cp_unevaluated_operand;


Hmm, is there a reason not to have cp_unevaluated_operand set through the
whole function?  We might stick 'cp_unevaluated u;' at the top of the function
and remove the existing messing with cp_unevaluated_operand.  Or assert that
it's set and fix the callers to leave it set longer.


That makes sense to me.  AFAICT the only parts of finish_decltype_type
that are sensitive to cp_unevaluated_operand are
resolve_nondeduced_context and instantiate_non_dependent_expr_sfinae,
but it makes sense to have cp_unevaluated_operand set throughout the
function.

The following passes limited testing, does it look OK for trunk after a
full bootstrap+regtest?


OK.


-- >8 --

Subject: [PATCH] c++: Fix excessive instantiation inside decltype [PR71879]

Here after resolving the address of a template-id inside decltype, we
end up instantiating the chosen specialization (from the call to
mark_used in resolve_nondeduced_context), even though only its type is
needed.

This patch sets cp_unevaluated_operand throughout finish_decltype_type,
so that in particular it's set during the call to
resolve_nondeduced_context within.
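
The difference between the manual ++/-- pair and a sentinel object can be sketched with a plain RAII guard.  The names below (`unevaluated_depth`, `unevaluated_guard`, `observe`) are invented for the example; this is only meant to show the pattern, not how cp_unevaluated is actually defined in the front end.

```cpp
// Hypothetical counter standing in for cp_unevaluated_operand.
static int unevaluated_depth = 0;

// Hypothetical sentinel: bump the counter for as long as the object
// lives, and undo the bump on every path out of the enclosing scope.
struct unevaluated_guard
{
  unevaluated_guard () { ++unevaluated_depth; }
  ~unevaluated_guard () { --unevaluated_depth; }
  unevaluated_guard (const unevaluated_guard &) = delete;
  unevaluated_guard &operator= (const unevaluated_guard &) = delete;
};

// Records in *SEEN the depth observed while the guard is active.
// Both the early return and the normal return restore the counter.
int
observe (bool early_exit, int *seen)
{
  unevaluated_guard u;
  *seen = unevaluated_depth;
  if (early_exit)
    return -1;
  return 0;
}
```

With the guard at the top of the function, no individual call inside it needs its own increment and decrement, and early returns cannot leave the counter unbalanced.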

gcc/cp/ChangeLog:

PR c++/71879
* semantics.c (finish_decltype_type): Set up a cp_unevaluated
sentinel at the start of the function.  Remove a now-redundant
manual adjustment of cp_unevaluated_operand.

gcc/testsuite/ChangeLog:

PR c++/71879
* g++.dg/cpp0x/decltype-71879.C: New test.
---
  gcc/cp/semantics.c  | 5 +++--
  gcc/testsuite/g++.dg/cpp0x/decltype-71879.C | 5 +
  2 files changed, 8 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-71879.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 244fc70d02d..834885616db 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -10080,6 +10080,9 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
return error_mark_node;
  }
  
+  /* The operand of decltype is an unevaluated expression.  */

+  cp_unevaluated u;
+
/* Depending on the resolution of DR 1172, we may later need to distinguish
   instantiation-dependent but not type-dependent expressions so that, say,
   A::U doesn't require 'typename'.  */
@@ -10095,9 +10098,7 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
  }
else if (processing_template_decl)
  {
-  ++cp_unevaluated_operand;
expr = instantiate_non_dependent_expr_sfinae (expr, complain);
-  --cp_unevaluated_operand;
if (expr == error_mark_node)
return error_mark_node;
  }
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype-71879.C 
b/gcc/testsuite/g++.dg/cpp0x/decltype-71879.C
new file mode 100644
index 000..9da4d40ca70
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype-71879.C
@@ -0,0 +1,5 @@
+// PR c++/71879
+// { dg-do compile { target c++11 } }
+
+template  void f(T x) { x.fail(); }
+using R = decltype(&f);

RE: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2021-01-21 Thread Tamar Christina via Gcc-patches

Hi All,

James and I have been investigating this regression and have tracked it down to 
register allocation.

I have created a new PR with our findings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782 but unfortunately
we don't know how to proceed.

This does seem like a genuine bug in RA.  It looks like some magic threshold 
has been crossed, but we're having
trouble determining what this magic number is.

Any help is appreciated.

Thanks,
Tamar

> -Original Message-
> From: Xionghu Luo 
> Sent: Friday, October 16, 2020 9:47 AM
> To: Tamar Christina ; Martin Jambor
> ; Richard Sandiford ;
> luoxhu via Gcc-patches 
> Cc: seg...@kernel.crashing.org; wschm...@linux.ibm.com;
> li...@gcc.gnu.org; Jan Hubicka ; dje@gmail.com
> Subject: Re: [PATCH] ipa-inline: Improve growth accumulation for recursive
> calls
> 
> 
> 
> On 2020/9/12 01:36, Tamar Christina wrote:
> > Hi Martin,
> >
> >>
> >> can you please confirm that the difference between these two is all
> >> due to the last option -fno-inline-functions-called-once?  Is LTO
> >> necessary?  I.e., can you run the benchmark also built with the
> >> branch compiler and
> >> -mcpu=native -Ofast -fomit-frame-pointer -fno-inline-functions-called-once?
> >>
> >
> > Done, see below.
> >
> >>> +----------+------------------------------------------------+------+--+--+
> >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24% |  |  |
> >>> +----------+------------------------------------------------+------+--+--+
> >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer       | -26% |  |  |
> >>> +----------+------------------------------------------------+------+--+--+
> >>>
> >>> (Hopefully the table shows up correct)
> >>
> >> it does show OK for me, thanks.
> >>
> >>>
> >>> It looks like your patch definitely does improve the basic cases. So
> >>> there's not much difference between lto and non-lto anymore and it's
> >>> much better than GCC 10. However it still contains the regression
> >>> introduced by Honza's changes.
> >>
> >> I assume these are rates, not times, so negative means bad.  But do I
> >> understand it correctly that you're comparing against GCC 10 with the
> >> two parameters set to rather special values?  Because your table
> >> seems to indicate that even for you, the branch is faster than GCC 10
> >> with just -mcpu=native -Ofast -fomit-frame-pointer.
> >
> > Yes these are indeed rates, and indeed I am comparing against the same
> > options we used to get the fastest rates on before which is the two
> > parameters and the inline flag.
> >
> >>
> >> So is the problem that the best obtainable run-time, even with
> >> obscure options, from the branch is slower than the best obtainable
> >> run-time from GCC 10?
> >>
> >
> > Yeah that's the problem, when we compare the two we're still behind.
> >
> > I've done the additional two runs
> >
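> > +----------+------------------------------------------------+-------------+
> > | Compiler | Flags                                          | diff GCC 10 |
> > +----------+------------------------------------------------+-------------+
> > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto |             |
> > |          | --param ipa-cp-eval-threshold=1                |             |
> > |          | --param ipa-cp-unit-growth=80                  |             |
> > |          | -fno-inline-functions-called-once              |             |
> > +----------+------------------------------------------------+-------------+
> > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer       | -44%        |
> > +----------+------------------------------------------------+-------------+
> > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -36%        |
> > +----------+------------------------------------------------+-------------+
> > | GCC 11   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -12%        |
> > |          | --param ipa-cp-eval-threshold=1                |             |
> > |          | --param ipa-cp-unit-growth=80                  |             |
> > |          | -fno-inline-functions-called-once              |             |
> > +----------+------------------------------------------------+-------------+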
> > | Branch   | -mcpu=native

Re: [AArch64] Remove backend support for widen-sub

2021-01-21 Thread Joel Hutton via Gcc-patches

> I think we should try to fix the PR instead.  The widening operations
> can only be used for SLP if the group_size is at least double the
> number of elements in the vectype, so one idea (not worked through)
> is to make the vect_build_slp_tree family of routines undo pattern
> recognition for widening operations if the group size is less than that.

This seems reasonable, how do I go about 'undoing' the pattern recognition?

Ideally the patterns wouldn't be substituted in the first place, but group size 
is chosen after pattern substitution.

From: Richard Sandiford 
Sent: 21 January 2021 13:40
To: Richard Biener 
Cc: Joel Hutton via Gcc-patches ; Joel Hutton 

Subject: Re: [AArch64] Remove backend support for widen-sub

Richard Biener  writes:
> On Thu, 21 Jan 2021, Richard Sandiford wrote:
>
>> Joel Hutton via Gcc-patches  writes:
>> > Hi all,
>> >
>> > This patch removes support for the widening subtract operation in the 
>> > aarch64 backend as it is causing a performance regression.
>> >
>> > In the following example:
>> >
> >> > #include <stdint.h>
>> > extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t 
>> > *restrict pix2)
>> > {
>> >for( int y = 0; y < 4; y++ )
>> >   {
>> > for( int x = 0; x < 4; x++ )
>> >   d[x + y*4] = pix1[x] - pix2[x];
>> > pix1 += 16;
>> > pix2 += 16;
>> >  }
>> >
>> > The widening minus pattern is recognized and substituted, but cannot be 
>> > used due to the input vector type chosen in slp vectorization. This 
>> > results in an attempt to do an 8 byte->8 short widening subtract 
>> > operation, which is not supported.
>> >
>> > The issue is documented in PR 98772.
>>
>> IMO removing just the sub patterns is too arbitrary.  Like you say,
> >> the PR affects all widening instructions, but it happens that the
>> first encountered regression used subtraction.
>>
>> I think we should try to fix the PR instead.  The widening operations
>> can only be used for SLP if the group_size is at least double the
>> number of elements in the vectype, so one idea (not worked through)
>> is to make the vect_build_slp_tree family of routines undo pattern
>> recognition for widening operations if the group size is less than that.
>>
>> Richi would know better than me though.
>
> Why should the widen ops be only usable with such constraints?
> As I read md.texi they for example do v8qi -> v8hi operations
> (where the v8qi is either lo or hi part of a v16qi vector).

They're v16qi->v8hi operations rather than v8qi->v8hi.  The lo
operations operate on one half of the v16qi and the hi operations
operate on the other half.  They're supposed to be used together
to produce v16qi->v8hi+v8hi, so for BB SLP we need a group size
of 16 to feed them.  (Loop-aware SLP is fine as-is because we can
just increase the unroll factor.)

In the testcase, the store end of the SLP graph is operating on
8 shorts, which further up the graph are converted from 8 chars.
To use WIDEN_MINUS_EXPR at v8hi we'd need 16 shorts and 16 chars.

> The dumps show we use a VF of 4 which makes us have two v8hi
> vectors and one v16qi which vectorizable_conversion should
> be able to handle by emitting hi/lo widened subtracts.

AIUI the problem is with slp1.  I think the loop side is fine.

Thanks,
Richard

[PATCH] aarch64: Use canonical RTL for sqdmlal patterns

2021-01-21 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

The aarch64_sqdmll patterns are of the form:
  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
        (SBINQOPS:<VWIDE>
          (match_operand:<VWIDE> 1 "register_operand" "0")
          (ss_ashift:<VWIDE>
              (mult:<VWIDE>
                (sign_extend:<VWIDE>
                      (match_operand:VSD_HSI 2 "register_operand" "w"))
                (sign_extend:<VWIDE>
                      (match_operand:VSD_HSI 3 "register_operand" "w")))
              (const_int 1))))]

where SBINQOPS is ss_plus and ss_minus. The problem is that for the ss_plus
case the RTL is not canonical: the (match_operand 1) should be the second arm
of the PLUS. I've seen this manifest in combine missing some legitimate
simplifications because it generates the canonical ss_plus form and fails to
match the pattern.

This patch splits the patterns into the ss_plus and ss_minus forms with the 
canonical form for each.
I've seen this improve my testcase (which I can't include as it's too large and 
not easy to test reliably).

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_sqdmll):
Split into...
(aarch64_sqdmlal): ... This...
(aarch64_sqdmlsl): ... And this.
(aarch64_sqdmll_lane): Split into...
(aarch64_sqdmlal_lane): ... This...
(aarch64_sqdmlsl_lane): ... And this.
(aarch64_sqdmll_laneq): Split into...
(aarch64_sqdmlsl_laneq): ... This...
(aarch64_sqdmlal_laneq):  ... And this.
(aarch64_sqdmll_n): Split into...
(aarch64_sqdmlsl_n): ... This...
(aarch64_sqdmlal_n): ... And this.
(aarch64_sqdmll2_internal): Split into...
(aarch64_sqdmlal2_internal): ... This...
(aarch64_sqdmlsl2_internal): ... And this.


sqdmlal.patch
Description: sqdmlal.patch

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka

> 
> Plus I'm planning to send one more patch that will ignore time profile when 
> -fprofile-reproduce != serial.

Why do you need to disable time profiling?

Honza

[committed] d: Enable private member access for __traits

2021-01-21 Thread Iain Buclaw via Gcc-patches

Hi,

This patch merges the D front-end with upstream dmd 3a7ebef73.

The following traits can now access non-public members:
 - hasMember
 - getMember
 - getOverloads
 - getVirtualMethods
 - getVirtualFunctions

This fixes a long-standing issue in D where the allMembers trait would
correctly return non-public members but those non-public members would
be inaccessible to other traits.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, and
committed to mainline.

Regards
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 3a7ebef73.
---
 gcc/d/dmd/MERGE   |  2 +-
 gcc/d/dmd/traits.c| 10 --
 gcc/testsuite/gdc.test/compilable/imports/test15371.d |  9 +
 gcc/testsuite/gdc.test/compilable/test15371.d | 10 ++
 4 files changed, 24 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gdc.test/compilable/imports/test15371.d
 create mode 100644 gcc/testsuite/gdc.test/compilable/test15371.d

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index 4f7f7a8ff3b..1f907b8f19f 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-2d3d137489f030395d06cb664087fd1a35bccabe
+3a7ebef73cc01d4a877a95cf95cd3776c9e3ee66
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/traits.c b/gcc/d/dmd/traits.c
index 5fd4b486a9b..70f7f2cb582 100644
--- a/gcc/d/dmd/traits.c
+++ b/gcc/d/dmd/traits.c
@@ -1103,12 +1103,14 @@ Expression *semanticTraits(TraitsExp *e, Scope *sc)
 return new ErrorExp();
 }
 
+// ignore symbol visibility and disable access checks for these traits
+Scope *scx = sc->push();
+scx->flags |= SCOPEignoresymbolvisibility | SCOPEnoaccesscheck;
+
 if (e->ident == Id::hasMember)
 {
 /* Take any errors as meaning it wasn't found
  */
-Scope *scx = sc->push();
-scx->flags |= SCOPEignoresymbolvisibility;
 ex = trySemantic(ex, scx);
 scx->pop();
 return ex ? True(e) : False(e);
@@ -1118,8 +1120,6 @@ Expression *semanticTraits(TraitsExp *e, Scope *sc)
 if (ex->op == TOKdotid)
 // Prevent semantic() from replacing Symbol with its initializer
 ((DotIdExp *)ex)->wantsym = true;
-Scope *scx = sc->push();
-scx->flags |= SCOPEignoresymbolvisibility;
 ex = semantic(ex, scx);
 scx->pop();
 return ex;
@@ -1130,8 +1130,6 @@ Expression *semanticTraits(TraitsExp *e, Scope *sc)
 {
 unsigned errors = global.errors;
 Expression *eorig = ex;
-Scope *scx = sc->push();
-scx->flags |= SCOPEignoresymbolvisibility;
 ex = semantic(ex, scx);
 if (errors < global.errors)
 e->error("%s cannot be resolved", eorig->toChars());
diff --git a/gcc/testsuite/gdc.test/compilable/imports/test15371.d b/gcc/testsuite/gdc.test/compilable/imports/test15371.d
new file mode 100644
index 00000000000..49b446a329b
--- /dev/null
+++ b/gcc/testsuite/gdc.test/compilable/imports/test15371.d
@@ -0,0 +1,9 @@
+module imports.test15371;
+
+struct A
+{
+private int a;
+private void fun() {}
+private void fun(int, int) {}
+public void fun(int) {}
+}
diff --git a/gcc/testsuite/gdc.test/compilable/test15371.d b/gcc/testsuite/gdc.test/compilable/test15371.d
new file mode 100644
index 00000000000..6e762beeb1e
--- /dev/null
+++ b/gcc/testsuite/gdc.test/compilable/test15371.d
@@ -0,0 +1,10 @@
+// EXTRA_FILES: imports/test15371.d
+import imports.test15371;
+
+void main()
+{
+A a;
+static assert(__traits(hasMember, A, "a"));
+static assert(__traits(getOverloads, A, "fun").length == 3);
+static assert(__traits(compiles, __traits(getMember, a, "a") ));
+}
-- 
2.27.0

Re: [PATCH] Fix typo in arm_mve.h __arm_vcmpneq_s8 return type

2021-01-21 Thread Christophe Lyon via Gcc-patches

On Thu, 21 Jan 2021 at 14:39, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Christophe Lyon via Gcc-patches
> > Sent: 21 January 2021 13:37
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH] Fix typo in arm_mve.h __arm_vcmpneq_s8 return type
> >
> > Like all vcmp intrinsics, __arm_vcmpneq_s8 should return a mve_pred16_t.
> >
>
> Ok. I think it also needs a backport to GCC 10
> Thanks,
> Kyrill
>

Indeed.
Pushed to trunk and gcc-10.

Thanks

Christophe

> > 2021-01-21  Christophe Lyon  
> >
> >   gcc/
> >   * config/arm/arm_mve.h (__arm_vcmpneq_s8): Fix return type.
> > ---
> >  gcc/config/arm/arm_mve.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> > index f27f6cd..3a40c6e 100644
> > --- a/gcc/config/arm/arm_mve.h
> > +++ b/gcc/config/arm/arm_mve.h
> > @@ -3670,7 +3670,7 @@ __arm_vaddlvq_p_u32 (uint32x4_t __a,
> > mve_pred16_t __p)
> >return __builtin_mve_vaddlvq_p_uv4si (__a, __p);
> >  }
> >
> > -__extension__ extern __inline int32_t
> > +__extension__ extern __inline mve_pred16_t
> >  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> >  __arm_vcmpneq_s8 (int8x16_t __a, int8x16_t __b)
> >  {
> > --
> > 2.7.4
>

Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector

2021-01-21 Thread Jakub Jelinek via Gcc-patches

On Wed, Jan 06, 2021 at 11:34:32AM +0800, Hongtao Liu via Gcc-patches wrote:
> > >>
> > >> Note there's a data dependency between them.  insn 7 feeds insn 9.  When
> > >> there's a data dependency, combiner patterns are usually the better
> > >> choice than peepholes.  I think you'd be looking to match something
> > >> like this (from the .combine dump):
> > >>
> 
> Using combiner patterns; details are discussed in PR98348.
> 
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,} for both GCC10 and trunk.
> gcc/ChangeLog:
> 
> PR target/96891
> PR target/98348
> * config/i386/sse.md (VI_128_256): New mode iterator.
> (*avx_cmp3_1, *avx_cmp3_2, *avx_cmp3_3,
>  *avx_cmp3_4, *avx2_eq3, *avx2_pcmp3_1,
>  *avx2_pcmp3_2, *avx2_gt3): New
> define_insn_and_split to lower avx512 vector comparison to avx
> version when dest is vector.
> (*_cmp3,*_cmp3,*_ucmp3):
> define_insn_and_split for negating the comparison result.
> * config/i386/predicates.md (float_vector_all_ones_operand):
> New predicate.
> * config/i386/i386-expand.c (ix86_expand_sse_movcc): Use
> general NOT operator without UNSPEC_MASKOP.
> 
> gcc/testsuite/ChangeLog:
> 
> PR target/96891
> PR target/98348
> * gcc.target/i386/avx512bw-pr96891-1.c: New test.
> * gcc.target/i386/avx512f-pr96891-1.c: New test.
> * gcc.target/i386/avx512f-pr96891-2.c: New test.
> * gcc.target/i386/avx512f-pr96891-3.c: New test.
> * g++.target/i386/avx512f-pr96891-1.C: New test.
> * gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase.

Ok for trunk.  I'd prefer not to backport it to GCC 10.

Jakub

Re: [PATCH] Drop profile reproducibility flag as it is not used.

2021-01-21 Thread Jan Hubicka

> On 1/21/21 9:25 AM, Richard Biener wrote:
> > On Wed, Jan 20, 2021 at 5:25 PM Martin Liška  wrote:
> > > 
> > > On 1/20/21 5:00 PM, Jan Hubicka wrote:
> > > > > There are two things that I would like to discuss first
> > > > >1) I think the option is still used for value profiling of divisors
> > > 
> > > It's not. Right now the only usage lives in get_nth_most_common_value 
> > > which
> > > is an entry point being used for stringops, indirect call and divmod 
> > > histograms.
> > > 
> > > > >2) it is not clear to me how the new counter establishes
> > > > >reproducibility for indirect calls that have more than 32 targets
> > > > >during the train run.
> > > 
> > > We cannot ensure, but I would say that it's very unlikely to happen.
> > > In case of Firefox, there are definitely other reasons why the build is 
> > > not reproducible.
> > > I would expect arc counters to differ in between multiple training runs.
> > > 
> > > If it's really a problem we can come up with other approaches:
> > > - GCOV run-time control over # of tracked values (32 right now)
> > > - similarly we can save more values in .gcda files
> > > 
> > > I'm sending updated version of the patch.
> > 
> > So the discussion tells me that we want the option and of course want to 
> > have
> > it work.
> 
> Yep, I've just sent patch for that.
> 
> > In the patch I see the =multithreaded enum part was not documented
> > (I don't see how it differs from =parallel-runs?), so I wonder if we really 
> > need
> > three states.
> 
> It's a reserved option value Honza thought will be useful for the future (:

Not exactly - I intended it to work already in gcc10 as follows.

 - With =serial one can trust all counters coming from gcda file
   (I looked again in details to gcc10 implementation and I think the
   handling of sign bit is correct, contrary to my previous claim)
 - With =parallel-runs we can use the sign bit trick to signal that
   merging was lossy
 - With =multithreaded we can only use the counter if sum of individual
   targets matches the total number of executions (so we know no target
   got lost).

Basically =serial means that you get reproducible profile only if the
events (profile counter invocation, profile streaming) come in precisely
same order in both train runs. (Such as profiledbootstrap running with
make -j1)

With =parallel-runs you get reproducible profile under the assumption
that train run consists of multiple invocations (or does forking), each
invocation is reproducible but streaming happens in random order
(profiledbootstrap with make -j16).

With =multithreaded you get reproducible profile if the profile
counter invocations match in both train runs, but they can happen in any
order (profiledbootstrap with make -j16  once we make gcc multithreaded,
or build of clang with FDO).
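
The =multithreaded rule above (use the counter only if the per-target
counts add up to the total) can be sketched in C as follows.  This is only
an illustration: the struct layout, the TOPN_MAX limit and the function
name are made up for the example and do not match the real gcov counter
format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound on tracked values, for illustration only.  */
#define TOPN_MAX 4

/* Simplified model of a top-N value histogram read from a profile.
   The real gcov counters are laid out differently.  */
struct topn_histogram
{
  uint64_t total;             /* total number of executions */
  uint64_t counts[TOPN_MAX];  /* executions seen per tracked target */
  int n;                      /* number of tracked targets in use */
};

/* Return true if no target got lost, i.e. the per-target counts
   account for every execution.  */
static bool
topn_is_complete (const struct topn_histogram *h)
{
  uint64_t sum = 0;
  for (int i = 0; i < h->n; i++)
    sum += h->counts[i];
  return sum == h->total;
}
```

A histogram with total 100 and counts 60/30/10 passes the check, while one
with counts 60/30 does not and would be ignored under this rule.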

> 
> > 
> > That said, the option handling is indeed broken at the moment.  While
> > the implementation is not perfect, does it have some pieces that help?
> > 
> > That said, why not simply fix option handling by adding the missing =
> > to the option?  Using -fprofile-reproducibleserial etc. work but before
> > adding backwards compatibility aliases I'd say we change w/o them
> > for GCC 11 and only if there are complaints introduce them (and eventually
> > backport the option handling fix to GCC 10).
> 
> Yes, I would like to backport the option fix for GCC 10. But apparently, 
> there's
> nobody using the option.

I think an easy way to get users of this option is to make profile not
reproducible by default and modify packages to use the right reproducibility
option when reproducible builds are intended.  It is not a feature that
comes for free and I think most users of PGO do not care, so I think
it should be opt in.

In general, to get a reproducible profile one needs to make the train run
reproducible, which is hard when you look at details (such as the /tmp/ file
name generation issue in gcc) and may lead to a need for the user to annotate
such code.

This will become a more common problem for multithreaded profiles where
one needs to annotate locking and busy waiting loops in them for example
(or the scheduler responsible for executing parallel tasks).

I can see this to be practically achievable but we probably want to
produce some guidelines for doing that and probably teach gcov-tool to
compare profiles and say to which degree they match (i.e. which
functions match for each of levels of reproducibility).

The problem is that profiles are continuous and the errors too, but
optimizations look for certain thresholds, so small errors may lead to
code changes, so I think our current method of looking at relatively few
packages and patching errors when they appear is not a very good long term
strategy... Especially if it makes us drop useful transformations by
default with -fprofile-use and no additional option.

Honza
> 
> Martin
> 
> > 
> > Richard.
> > 
> > > Martin
>

Re: [AArch64] Remove backend support for widen-sub

2021-01-21 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> On Thu, 21 Jan 2021, Richard Sandiford wrote:
>
>> Joel Hutton via Gcc-patches  writes:
>> > Hi all, 
>> >
>> > This patch removes support for the widening subtract operation in the 
>> > aarch64 backend as it is causing a performance regression.
>> >
>> > In the following example:
>> >
> >> > #include <stdint.h>
>> > extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t 
>> > *restrict pix2)
>> > {
>> >    for( int y = 0; y < 4; y++ )
>> >   {    
>> >     for( int x = 0; x < 4; x++ )
>> >       d[x + y*4] = pix1[x] - pix2[x];
>> >     pix1 += 16;  
>> >     pix2 += 16;
>> >  }
>> >
>> > The widening minus pattern is recognized and substituted, but cannot be 
>> > used due to the input vector type chosen in slp vectorization. This 
>> > results in an attempt to do an 8 byte->8 short widening subtract 
>> > operation, which is not supported. 
>> >
>> > The issue is documented in PR 98772.
>> 
>> IMO removing just the sub patterns is too arbitrary.  Like you say,
> >> the PR affects all widening instructions, but it happens that the
>> first encountered regression used subtraction.
>> 
>> I think we should try to fix the PR instead.  The widening operations
>> can only be used for SLP if the group_size is at least double the
>> number of elements in the vectype, so one idea (not worked through)
>> is to make the vect_build_slp_tree family of routines undo pattern
>> recognition for widening operations if the group size is less than that.
>> 
>> Richi would know better than me though.
>
> Why should the widen ops be only usable with such constraints?
> As I read md.texi they for example do v8qi -> v8hi operations
> (where the v8qi is either lo or hi part of a v16qi vector).

They're v16qi->v8hi operations rather than v8qi->v8hi.  The lo
operations operate on one half of the v16qi and the hi operations
operate on the other half.  They're supposed to be used together
to produce v16qi->v8hi+v8hi, so for BB SLP we need a group size
of 16 to feed them.  (Loop-aware SLP is fine as-is because we can
just increase the unroll factor.)

In the testcase, the store end of the SLP graph is operating on
8 shorts, which further up the graph are converted from 8 chars.
To use WIDEN_MINUS_EXPR at v8hi we'd need 16 shorts and 16 chars.
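
To make the lane counting concrete, here is a small scalar C model of the
lo/hi pair, assuming 128-bit vectors.  The helper names are made up for the
example and this is not GCC code; which half counts as "lo" is also
target-dependent, so the model just picks lanes 0..7.

```c
#include <stdint.h>

/* Scalar model of a lo/hi widening subtract on 128-bit vectors.
   Hypothetical helpers for illustration, not GCC internals.  */

/* Subtract lanes 0..7 of two 16 x u8 inputs, giving 8 x s16.  */
static void
widen_sub_lo (const uint8_t a[16], const uint8_t b[16], int16_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = (int16_t) (a[i] - b[i]);
}

/* Subtract lanes 8..15 of the same inputs, giving another 8 x s16.  */
static void
widen_sub_hi (const uint8_t a[16], const uint8_t b[16], int16_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = (int16_t) (a[i + 8] - b[i + 8]);
}

/* The pair together is v16qi -> v8hi + v8hi: it consumes 16 input
   lanes and produces 16 results, which is why a BB SLP group of only
   8 elements cannot feed it.  */
static void
widen_sub_full (const uint8_t a[16], const uint8_t b[16], int16_t out[16])
{
  widen_sub_lo (a, b, out);
  widen_sub_hi (a, b, out + 8);
}
```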

> The dumps show we use a VF of 4 which makes us have two v8hi
> vectors and one v16qi which vectorizable_conversion should
> be able to handle by emitting hi/lo widened subtracts.

AIUI the problem is with slp1.  I think the loop side is fine.

Thanks,
Richard

RE: [PATCH] Fix typo in arm_mve.h __arm_vcmpneq_s8 return type

2021-01-21 Thread Kyrylo Tkachov via Gcc-patches




> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 21 January 2021 13:37
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] Fix typo in arm_mve.h __arm_vcmpneq_s8 return type
> 
> Like all vcmp intrinsics, __arm_vcmpneq_s8 should return a mve_pred16_t.
> 

Ok. I think it also needs a backport to GCC 10
Thanks,
Kyrill

> 2021-01-21  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm_mve.h (__arm_vcmpneq_s8): Fix return type.
> ---
>  gcc/config/arm/arm_mve.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index f27f6cd..3a40c6e 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -3670,7 +3670,7 @@ __arm_vaddlvq_p_u32 (uint32x4_t __a,
> mve_pred16_t __p)
>return __builtin_mve_vaddlvq_p_uv4si (__a, __p);
>  }
> 
> -__extension__ extern __inline int32_t
> +__extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_s8 (int8x16_t __a, int8x16_t __b)
>  {
> --
> 2.7.4

Re: [PATCH] arm: [testuiste] fix ivopts.c target test [PR96372]

2021-01-21 Thread Andrea Corallo via Gcc-patches

Kyrylo Tkachov  writes:

[...]

> Ah ok, then it's fine to be consistent.
> The patch is ok.
> Thanks,
> Kyrill

Thanks, into master as 0568f801eff

  Andrea

[PATCH] Fix typo in arm_mve.h __arm_vcmpneq_s8 return type

2021-01-21 Thread Christophe Lyon via Gcc-patches

Like all vcmp intrinsics, __arm_vcmpneq_s8 should return a mve_pred16_t.
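
As a rough illustration of why the return type matters, the following
portable C model mimics a 16-lane "not equal" compare.  It assumes one
predicate bit per byte lane, which is my understanding of how MVE
predicates work for 8-bit elements; pred16_model_t and cmpne_s8_model are
stand-ins invented for the example and are not part of arm_mve.h.

```c
#include <stdint.h>

/* Stand-in for mve_pred16_t: one bit per byte lane (assumption).  */
typedef uint16_t pred16_model_t;

/* Model of a 16 x s8 "not equal" compare: bit i of the result is set
   when lane i of A differs from lane i of B.  */
static pred16_model_t
cmpne_s8_model (const int8_t a[16], const int8_t b[16])
{
  pred16_model_t p = 0;
  for (int i = 0; i < 16; i++)
    if (a[i] != b[i])
      p |= (pred16_model_t) (1u << i);
  return p;
}
```

The result is a 16-bit lane mask rather than a numeric value, so a
predicate type describes it better than int32_t.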

2021-01-21  Christophe Lyon  

gcc/
* config/arm/arm_mve.h (__arm_vcmpneq_s8): Fix return type.
---
 gcc/config/arm/arm_mve.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index f27f6cd..3a40c6e 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -3670,7 +3670,7 @@ __arm_vaddlvq_p_u32 (uint32x4_t __a, mve_pred16_t __p)
   return __builtin_mve_vaddlvq_p_uv4si (__a, __p);
 }
 
-__extension__ extern __inline int32_t
+__extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq_s8 (int8x16_t __a, int8x16_t __b)
 {
-- 
2.7.4

Re: [AArch64] Remove backend support for widen-sub

2021-01-21 Thread Richard Biener

On Thu, 21 Jan 2021, Richard Sandiford wrote:

> Joel Hutton via Gcc-patches  writes:
> > Hi all, 
> >
> > This patch removes support for the widening subtract operation in the 
> > aarch64 backend as it is causing a performance regression.
> >
> > In the following example:
> >
> > #include <stdint.h>
> > extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t *restrict 
> > pix2)
> > {
> >    for( int y = 0; y < 4; y++ )
> >   {    
> >     for( int x = 0; x < 4; x++ )
> >       d[x + y*4] = pix1[x] - pix2[x];
> >     pix1 += 16;  
> >     pix2 += 16;
> >  }
> >
> > The widening minus pattern is recognized and substituted, but cannot be 
> > used due to the input vector type chosen in slp vectorization. This results 
> > in an attempt to do an 8 byte->8 short widening subtract operation, which 
> > is not supported. 
> >
> > The issue is documented in PR 98772.
> 
> IMO removing just the sub patterns is too arbitrary.  Like you say,
> the PR affects all widening instructions, but it happens that the
> first encountered regression used subtraction.
> 
> I think we should try to fix the PR instead.  The widening operations
> can only be used for SLP if the group_size is at least double the
> number of elements in the vectype, so one idea (not worked through)
> is to make the vect_build_slp_tree family of routines undo pattern
> recognition for widening operations if the group size is less than that.
> 
> Richi would know better than me though.

Why should the widen ops be only usable with such constraints?
As I read md.texi they for example do v8qi -> v8hi operations
(where the v8qi is either lo or hi part of a v16qi vector).

The dumps show we use a VF of 4 which makes us have two v8hi
vectors and one v16qi which vectorizable_conversion should
be able to handle by emitting hi/lo widened subtracts.

Of course the dumps also show we fail vector construction
because of the permute.  If as I say you force strided code-gen
we get optimal

wdiff:
.LFB0:
        .cfi_startproc
        lsl     w3, w3, 4
        ldr     s0, [x1]
        ldr     s1, [x2]
        sxtw    x4, w3
        ldr     s3, [x1, w3, sxtw]
        add     x5, x2, x4
        ldr     s2, [x2, w3, sxtw]
        add     x1, x1, x4
        add     x2, x1, x4
        add     x4, x5, x4
        ins     v0.s[1], v3.s[0]
        ldr     s4, [x5, w3, sxtw]
        ins     v1.s[1], v2.s[0]
        ldr     s5, [x1, w3, sxtw]
        ldr     s2, [x4, w3, sxtw]
        ldr     s3, [x2, w3, sxtw]
        ins     v0.s[2], v5.s[0]
        ins     v1.s[2], v4.s[0]
        ins     v0.s[3], v3.s[0]
        ins     v1.s[3], v2.s[0]
        usubl   v2.8h, v0.8b, v1.8b
        usubl2  v0.8h, v0.16b, v1.16b
        stp     q2, q0, [x0]
        ret

from

void wdiff( short d[16], unsigned char *restrict pix1,
            unsigned char *restrict pix2, int s)
{
   for( int y = 0; y < 4; y++ )
     {
       for( int x = 0; x < 4; x++ )
         d[x + y*4] = pix1[x] - pix2[x];
       pix1 += 16*s;
       pix2 += 16*s;
     }
}

so the fix, if any, is to fix the bug that mentions this issue
and appropriately classify / vectorize the load.

Richard

> Thanks,
> Richard
> 
> >
> >
> > [AArch64] Remove backend support for widen-sub
> >
> > This patch removes support for the widening subtract operation in the 
> > aarch64 backend as it is causing a performance regression.
> >
> > gcc/ChangeLog:
> >
> >         * config/aarch64/aarch64-simd.md    
> >         (vec_widen_subl_lo_): Removed.
> >         (vec_widen_subl_hi_): Removed.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/aarch64/vect-widen-sub.c: Removed.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

[Patch, committed] gcc/fortran/intrinsic.texi: Fix typo (was: gfortran online docs: small typo in RESULT_IMAGE description)

2021-01-21 Thread Tobias Burnus


On 21.01.21 13:22, Jorge D'Elia via Fortran wrote:


A small typo in 
https://gcc.gnu.org/onlinedocs/gfortran/CO_005fREDUCE.html#CO_005fREDUCE


Thanks for the report – I regarded it as (almost) patch and, hence, used
you as author :-)

Committed as r11-6835-g9be0a89c95cc30089786faa26b89e8d7444c879e

Thanks,

Tobias

PS: The bug existed since 2014 ...

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
commit 9be0a89c95cc30089786faa26b89e8d7444c879e
Author: Jorge D'Elia 
AuthorDate: Thu Jan 21 14:23:28 2021 +0100
Commit: Tobias Burnus 
CommitDate: Thu Jan 21 14:24:27 2021 +0100

gcc/fortran/intrinsic.texi: Fix typo

gcc/fortran/ChangeLog:

* intrinsic.texi (CO_MAX): Fix typo.

diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index acf1cae3921..6e6aa26c25c 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -3753,7 +3753,7 @@ Collective subroutine
 @item @var{A}@tab shall be an integer, real or character variable,
 which has the same type and type parameters on all images of the team.
 @item @var{RESULT_IMAGE} @tab (optional) a scalar integer expression; if
-present, it shall have the same the same value on all images and refer to an
+present, it shall have the same value on all images and refer to an
 image of the current team.
 @item @var{STAT} @tab (optional) a scalar integer variable
 @item @var{ERRMSG}   @tab (optional) a scalar character variable

Re: [AArch64] Remove backend support for widen-sub

2021-01-21 Thread Richard Sandiford via Gcc-patches

Joel Hutton via Gcc-patches  writes:
> Hi all, 
>
> This patch removes support for the widening subtract operation in the aarch64 
> backend as it is causing a performance regression.
>
> In the following example:
>
> #include <stdint.h>
> extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t *restrict 
> pix2)
> {
>    for( int y = 0; y < 4; y++ )
>   {    
>     for( int x = 0; x < 4; x++ )
>       d[x + y*4] = pix1[x] - pix2[x];
>     pix1 += 16;  
>     pix2 += 16;
>  }
>
> The widening minus pattern is recognized and substituted, but cannot be used 
> due to the input vector type chosen in slp vectorization. This results in an 
> attempt to do an 8 byte->8 short widening subtract operation, which is not 
> supported. 
>
> The issue is documented in PR 98772.

IMO removing just the sub patterns is too arbitrary.  Like you say,
the PR affects all widening instructions, but it happens that the
first encountered regression used subtraction.

I think we should try to fix the PR instead.  The widening operations
can only be used for SLP if the group_size is at least double the
number of elements in the vectype, so one idea (not worked through)
is to make the vect_build_slp_tree family of routines undo pattern
recognition for widening operations if the group size is less than that.

Richi would know better than me though.

Thanks,
Richard

>
>
> [AArch64] Remove backend support for widen-sub
>
> This patch removes support for the widening subtract operation in the aarch64 
> backend as it is causing a performance regression.
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-simd.md    
>         (vec_widen_subl_lo_): Removed.
>         (vec_widen_subl_hi_): Removed.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/vect-widen-sub.c: Removed.

Re: [AArch64] Remove backend support for widen-sub

2021-01-21 Thread Richard Biener via Gcc-patches

On Thu, Jan 21, 2021 at 12:37 PM Joel Hutton via Gcc-patches
 wrote:
>
> Hi all,
>
> This patch removes support for the widening subtract operation in the aarch64 
> backend as it is causing a performance regression.
>
> In the following example:
>
> #include <stdint.h>
> extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t *restrict 
> pix2)
> {
>for( int y = 0; y < 4; y++ )
>   {
> for( int x = 0; x < 4; x++ )
>   d[x + y*4] = pix1[x] - pix2[x];
> pix1 += 16;
> pix2 += 16;
>  }
>
> The widening minus pattern is recognized and substituted, but cannot be used 
> due to the input vector type chosen in slp vectorization. This results in an 
> attempt to do an 8 byte->8 short widening subtract operation, which is not 
> supported.
>
> The issue is documented in PR 98772.

But it's not analyzed.  The fix is clearly not to remove the support
for this pattern but fix
what goes wrong in detecting it.

Richard.

>
> [AArch64] Remove backend support for widen-sub
>
> This patch removes support for the widening subtract operation in the aarch64 
> backend as it is causing a performance regression.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd.md
> (vec_widen_subl_lo_): Removed.
> (vec_widen_subl_hi_): Removed.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vect-widen-sub.c: Removed.

[GCC8 backport] AArch64: Fix symbol offset limit (PR 98618)

2021-01-21 Thread Wilco Dijkstra via Gcc-patches

In aarch64_classify_symbol symbols are allowed large offsets on relocations.
This means the offset can use all of the +/-4GB offset, leaving no offset
available for the symbol itself.  This results in relocation overflow and
link-time errors for simple expressions like &global_array + 0xff00.

To avoid this, unless the offset_within_block_p is true, limit the offset
to +/-1MB so that the symbol needs to be within a 3.9GB offset from its
references.  For the tiny code model use a 64KB offset, allowing most of
the 1MB range for code/data between the symbol and its references.
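
As a rough illustration of the limits described above, the check can be sketched as a standalone function. This is only a sketch: the names `code_model`, `offset_ok` and the `within_block` flag are hypothetical stand-ins (the real logic lives in aarch64_classify_symbol and uses offset_within_block_p); only the 64KB and 1MB limits are taken from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the GCC data structures.  */
enum code_model { CMODEL_TINY, CMODEL_SMALL };

#define TINY_OFFSET_LIMIT  INT64_C (0x10000)   /* 64KB */
#define SMALL_OFFSET_LIMIT INT64_C (0x100000)  /* 1MB */

/* Return true if SYMBOL + OFFSET may be addressed directly.  Offsets
   known to stay inside the symbol's own block are always accepted;
   anything else must fit the per-code-model limit (inclusive).  */
static bool
offset_ok (enum code_model model, int64_t offset, bool within_block)
{
  assert (model == CMODEL_TINY || model == CMODEL_SMALL);

  if (within_block)
    return true;

  int64_t limit = (model == CMODEL_TINY
                   ? TINY_OFFSET_LIMIT : SMALL_OFFSET_LIMIT);
  return offset >= -limit && offset <= limit;
}
```

In this sketch a rejected offset corresponds to the case where the patch returns SYMBOL_FORCE_TO_MEM instead of an absolute symbol classification.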

gcc/
PR target/98618
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.

gcc/testsuite/
PR target/98618
* gcc.target/aarch64/symbol-range.c: Improve testcase.
* gcc.target/aarch64/symbol-range-tiny.c: Likewise.

(cherry picked from commit 7d3b27ff12610fde9d6c4b56abc70c6ee9b6b3db)

---
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e8e73b8ea92b0dd3b9de661652c30c26c07bec86..7c4cf75b5a5e2394dc0b3f69b68d93df8f88111f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12011,26 +12011,31 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
 the offset does not cause overflow of the final address.  But
 we have no way of knowing the address of symbol at compile time
 so we can't accurately say if the distance between the PC and
-symbol + offset is outside the addressible range of +/-1M in the
-TINY code model.  So we rely on images not being greater than
-1M and cap the offset at 1M and anything beyond 1M will have to
-be loaded using an alternative mechanism.  Furthermore if the
-symbol is a weak reference to something that isn't known to
-resolve to a symbol in this module, then force to memory.  */
- if ((SYMBOL_REF_WEAK (x)
-  && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (offset, -1048575, 1048575))
+symbol + offset is outside the addressible range of +/-1MB in the
+TINY code model.  So we limit the maximum offset to +/-64KB and
+assume the offset to the symbol is not larger than +/-(1MB - 64KB).
+If offset_within_block_p is true we allow larger offsets.
+Furthermore force to memory if the symbol is a weak reference to
+something that doesn't resolve to a symbol in this module.  */
+
+ if (SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x))
return SYMBOL_FORCE_TO_MEM;
+ if (!(IN_RANGE (offset, -0x10000, 0x10000)
+   || offset_within_block_p (x, offset)))
+   return SYMBOL_FORCE_TO_MEM;
+
  return SYMBOL_TINY_ABSOLUTE;
 
case AARCH64_CMODEL_SMALL:
  /* Same reasoning as the tiny code model, but the offset cap here is
-4G.  */
- if ((SYMBOL_REF_WEAK (x)
-  && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
-   HOST_WIDE_INT_C (4294967264)))
+1MB, allowing +/-3.9GB for the offset to the symbol.  */
+
+ if (SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x))
return SYMBOL_FORCE_TO_MEM;
+ if (!(IN_RANGE (offset, -0x100000, 0x100000)
+   || offset_within_block_p (x, offset)))
+   return SYMBOL_FORCE_TO_MEM;
+
  return SYMBOL_SMALL_ABSOLUTE;
 
case AARCH64_CMODEL_TINY_PIC:
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
index d7e46b059e41f2672b3a1da5506fa8944e752e01..fc6a4f3ec780d9fa86de1c8e1a42a55992ee8b2d 100644
--- a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
+++ b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
@@ -1,12 +1,12 @@
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-options "-O3 -save-temps -mcmodel=tiny" } */
 
-int fixed_regs[0x0020];
+char fixed_regs[0x0008];
 
 int
-foo()
+main ()
 {
-  return fixed_regs[0x0008];
+  return fixed_regs[0x000ff000];
 }
 
 /* { dg-final { scan-assembler-not "adr\tx\[0-9\]+, fixed_regs\\\+" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range.c b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
index 6574cf4310430b847e77ea56bf8f20ef312d53e4..d8e82fa1b2829fd300b6ccf7f80241e5573e7e17 100644
--- a/gcc/testsuite/gcc.target/aarch64/symbol-range.c
+++ b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
@@ -1,12 +1,12 @@
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-options "-O3 -save-temps -mcmodel=small" } */
 
-int fixed_regs[0x2ULL];
+char fixed_regs[0x8000];
 
 int
-foo()
+main ()
 {
-  return fixed_regs[0x1ULL];
+  return fixed_regs[0xf000];
 }
 
 /* { dg-final { scan-assembler-not "adrp\

c++: Stat-hack for members [PR 98530]

2021-01-21 Thread Nathan Sidwell



This was a header file that deployed the stat-hack inside a class
(both a member-class and a [non-static data] member had the same
name).  Due to the way that's represented in name lookup we missed the
class.  Sadly just changing the representation globally has
detrimental effects elsewhere, and this is a rare case, so just
creating a new overload on the fly shouldn't be a problem.

PR c++/98530
gcc/cp/
* name-lookup.c (lookup_class_binding): Rearrange a stat-hack.
gcc/testsuite/
* g++.dg/modules/stat-mem-1.h: New.
* g++.dg/modules/stat-mem-1_a.H: New.
* g++.dg/modules/stat-mem-1_b.C: New.
--
Nathan Sidwell
diff --git c/gcc/cp/name-lookup.c w/gcc/cp/name-lookup.c
index b4b6c0b81b5..c99f2e3622d 100644
--- c/gcc/cp/name-lookup.c
+++ w/gcc/cp/name-lookup.c
@@ -3926,11 +3926,16 @@ lookup_class_binding (tree klass, tree name)
   vec *member_vec = CLASSTYPE_MEMBER_VEC (klass);
 
   found = member_vec_binary_search (member_vec, name);
-  if (IDENTIFIER_CONV_OP_P (name))
+  if (!found)
+	;
+  else if (STAT_HACK_P (found))
+	/* Rearrange the stat hack so that we don't need to expose that
+	   internal detail.  */
+	found = ovl_make (STAT_TYPE (found), STAT_DECL (found));
+  else if (IDENTIFIER_CONV_OP_P (name))
 	{
 	  gcc_checking_assert (name == conv_op_identifier);
-	  if (found)
-	found = OVL_CHAIN (found);
+	  found = OVL_CHAIN (found);
 	}
 }
   else
diff --git c/gcc/testsuite/g++.dg/modules/stat-mem-1.h w/gcc/testsuite/g++.dg/modules/stat-mem-1.h
new file mode 100644
index 000..b5703ea2262
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/stat-mem-1.h
@@ -0,0 +1,6 @@
+
+struct fpu {
+  struct NAME {
+int state;
+  } NAME;
+};
diff --git c/gcc/testsuite/g++.dg/modules/stat-mem-1_a.H w/gcc/testsuite/g++.dg/modules/stat-mem-1_a.H
new file mode 100644
index 000..6daa137be4f
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/stat-mem-1_a.H
@@ -0,0 +1,5 @@
+// { dg-additional-options -fmodule-header }
+// PR c++ 98530  stat-hack inside a structure
+// { dg-module-cmi {} }
+
+#include "stat-mem-1.h"
diff --git c/gcc/testsuite/g++.dg/modules/stat-mem-1_b.C w/gcc/testsuite/g++.dg/modules/stat-mem-1_b.C
new file mode 100644
index 000..9b83d4e6e30
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/stat-mem-1_b.C
@@ -0,0 +1,4 @@
+// { dg-additional-options "-fmodules-ts -fno-module-lazy" }
+
+#include "stat-mem-1.h"
+import "stat-mem-1_a.H";

Re: [PATCH] libstdc++: implement locale support for AIX

2021-01-21 Thread CHIGOT, CLEMENT via Gcc-patches

Hi everyone, 

Here is a new version of the patch. I've tested it on Linux and AIX.
Some tests still fail, but it is starting to take shape.
However, I have a few questions:

1) locale.name and syscalls
locale.name() returns a string describing each locale category.
It looks like
"LC_CTYPE=en_US.UTF-8; LC_NUMERIC=en_US.UTF-8; ...".
However, in locale::global() and in some c_locale.cc functions, this
name is passed as an argument to setlocale, newlocale, etc.
This seems to work with the GNU locale model, but it doesn't work when
I try it with the POSIX_2008 model. A simple C program seems to refuse
it as well.
Is there a define on Linux that enables this behavior? More generally,
I'm not sure it will work on all POSIX 2008 systems. We might need to
modify locale::global() and the other functions that end up passing
locale.name() as a syscall argument.
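
To make the problem in 1) concrete, here is a minimal sketch of one possible workaround: split the composite name into per-category pieces, so that each piece could then be passed separately to newlocale with the matching LC_*_MASK. This is an assumption about how it could be handled, not what the patch does; `struct locale_part` and `split_locale_name` are hypothetical names, and the mapping from category names to masks is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One "CATEGORY=name" pair taken from a composite locale name.
   Hypothetical helper type, for illustration only.  */
struct locale_part
{
  char category[32];
  char name[64];
};

/* Split COMPOSITE, e.g. "LC_CTYPE=en_US.UTF-8; LC_NUMERIC=C", into at
   most MAX parts.  Return the number of parts stored, or -1 if an
   entry has no '=', is too long, or there are more than MAX entries.  */
static int
split_locale_name (const char *composite, struct locale_part *parts, int max)
{
  assert (composite && parts);
  int count = 0;
  const char *p = composite;

  while (*p)
    {
      /* Skip separators and blanks between entries.  */
      while (*p == ';' || *p == ' ')
        p++;
      if (!*p)
        break;

      /* The entry ends at the next ';' or at the end of the string.  */
      const char *end = strchr (p, ';');
      if (!end)
        end = p + strlen (p);
      const char *eq = memchr (p, '=', (size_t) (end - p));
      if (!eq)
        return -1;

      size_t cat_len = (size_t) (eq - p);
      size_t name_len = (size_t) (end - (eq + 1));
      if (count >= max
          || cat_len == 0
          || cat_len >= sizeof parts[count].category
          || name_len >= sizeof parts[count].name)
        return -1;

      memcpy (parts[count].category, p, cat_len);
      parts[count].category[cat_len] = '\0';
      memcpy (parts[count].name, eq + 1, name_len);
      parts[count].name[name_len] = '\0';
      count++;
      p = end;
    }
  return count;
}
```

A plain name such as "C" has no '=' and is rejected with -1, so a caller would pass such names to newlocale unchanged.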

2) Detect locale model during tests
Is there already a function in the testsuite to detect which locale model
is being used? I didn't find one, and as I'm not used to runtest scripts,
I don't really know how to implement one.
Ideally, it would be something like "has_locale_model { gnu }". It would
allow skipping tests that are written only for the GNU model.
Is there an existing function I can base it on?

3) POSIX 2017 and non-POSIX functions
Many of the *_l functions used in the GNU or dragonfly models aren't
POSIX 2008 but POSIX 2017 or, like strtof_l, not POSIX at all.
However, they are really useful in the code, so I've made a double
implementation based on "#ifdef HAVE_". Is that OK for you? It's not strictly
POSIX 2008, but rather POSIX 2008 with 2017 compatibility.
For configure, I didn't find a better way than checking each syscall
individually, as they all depend on different includes. Tell me if you have a better idea.
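
The "#ifdef HAVE_" double implementation in 3) could look roughly like the sketch below for strftime. It assumes a POSIX 2008 system where <locale.h> declares locale_t and uselocale; the macro name HAVE_STRFTIME_L and the wrapper name `portable_strftime_l` are hypothetical and not taken from the patch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Format TM into BUF using locale LOC.  When the platform provides
   strftime_l (HAVE_STRFTIME_L is a hypothetical configure macro), use
   it directly.  Otherwise switch the calling thread to LOC with
   uselocale, call plain strftime, and restore the previous locale.  */
static size_t
portable_strftime_l (char *buf, size_t size, const char *fmt,
                     const struct tm *tm, locale_t loc)
{
  assert (buf && fmt && tm);
#ifdef HAVE_STRFTIME_L
  return strftime_l (buf, size, fmt, tm, loc);
#else
  locale_t old = uselocale (loc);
  size_t len = strftime (buf, size, fmt, tm);
  uselocale (old);
  return len;
#endif
}
```

Because uselocale only affects the calling thread, the fallback path does not disturb the locale seen by other threads.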

4) ctype_configure_char.cc
I have some trouble understanding what is supposed to be implemented in this file.
I don't really understand the setlocale part that appears for many
OSes. When I add it, some tests start failing and others start passing.
Moreover, on Linux, if I understand correctly, there are some optimizations
based on classic_table(), _M_toupper and _M_tolower. Could you confirm
that they are only useful on Linux?

5) Some tests results
Here are the remaining tests failing on Linux x86:
FAIL: 22_locale/locale/cons/29217.cc execution test
FAIL: 22_locale/locale/cons/38368.cc execution test
FAIL: 22_locale/locale/cons/40184.cc execution test
FAIL: 22_locale/locale/cons/5.cc execution test
FAIL: 22_locale/locale/global_locale_objects/14071.cc execution test
 => linked to 1)

FAIL: 22_locale/messages/13631.cc execution test
FAIL: 22_locale/messages/members/char/1.cc execution test
FAIL: 22_locale/messages/members/char/2.cc execution test
FAIL: 22_locale/messages/members/char/wrapped_env.cc execution test
FAIL: 22_locale/messages/members/char/wrapped_locale.cc execution test
FAIL: 22_locale/messages_byname/named_equivalence.cc execution test
  => linked to message_members.cc not being implemented.
Reason behind 2)

FAIL: 22_locale/numpunct/members/char/3.cc execution test
  => No idea yet. Maybe 1) too.

FAIL: 22_locale/time_get/get_time/char/2.cc execution test
FAIL: 22_locale/time_get/get_time/char/wrapped_env.cc execution test
FAIL: 22_locale/time_get/get_time/char/wrapped_locale.cc execution test
FAIL: 22_locale/time_get/get_time/wchar_t/2.cc execution test
FAIL: 22_locale/time_get/get_time/wchar_t/wrapped_env.cc execution test
FAIL: 22_locale/time_get/get_time/wchar_t/wrapped_locale.cc execution test
  => Not related. 

Feel free to try it on other OSes. I've made modifications only for AIX and
Linux, as I can't test the other ones.

Thanks, 
Clément
From 6b82a9c6b49d16e701f096891550c93661a58bbe Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Cl=C3=A9ment=20Chigot?= 
Date: Tue, 29 Dec 2020 11:08:33 +0100
Subject: [PATCH] libstdc++: implement locale support for POSIX 2008
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The implementation is based on the dragonfly one.
It also adds support for AIX with a few tweaks.
As of now, a few locale functions are missing on AIX.
For strftime_l, localeconv_l, mbstowcs_l and wcsftime_l,
uselocale must be called before using the version without _l.
For strtof_l, strtod_l and strtold_l, a wrapper simply calls
the default version.

libstdc++-v3/ChangeLog:
2021-01-12  Clément Chigot

* acinclude.m4: Add ieee_1003.1-2008 locale model.
* configure: Regenerate.
* config/os/aix/ctype_configure_char.cc: Enable locale support.
* testsuite/lib/libstdc++.exp (check_v3_target_namedlocale):
Handle AIX locale names.
* testsuite/util/testsuite_hooks.h: Likewise.
* config/locale/dragonfly/c_locale.cc: Removed.
* config/locale/dragonfly/c_locale.h: Removed.
* config/locale/dragonfly/codecvt_members.cc: Removed.
* config/locale/dragonfly/col

[PATCH] RISC-V: Fix -march option parsing when `p` extension exists.

2021-01-21 Thread Xing GUO via Gcc-patches

This patch fixes -march option parsing when `p` extension exists,
e.g., -march=rv64imafdcp should produce

.attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_p"

rather than

.attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c_p"
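
The ambiguity comes from `p` being both the major/minor separator in a version such as "2p0" and an extension letter in its own right. The following is a simplified, hypothetical sketch of that decision (`parse_version` is an illustrative name and this is not the GCC implementation): a `p` that is not followed by a digit is left unconsumed as the start of the next extension, and an extension without an explicit version gets the defaults passed in by the caller.

```c
#include <assert.h>
#include <ctype.h>

/* Parse an optional "<major>[p<minor>]" version at P.  If no version
   is present, store the caller-supplied defaults.  A 'p' that is not
   followed by a digit is not consumed, since it may begin the `p`
   extension.  Return a pointer to the first unconsumed character.  */
static const char *
parse_version (const char *p, unsigned default_major, unsigned default_minor,
               unsigned *major, unsigned *minor)
{
  assert (p && major && minor);

  if (!isdigit ((unsigned char) *p))
    {
      /* No explicit version, e.g. the "c" in "...cp".  */
      *major = default_major;
      *minor = default_minor;
      return p;
    }

  unsigned value = 0;
  while (isdigit ((unsigned char) *p))
    value = value * 10 + (unsigned) (*p++ - '0');
  *major = value;
  *minor = 0;

  /* Only treat 'p' as a separator when a minor number follows.  */
  if (*p == 'p' && isdigit ((unsigned char) p[1]))
    {
      p++;
      value = 0;
      while (isdigit ((unsigned char) *p))
        value = value * 10 + (unsigned) (*p++ - '0');
      *minor = value;
    }
  return p;
}
```

With defaults of 2 and 0, parsing the text after `c` in "rv64imafdcp" stores 2 and 0 and stops at the `p`, which corresponds to the expected "c2p0_p" in the attribute string.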

---
gcc/ChangeLog:

* common/config/riscv/riscv-common.c
(riscv_subset_list::parsing_subset_version):
Fix -march option parsing when `p` extension exists.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-18.c: New test.
-- 

Cheers,
Xing
diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index b3f5c07c819..a034e218b75 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -527,8 +527,7 @@ riscv_subset_list::parsing_subset_version (const char *ext,
 	  /* Might be beginning of `p` extension.  */
 	  if (std_ext_p)
 		{
-		  *major_version = version;
-		  *minor_version = 0;
+		  get_default_version (ext, major_version, minor_version);
 		  *explicit_version_p = true;
 		  return p;
 		}
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-18.c b/gcc/testsuite/gcc.target/riscv/attribute-18.c
new file mode 100644
index 000..30a12543ed2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-18.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imafdcp -mabi=lp64d" } */
+int foo() {}
+/* { dg-final { scan-assembler ".attribute arch, \"rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_p\"" } } */

Re: [PATCH v2] fwprop: Allow (subreg (mem)) simplifications

2021-01-21 Thread Richard Sandiford via Gcc-patches

Ilya Leoshkevich via Gcc-patches  writes:
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563800.html
>
> v1 -> v2: Allow (mem) -> (subreg) propagation only for single uses.
>
> Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
> and s390x-redhat-linux.  Ok for master?
>
>
>
> Suppose we have:
>
> (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
> (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))
>
> It is clearly profitable to propagate the first insn into the second
> one and get:
>
> (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))
>
> fwprop actually manages to perform this, but doesn't think the result is
> worth it, which results in unnecessary store/load sequences on s390.
> Improve the situation by classifying SUBREG -> MEM changes as
> profitable.
>
> gcc/ChangeLog:
>
> 2021-01-15  Ilya Leoshkevich  
>
>   * fwprop.c (fwprop_propagation::classify_result): Allow
>   (subreg (mem)) simplifications.
> ---
>  gcc/fwprop.c | 22 +-
>  1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/fwprop.c b/gcc/fwprop.c
> index eff8f7cc141..02d3d507cbc 100644
> --- a/gcc/fwprop.c
> +++ b/gcc/fwprop.c
> @@ -176,7 +176,7 @@ namespace
>  static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
>  static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
>  
> -fwprop_propagation (rtx_insn *, rtx, rtx);
> +fwprop_propagation (rtx_insn *, insn_info *, rtx, rtx);
>  
>  bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
>  bool folded_to_constants_p () const;
> @@ -185,13 +185,18 @@ namespace
>  bool check_mem (int, rtx) final override;
>  void note_simplification (int, uint16_t, rtx, rtx) final override;
>  uint16_t classify_result (rtx, rtx);
> +
> +  private:
> +const bool single_use_p;
>};
>  }
>  
>  /* Prepare to replace FROM with TO in INSN.  */
>  
> -fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to)
> -  : insn_propagation (insn, from, to)
> +fwprop_propagation::fwprop_propagation (rtx_insn *insn, insn_info *def_insn,
> + rtx from, rtx to)
> +: insn_propagation (insn, from, to),
> +  single_use_p (def_insn->num_uses () == 1)

I think we should check whether the def and use are in the same ebb
as well.  For that, I guess we should change the first insn to be an
“insn_info *” too (and perhaps rename it to use_insn, now that there
are two insns in play).

>  {
>should_check_mems = true;
>should_note_simplifications = true;
> @@ -262,6 +267,13 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
>&& GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
>  return PROFITABLE;
>  
> +  /* Allow (subreg (mem)) -> (mem) simplifications.  Do not allow propagation
> + of (mem)s into multiple uses, since those are not profitable, as well as
> + creating new (mem/v)s, since DCE will not remove the old ones.  */
> +  if (single_use_p && SUBREG_P (old_rtx) && MEM_P (new_rtx)
> +  && !MEM_VOLATILE_P (new_rtx))
> +return PROFITABLE;

Nit: one check per line if they don't fit on a single line.

I think we should check !paradoxical_subreg_p (old_rtx) too.  I'm not
sure we'd allow the propagation in that case, but it seems like something
we should check for performance reasons too.

Given what you said in the other message about combine, I agree this
is a reasonable workaround.  I don't know whether it's suitable for
stage 4 or whether it would need to wait for stage 1.

Thanks,
Richard
