[PATCH] Restore can_be_invalidated_p semantics to before refactoring

2021-11-25 Thread Richard Biener via Gcc-patches
This restores the semantics of can_be_invalidated_p to those of the
original function it was split out from in tree-ssa-uninit.c.
The current semantics only ever look at the first predicate chain,
which cannot be correct.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-11-26  Richard Biener  

* gimple-predicate-analysis.cc (can_be_invalidated_p):
Restore semantics to the one before the split from
tree-ssa-uninit.c.
---
 gcc/gimple-predicate-analysis.cc | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-predicate-analysis.cc b/gcc/gimple-predicate-analysis.cc
index 6dde0203841..da6adc9a3e2 100644
--- a/gcc/gimple-predicate-analysis.cc
+++ b/gcc/gimple-predicate-analysis.cc
@@ -1199,14 +1199,16 @@ can_be_invalidated_p (const pred_chain_union &preds, const pred_chain &guard)
   for (unsigned i = 0; i < preds.length (); ++i)
     {
       const pred_chain &chain = preds[i];
-      for (unsigned j = 0; j < chain.length (); ++j)
+      unsigned j;
+      for (j = 0; j < chain.length (); ++j)
         if (can_be_invalidated_p (chain[j], guard))
-          return true;
+          break;
 
       /* If we were unable to invalidate any predicate in C, then there
          is a viable path from entry to the PHI where the PHI takes
          an interesting value and continues to a use of the PHI.  */
-      return false;
+      if (j == chain.length ())
+        return false;
     }
   return true;
 }
-- 
2.31.1
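
The control-flow difference in the patch above can be tried outside of GCC
with a small standalone model.  This is only a sketch under assumptions:
predicates are modelled as plain ints, the names Chain, ChainUnion,
invalidated_first_chain_only and invalidated_every_chain are made up for
illustration, and the callback stands in for the per-predicate
can_be_invalidated_p overload.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for pred_chain and pred_chain_union.
using Chain = std::vector<int>;
using ChainUnion = std::vector<Chain>;
using Invalidates = std::function<bool(int)>;

// Models the pre-patch loop: the answer is decided by the first chain alone.
bool invalidated_first_chain_only(const ChainUnion &preds,
                                  const Invalidates &inval) {
  for (const Chain &chain : preds) {
    for (int pred : chain)
      if (inval(pred))
        return true;
    return false;  // reached after the first chain, later chains are ignored
  }
  return true;
}

// Models the restored loop: every chain must contain at least one
// predicate that can be invalidated.
bool invalidated_every_chain(const ChainUnion &preds,
                             const Invalidates &inval) {
  for (const Chain &chain : preds) {
    bool found = false;
    for (int pred : chain)
      if (inval(pred)) {
        found = true;
        break;
      }
    if (!found)
      return false;  // this chain leaves a viable path
  }
  return true;
}
```

With two chains {1} and {2} and a guard that only invalidates predicate 1,
the first variant answers true while the second answers false, because the
second chain is never looked at in the first variant.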


Re: [PATCH] Loop unswitching: support gswitch statements.

2021-11-25 Thread Richard Biener via Gcc-patches
On Thu, Nov 25, 2021 at 11:38 AM Aldy Hernandez  wrote:
>
> On Wed, Nov 24, 2021 at 9:00 AM Richard Biener
>  wrote:
> >
> > On Tue, Nov 23, 2021 at 5:36 PM Martin Liška  wrote:
> > >
> > > On 11/23/21 16:20, Martin Liška wrote:
> > > > Sure, so for e.g. case 1 ... 5 we would need to create a new unswitch_predicate
> > > > with 1 <= index && index <= 5 tree predicate (and the corresponding irange range).
> > > > Later once we unswitch on it, we should use a special unreachable_flag that will
> > > > be used for marking of dead edges (similarly to how we fold gconds to
> > > > boolean_{false/true}_node).
> > > > Does it make sense?
> > >
> > > I have thought about it more and it's not enough. What we really want is
> > > having an irange for *each edge* (2 for gconds and multiple for gswitches).
> > > Once we select an unswitch_predicate, we then need to fold_range all these
> > > iranges in the true/false loop. Doing that we can handle situations like:
> > >
> > > if (index < 1)
> > > do_something1
> > >
> > > if (index > 2)
> > > do_something2
> > >
> > > switch (index)
> > > case 1 ... 2:
> > >   do_something;
> > > ...
> > >
> > > As seen, once we unswitch on 'index < 1' and 'index > 2', the first case
> > > will be taken in the false_edge of the 'index > 2' loop unswitching.
> >
> > Hmm.  I'm not sure it needs to be this complicated.  We're basically
> > evaluating ranges/predicates based on a fixed set of versioning
> > predicates.  Your implementation created "predicates" for the
> > to-be-simplified conditions but in the end we'd like to evaluate the
> > actual stmt to figure the taken/not taken edges.  IIRC elsewhere Andrew
> > showed a snippet on how to evaluate a stmt with a given range - not
> > sure if that was useful enough.  So what I think would be nice is if we
> > could somehow use ranger's path query without an actual CFG.  So we
> > virtually have
> >
> >   if (versioning-predicate1)
> >     if (versioning-predicate2)
> >       ;
> >     else
> >       for (;;) // our current loop
> >         {
> >           ...
> >           if (condition)
> >             ;
> >           ...
> >           switch (var)
> >             {
> >               ...
> >             }
> >         }
> >
> > and versioning-predicate1 and versioning-predicate2 are not in the IL.
> > What we'd like
> > to do is seed the path query with a "virtual" path through the two
> > predicates to the
> > entry of the loop and compute_ranges based on those.  Then we like to
> > use range_of_stmt on 'if (condition)' and 'switch (var)' to determine
> > not taken edges.
>
> Huh, that's an interesting idea.  We could definitely adapt
> path_range_query to work with an artificial sequence of blocks, but it
> would need some surgery.  Off the top of my head:
>
> a) The phi handling code looks for specific edges in the path (both
> for intra path ranges and for relations inherent in PHIs).
> b) The exported ranges between blocks in the path, probably needs some
> massaging.
> c) compute_outgoing_relations would need some work as you mention below...
>
> > Looking somewhat at the sources it seems like we "simply" need to do what
> > compute_outgoing_relations does - unfortunately the code lacks comments
> > so I have no idea what jt_fur_source src (...).register_outgoing_edges does ...
>
> fur_source is an abstraction for operands to the folding mechanism:
>
> // Source of all operands for fold_using_range and gori_compute.
> // It abstracts out the source of an operand so it can come from a stmt or
> // and edge or anywhere a derived class of fur_source wants.
> // The default simply picks up ranges from the current range_query.
>
> class fur_source
> {
> }
>
> When passed to register_outgoing_edges, it registers outgoing
> relations out of a conditional.  I pass it the known outgoing edge out
> of the conditional, so only the relational on that edge is recorded.
> I have overloaded fur_source into a path specific jt_fur_source that
> uses a path_oracle to register relations as they would occur along a
> path.  Once register_outgoing_edges is called on each outgoing edge
> between blocks in a path, the relations will have been set, and can be
> seen by the range_of_stmt:
>
> path_range_query::range_of_stmt (irange &r, gimple *stmt, tree)
> {
> ...
>   // If resolving unknowns, fold the statement making use of any
>   // relations along the path.
>   if (m_resolve)
>     {
>       fold_using_range f;
>       jt_fur_source src (stmt, this, &m_ranger->gori (), m_path);
>       if (!f.fold_stmt (r, stmt, src))
>         r.set_varying (type);
>     }
> ...
> }
>
> register_outgoing_edges would probably have to be adjusted for your
> CFGless paths, and maybe the path_oracle (Andrew??).

So conceptually we'd attach extra predicates to an edge (a single one,
and even the "entry" edge of the path in this specific case).  That is,
instead of explicit

   \
   if (p1)
   \
 

Re: [PATCH take 3] ivopts: Improve code generated for very simple loops.

2021-11-25 Thread Richard Biener via Gcc-patches
On Thu, Nov 25, 2021 at 7:17 PM Roger Sayle  wrote:
>
>
> On Tue, Nov 23, 2021 at 12:46PM Richard Biener <richard.guent...@gmail.com> wrote:
> > On Thu, Nov 18, 2021 at 4:18 PM Roger Sayle wrote:
> > > > The patch doesn't add any testcase.
> > >
> > > The three new attached tests check that the critical invariants have a
> > > simpler form, and hopefully shouldn't be affected by whether the
> > > optimizer and/or backend costs actually decide to perform this iv
> > > substitution or not.
> >
> > The testcases might depend on lp64 though, did you test them with -m32?
> > IMHO it's fine to require lp64 here.
>
> Great catch.  You're right that when the loop index has the same precision as
> the target's pointer, fold is (already) able to simplify the ((EXPR)-1)+1,
> so that with -m32 my new tests ivopts-[567].c fail.  I've added "require lp64"
> to those tests, but I've also added two more tests, using char and unsigned
> char for the loop expression, which are optimized on both ilp32 and lp64.
>
> For example, with -O2 -m32, we see the following improvements in ivopts-8.c:
> diff ivopts-8.old.s ivopts-8.new.s
> 14,16c14,15
> <   subl    $1, %ecx
> <   movzbl  %cl, %ecx
> <   leal    4(%eax,%ecx,4), %ecx
> ---
> >   movsbl  %cl, %ecx
> >   leal    (%eax,%ecx,4), %ecx
>
> This might also explain why GCC currently generates sub-optimal code.  Back
> when ivopts was written, most folks were on i686, so the generated code was
> optimal.  But with the transition to x86_64, the code is correct, just
> slightly less efficient.
>
> > I'm a bit unsure about adding this special-casing in cand_value_at in
> > general - it does seem that we're doing sth wrong elsewhere - either by not
> > simplifying even though enough knowledge is there or by throwing away
> > knowledge earlier (during niter analysis?).
>
> I agree this approach is a bit ugly.  Conceptually, an alternative might be
> to avoid throwing away knowledge earlier, during niter analysis, by adding an
> extra tree field to the tree_niter_desc structure, so that it returns both
> niter0 (the iteration count at the top of the loop) and niter1 (the iteration
> count at the bottom of the loop), so that later passes (cand_value_at) can use
> the tree that's relevant.

Yes, I also thought of this but I wasn't sure we always have that.  I also
wouldn't think of it as too ugly, but well ... it would definitely be useful
elsewhere.  Btw, it's generally the number of executions of the latch vs.
the number of executions of the header - currently what niter analysis
computes is the number of executions of the latch.  There are loops where
the number of iterations of the header is not representable in the IV.
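
As a concrete illustration of the header vs. latch distinction, here is a
small standalone sketch (not taken from the patch or its tests; the names
Counts and count_executions are made up) of a bottom-tested loop over an
8-bit IV.  The header runs once more than the latch, and that header count
no longer fits in the IV type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

struct Counts {
  unsigned header;  // how often the loop header (body entry) executed
  unsigned latch;   // how often the back edge was taken
};

// Models:  i = 0; do { ... } while (++i != 0);  with an 8-bit IV.
Counts count_executions() {
  Counts c{0, 0};
  std::uint8_t i = 0;
  for (;;) {
    ++c.header;  // header executes
    ++i;         // IV increment, wraps from 255 to 0
    if (i == 0)  // exit test at the bottom of the loop
      break;
    ++c.latch;   // back edge taken
  }
  return c;
}
```

Here the latch count (255) is representable in the 8-bit IV type while the
header count (256) is not.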

> Alas, this too is ugly, and inefficient as we're creating/folding trees that
> may never be used/useful.  A compromise might be to add an enum field
> describing how the niter was calculated to tree_niter_desc, and this can be
> inspected/used by cand_value_at.  The current patch figures this out by
> examining the other fields already in tree_niter_desc.
>
>
> > Anyway, the patch does look quite safe - can you show some statistics on how
> > many times there's extra simplification this way during say bootstrap?
>
> Certainly.  During stage2 and stage3 of a bootstrap on x86_64-pc-linux-gnu,
> cand_value_at is called 500657 times.  The majority of calls,
> 447607 (89.4%), request the value at the end of the loop (after_adjust),
> while 53050 (10.6%) request the value at the start of the loop.
> 102437 calls (20.5%) are optimized by clause 1 [0..N loops]
> 27939 calls (5.6%) are optimized by clause 2 [beg..end loops]

Thanks for the detailed data.

> Looking for opportunities to improve things further, I see that
> 319608 calls (63.8%) have a LT_EXPR exit test.
> 160965 calls (32.2%) have a NE_EXPR exit test.
> 20084 calls (4.0%) have a GT_EXPR exit test.
> so handling descending loops wouldn’t be a big win.
> I'll investigate whether (constant) step sizes other than 1 are
> (i) sufficiently common and (ii) benefit from improved folding.
>
>
> This revised patch has been tested on x86_64-pc-linux-gnu with a
> make bootstrap and make -k check, both with and without
> --target-board=unix{-m32}, with no new failures.
> Ok for mainline?

OK.

Thanks,
Richard.

> 2021-11-25  Roger Sayle  
>
> gcc/ChangeLog
> * tree-ssa-loop-ivopts.c (cand_value_at): Take a class
> tree_niter_desc* argument instead of just a tree for NITER.
> If we require the iv candidate value at the end of the final
> loop iteration, try using the original loop bound as the
> NITER for sufficiently simple loops.
> (may_eliminate_iv): Update (only) call to cand_value_at.
>
> gcc/testsuite
> * gcc.dg/wrapped-binop-simplify.c: Update expected test result.
> * gcc.dg/tree-ssa/ivopts-5.c: New test case.
> * gcc.dg/tree-ssa/ivopts-6.c: New test case.
>   

[Bug fortran/103434] New: Pointer subobject does not show to correct memory location

2021-11-25 Thread baradi09 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103434

Bug ID: 103434
   Summary: Pointer subobject does not show to correct memory
location
   Product: gcc
   Version: 10.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: baradi09 at gmail dot com
  Target Milestone: ---

Based on the discussion on FD
(https://fortran-lang.discourse.group/t/is-the-section-of-a-pointer-to-an-array-a-valid-pointer/2331),
I'd assume, that the following code is standard conforming. However, the result
with gfortran seems to be incorrect.

*** Code:

module test
  implicit none

  type :: pointer_wrapper
    real, pointer :: ptr(:) => null()
  end type pointer_wrapper

contains

  subroutine store_pointer(wrapper, ptr)
    type(pointer_wrapper), intent(out) :: wrapper
    real, pointer, intent(in) :: ptr(:)
    wrapper%ptr => ptr
  end subroutine store_pointer


  subroutine use_pointer(wrapper)
    type(pointer_wrapper), intent(inout) :: wrapper
    wrapper%ptr(:) = wrapper%ptr + 1.0
  end subroutine use_pointer

end module test


program testprog
  use test
  implicit none

  real, allocatable, target :: data(:,:)
  real, pointer :: ptr(:,:)

  type(pointer_wrapper) :: wrapper
  integer :: ii

  allocate(data(4, 2))
  ptr => data(:,:)
  data(:,:) = 0.0
  do ii = 1, size(data, dim=2)
    print *, "#", ii
    print *, "BEFORE ", ii, maxval(ptr(:,ii))
    call store_pointer(wrapper, ptr(:,ii))
    print *, "BETWEEN", ii, maxval(ptr(:,ii))
    call use_pointer(wrapper)
    print *, "AFTER  ", ii, maxval(ptr(:,ii))
  end do

end program testprog

*** Output:

 #   1
 BEFORE1   0.
 BETWEEN   1   0.
 AFTER 1   1.
 #   2
 BEFORE2   1.
 BETWEEN   2   1.
 AFTER 2   1.

*** Expected output:

 #   1
 BEFORE1   0.
 BETWEEN   1   0.
 AFTER 1   1.
 #   2
 BEFORE2   0.
 BETWEEN   2   0.
 AFTER 2   1.

It seems as if the pointer stored by store_pointer refers to a larger memory
region than it should, so that data outside of the intended section is also
modified. Intel and NAG deliver the expected output.
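
For reference, the expected output can be reproduced with a small model of
the intended semantics.  This is only a sketch in C++, not a statement about
gfortran internals: the column-major std::array stands in for data(4, 2),
and the names Section, Row and run_expected are made up for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kRows = 4, kCols = 2;

// Models wrapper%ptr: a rank-1 section with a start and an extent.
struct Section {
  float *first;
  std::size_t extent;
};

// One BEFORE/BETWEEN/AFTER triple per loop iteration.
struct Row {
  float before, between, after;
};

std::vector<Row> run_expected() {
  std::array<float, kRows * kCols> data{};  // column-major, all 0.0
  std::vector<Row> out;
  for (std::size_t col = 0; col < kCols; ++col) {
    float *begin = data.data() + col * kRows;
    auto colmax = [&] { return *std::max_element(begin, begin + kRows); };
    Row r{};
    r.before = colmax();
    Section s{begin, kRows};  // store_pointer: exactly one column
    r.between = colmax();
    for (std::size_t k = 0; k < s.extent; ++k)
      s.first[k] += 1.0f;     // use_pointer: touches only that column
    r.after = colmax();
    out.push_back(r);
  }
  return out;
}
```

Because each section covers exactly one column, the second column is still
all zeros when the second iteration starts, which is what the expected
output above shows.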

[Bug ipa/103432] [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref

2021-11-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1

[Bug middle-end/103431] [12 Regression] wrong code with -O -fno-tree-bit-ccp -fno-tree-dominator-opts

2021-11-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103431

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
           Keywords|                            |needs-bisection
   Target Milestone|---                         |12.0

[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64

2021-11-25 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271

--- Comment #7 from rguenther at suse dot de  ---
On Fri, 26 Nov 2021, wilson at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271
> 
> Jim Wilson  changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |wilson at gcc dot gnu.org
> 
> --- Comment #5 from Jim Wilson  ---
> SiFive doesn't support -mno-strict-align so I've never tested it.  I doubt
> that it works correctly, i.e. I doubt that it optimizes as intended.  I've
> mentioned this to other RVI members, but there hasn't been anyone other than
> SiFive actively working on upstream gcc so I don't think anyone ever looked
> at it.  It shouldn't give an ICE though.
> 
> Looking at this, it appears to be another "if only we had a movti pattern"
> issue.
> 
> In expand_DEFERRED_INIT in internal-fn.c, in the reg_lhs == TRUE case, there
> is a test
>   && have_insn_for (SET, var_mode))
> which fails because var_mode is TImode and we don't have a movti pattern.  The
> code calls build_zero_cst which returns a constructor with an array type.  We
> then call expand_assignment which gets confused as it doesn't know the size of
> the array it is copying.

That seems to be the bug - in this path we shouldn't ever create an
entity with VLA size since we do know the actual size.  But it all
is a bit awkward.

[Bug target/102768] [feature request] Add compiler support for aarch64 shadow call stack

2021-11-25 Thread ashimida at linux dot alibaba.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

--- Comment #6 from ashimida  ---
RFC,v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585496.html

[PATCH] [RFC, v2, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-11-25 Thread Dan Li via Gcc-patches
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, support beyond the compiler is also
required (as discussed in [2]). This patch only adds basic
compiler-side support for SCS, making it convenient for users
to enable it.

For the Linux kernel, only compiler support is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768
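
For readers unfamiliar with the idea, the following toy model sketches what
a shadow call stack protects against.  It is purely illustrative and not
part of the patch: ToyMachine, prologue, epilogue and
return_after_overwrite are made-up names, and addresses are plain integers
rather than real x30 values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyMachine {
  std::vector<std::uint64_t> stack;         // ordinary stack, attacker-writable
  std::vector<std::uint64_t> shadow_stack;  // separate shadow call stack

  // Prologue: save the return address on the SCS before it is
  // pushed onto the ordinary stack.
  void prologue(std::uint64_t return_address) {
    shadow_stack.push_back(return_address);
    stack.push_back(return_address);
  }

  // Epilogue: drop the ordinary stack slot and return through the
  // copy popped from the SCS.
  std::uint64_t epilogue() {
    assert(!stack.empty() && !shadow_stack.empty());
    stack.pop_back();
    std::uint64_t return_address = shadow_stack.back();
    shadow_stack.pop_back();
    return return_address;
  }
};

// Simulates a stack buffer overflow that clobbers the saved return address.
std::uint64_t return_after_overwrite() {
  ToyMachine m;
  m.prologue(0x1000);
  m.stack.back() = 0xdeadbeef;  // corruption of the ordinary stack slot
  return m.epilogue();          // still returns to 0x1000
}
```

The corrupted slot on the ordinary stack is never used for the return, which
is the property the prologue/epilogue changes in this patch aim to provide.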

Signed-off-by: Dan Li 

gcc/c-family/ChangeLog:

* c-attribs.c (handle_no_sanitize_shadow_call_stack_attribute):
New.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_shadow_call_stack_enabled):
New decl.
* config/aarch64/aarch64.c (aarch64_shadow_call_stack_enabled):
New.
(aarch64_expand_prologue): Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue): Pop x30 from SCS.
* config/aarch64/aarch64.h (TARGET_SUPPORT_SHADOW_CALL_STACK):
New macro.
(TARGET_CHECK_SCS_RESERVED_REGISTER): Likewise.
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop): Likewise.
* defaults.h (TARGET_SUPPORT_SHADOW_CALL_STACK): New macro.
* doc/extend.texi: Document -fsanitize=shadow-call-stack.
* doc/invoke.texi: Document attribute.
* flag-types.h (enum sanitize_code): Add
SANITIZE_SHADOW_CALL_STACK.
* opts-global.c (handle_common_deferred_options): Add SCS
compile option check.
* opts.c (finish_options): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
---
 gcc/c-family/c-attribs.c  | 21 +
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 gcc/config/aarch64/aarch64.c  | 27 +++
 gcc/config/aarch64/aarch64.h  | 11 +
 gcc/config/aarch64/aarch64.md | 18 
 gcc/defaults.h|  4 ++
 gcc/doc/extend.texi   |  7 +++
 gcc/doc/invoke.texi   | 29 
 gcc/flag-types.h  |  2 +
 gcc/opts-global.c |  6 +++
 gcc/opts.c| 12 +
 .../gcc.target/aarch64/shadow_call_stack_1.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_2.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_3.c  | 45 +++
 .../gcc.target/aarch64/shadow_call_stack_4.c  | 18 
 15 files changed, 213 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..9b3a35c06bf 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, 
bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
+ tree, int, bool *);
 static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
 static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
   { "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_shadow_call_stack",
+ 0, 0, true, false, false, false,
+ handle_no_sanitize_shadow_call_stack_attribute,
+ NULL },
   { "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -1175,6 +1181,21 @@ handle_no_sanitize_address_attribute (tree *node, tree 
name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle a "no_sanitize_shadow_call_stack" attribute; arguments as in
+   struct attribute_spec.handler.  */
+static tree

Re: [EXTERNAL] Re: Question about match.pd

2021-11-25 Thread Richard Biener via Gcc
On Thu, Nov 25, 2021 at 10:40 PM Navid Rahimi via Gcc  wrote:
>
> > (A << B) eq/ne 0
> Yes that is correct. But for detecting such a pattern you have to detect B
> and make sure B is boolean.  GIMPLE converts that Boolean to integer before
> shifting.

Note it's the C language specification that requires this.

> After many hours of debugging, I think I managed to find out what is going on.
>
> +/* cmp : ==, != */
> +/* ((B0 << x) cmp 0) -> B0 cmp 0 */
> +(for cmp (eq ne)
> + (simplify
> +  (cmp (lshift (convert@3 boolean_valued_p@0) @1) integer_zerop@2)
> +   (if (TREE_CODE (TREE_TYPE (@3)) == INTEGER_TYPE
> +       && (GIMPLE || !TREE_SIDE_EFFECTS (@1)))
> +    (cmp @0 @2))))
>
> So when I am transforming something like the above pattern to (cmp @0 @2)
> there is a type mismatch between @0 and @2.  @0 is boolean and @2 is integer.
> That type mismatch causes a lot of headaches when going through
> resimplification.

Yeah, guess you need

   (cmp @0 { build_zero_cst (TREE_TYPE (@0)); })

here.

>
>
> Best wishes,
> Navid.
>
> 
> From: Jeff Law 
> Sent: Wednesday, November 24, 2021 15:11
> To: Navid Rahimi; gcc@gcc.gnu.org
> Subject: [EXTERNAL] Re: Question about match.pd
>
>
>
> On 11/24/2021 2:19 PM, Navid Rahimi via Gcc wrote:
> > Hi GCC community,
> >
> > I have a question about pattern matching in match.pd.
> >
> > So I have a pattern like this [1]:
> > #define CMP !=
> > bool f(bool c, int i) { return (c << i) CMP 0; }
> > bool g(bool c, int i) { return c CMP 0;}
> >
> > It is verifiably correct to transform f to g [2]. Although this pattern
> > looks simple, the problem arises because GIMPLE converts booleans to int
> > before the "<<" operation.
> > So at the end you have a boolean->integer->boolean conversion and the shift
> > will happen on the integer in the middle.
> >
> > For example, for something like:
> >
> > bool g(bool c){return (c << 22);}
> >
> > The GIMPLE is:
> > _Bool g (_Bool c)
> > {
> >   int _1;
> >   int _2;
> >   _Bool _4;
> >
> >   <bb 2> [local count: 1073741824]:
> >   _1 = (int) c_3(D);
> >   _2 = _1 << 22;
> >   _4 = _2 != 0;
> >   return _4;
> > }
> >
> > I wrote a patch to fix this problem in match.pd:
> >
> > +(match boolean_valued_p
> > + @0
> > + (if (TREE_CODE (type) == BOOLEAN_TYPE
> > +  && TYPE_PRECISION (type) == 1)))
> > +(for op (tcc_comparison truth_and truth_andif truth_or truth_orif truth_xor)
> > + (match boolean_valued_p
> > +  (op @0 @1)))
> > +(match boolean_valued_p
> > +  (truth_not @0))
> >
> > +/* cmp : ==, != */
> > +/* ((B0 << x) cmp 0) -> B0 cmp 0 */
> > +(for cmp (eq ne)
> > + (simplify
> > +  (cmp (lshift (convert@3 boolean_valued_p@0) @1) integer_zerop@2)
> > +   (if (TREE_CODE (TREE_TYPE (@3)) == INTEGER_TYPE
> > +       && (GIMPLE || !TREE_SIDE_EFFECTS (@1)))
> > +    (cmp @0 @2))))
> >
> >
> > But the problem is I am not able to restrict it to the cases I am interested
> > in. There are many hits in other libraries I have tried compiling with
> > trunk+patch.
> >
> > Any feedback?
> >
> > 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98956
> > 2) https://alive2.llvm.org/ce/z/UUTJ_v
> It would help to also see the cases you're triggering that you do not
> want to trigger.
>
> Could we think of the optimization opportunity in a different way?
>
>
> (A << B) eq/ne 0  -> A eq/ne (0U >> B)
>
> And I would expect the 0U >> B to get simplified to 0.
>
> Would looking at things that way help?
>
> jeff
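
The source-level equivalence discussed in this thread can be checked
exhaustively with a few lines of standalone code.  This is just a sketch:
f and g are the functions from the original question with CMP expanded to
!=, equivalent_for_valid_shifts is a made-up helper, and the shift count is
restricted to 0..30 so that no assumption about shifting into the sign bit
of int is needed.

```cpp
#include <cassert>

// The two functions from the question, with CMP expanded to !=.
bool f(bool c, int i) { return (c << i) != 0; }
bool g(bool c, int /*i*/) { return c != 0; }

// Compares f and g for both boolean values and all shift counts 0..30.
bool equivalent_for_valid_shifts() {
  for (int c = 0; c <= 1; ++c)
    for (int i = 0; i < 31; ++i)
      if (f(c != 0, i) != g(c != 0, i))
        return false;
  return true;
}
```

Note that this only holds because c is boolean: its single possible set bit
is never shifted out for these counts.  For a general integer operand bits
can be shifted out, so (A << B) != 0 and A != 0 may differ.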


Re: atomic_load

2021-11-25 Thread Martin Uecker via Gcc
Am Sonntag, den 07.11.2021, 10:08 +0100 schrieb Martin Uecker:
> It would be great if somebody could take a look at
> PR96159. 
> 
> It seems we do not do atomic accesses correctly
> when the alignment is insufficient for a lockfree
> access, but I think we should fall back to a
> library call in this case (as clang does).
> 
> This is very unfortunate as it is an important
> functionality to be able to do atomic accesses 
> on non-atomic types and it seems there is no way
> to achieve this.
> 
> Also documentation and various descriptions of
> the atomic functions imply that this is expected
> to work.
> 
> But maybe I am missing something and the generated
> code is indeed safe.
> 
> Martin
> 

Could this bug be confirmed please? 

This is a silent and dangerous incorrect code generation issue.  

If these functions are not meant to be used on existing
data, then at least the documentation needs to be changed
to include a big warning that this only happens to work
correctly if the data has sufficient alignment for the
specific architecture (which of course makes it impossible
to use this in a portable way).

I would then propose to add atomic_load_safe,
so that it is possible to use such functionality safely
on existing data structures which is an important use
case.

Martin
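
To make the proposal a bit more concrete, here is a rough sketch of what
such a helper could look like for 64-bit values.  Everything in it is an
assumption for illustration: load_u64_safe, store_u64_safe and
fallback_lock are made-up names, the single global mutex is the simplest
possible fallback, and the scheme is only sound if every access to a
possibly misaligned object goes through these helpers.  It uses GCC's
__atomic_load_n and __atomic_store_n builtins for the aligned case.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <mutex>

// Hypothetical process-wide lock for accesses that cannot be lock-free.
static std::mutex fallback_lock;

static bool aligned_for_u64(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % sizeof(std::uint64_t) == 0;
}

// Atomic builtin when sufficiently aligned, lock plus memcpy otherwise.
// The path depends only on the address, so one object always uses one path.
std::uint64_t load_u64_safe(const void *p) {
  if (aligned_for_u64(p))
    return __atomic_load_n(static_cast<const std::uint64_t *>(p),
                           __ATOMIC_SEQ_CST);
  std::lock_guard<std::mutex> guard(fallback_lock);
  std::uint64_t value;
  std::memcpy(&value, p, sizeof value);
  return value;
}

void store_u64_safe(void *p, std::uint64_t value) {
  if (aligned_for_u64(p)) {
    __atomic_store_n(static_cast<std::uint64_t *>(p), value,
                     __ATOMIC_SEQ_CST);
    return;
  }
  std::lock_guard<std::mutex> guard(fallback_lock);
  std::memcpy(p, &value, sizeof value);
}
```

A real implementation would more likely fall back to a library call, as
suggested in the quoted message, rather than to a user-level mutex.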






[r12-5536 Regression] FAIL: gfortran.dg/widechar_2.f90 -O0 (test for excess errors) on Linux/x86_64

2021-11-25 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

90cb088ece8d8cc1019d25629d1585e5b0234179 is the first bad commit
commit 90cb088ece8d8cc1019d25629d1585e5b0234179
Author: konglin1 
Date:   Wed Nov 10 09:37:32 2021 +0800

i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

caused

FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation,  -O0 
FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation,  -Og -g 
FAIL: gcc.c-torture/execute/builtins/memmove-chk.c compilation,  -O0 
FAIL: gcc.c-torture/execute/builtins/memmove-chk.c compilation,  -Og -g 
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c compilation,  -O0 
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c compilation,  -Og -g 
FAIL: gcc.dg/guality/vla-1.c   -O0  (test for excess errors)
FAIL: gcc.dg/guality/vla-1.c   -O1  -DPREVENT_OPTIMIZATION  (test for excess errors)
FAIL: gcc.dg/guality/vla-1.c   -O2  -DPREVENT_OPTIMIZATION  (test for excess errors)
FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  -DPREVENT_OPTIMIZATION (test for excess errors)
FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION (test for excess errors)
FAIL: gcc.dg/guality/vla-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  (test for excess errors)
FAIL: gcc.dg/guality/vla-1.c  -Og -DPREVENT_OPTIMIZATION  (test for excess errors)
FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation,  -O1 
FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation,  -O2  (internal compiler error)
FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation,  -O3 -g  (internal compiler error)
FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation,  -Og -g 
FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation,  -Os  (internal compiler error)
FAIL: gfortran.dg/fmt_cache_1.f   -O0  (test for excess errors)
FAIL: gfortran.dg/fmt_cache_1.f   -O1  (test for excess errors)
FAIL: gfortran.dg/fmt_cache_1.f   -O2  (test for excess errors)
FAIL: gfortran.dg/fmt_cache_1.f   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/fmt_cache_1.f   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/fmt_cache_1.f   -Os  (test for excess errors)
FAIL: gfortran.dg/g77/7388.f   -O0  (test for excess errors)
FAIL: gfortran.dg/iomsg_1.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/namelist_19.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/read_no_eor.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/round_4.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/widechar_2.f90   -O0  (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5536/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="builtins.exp=gcc.c-torture/execute/builtins/memcpy-chk.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="builtins.exp=gcc.c-torture/execute/builtins/memmove-chk.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="builtins.exp=gcc.c-torture/execute/builtins/mempcpy-chk.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="guality.exp=gcc.dg/guality/vla-1.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="abi-x86_64.exp=gcc.target/x86_64/abi/test_struct_returning.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/fmt_cache_1.f --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/g77/7388.f --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/iomsg_1.f90 --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/namelist_19.f90 --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/read_no_eor.f90 --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/round_4.f90 --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/widechar_2.f90 --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at 

Re: [PATCH v3 0/8] __builtin_dynamic_object_size

2021-11-25 Thread Siddhesh Poyarekar

On 11/26/21 10:58, Siddhesh Poyarekar wrote:

sure it works) and saw no issues in any of those builds.  I did some
rudimentary analysis of the generated binaries using fortify-metrics[1]
to confirm that there was a difference in coverage between the two
fortification levels.

Here is a summary of coverage in the above packages:

F = number of fortified calls
T = Total number of calls to fortifiable functions (fortified as well as
unfortified)
C = F * 100/ T

Package         F(2)    T(2)    F(3)    T(3)    C(2)    C(3)
bash            428     1220    1005    1196    35.08%  84.03%
wpa_supplicant  1635    3232    2350    3408    50.59%  68.96%
systemtap       324     1990    343     1994    16.28%  17.20%
cmake           830     14181   958     14196   5.85%   6.75%
The numbers are slightly lower than the previous patch series because in
the interim I pushed an improvement to folding of the _chk builtins so
that they can use ranges to simplify the calls to their regular
variants.  Also note that even _FORTIFY_SOURCE=2 coverage should be
improved due to negative offset handling.


[1] https://github.com/siddhesh/fortify-metrics


[PATCH v3 8/8] tree-object-size: Dynamic sizes for ADDR_EXPR

2021-11-25 Thread Siddhesh Poyarekar
Allow returning dynamic expressions from ADDR_EXPR for
__builtin_dynamic_object_size and also allow offsets to be dynamic.
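
To make the expected values in the new tests easier to follow, here is
a small model of the arithmetic for a mode-0 query on `&bin[off].c`.
It is only a sketch of the expected result, not the code in
tree-object-size.c; bytes_left and expected_member_size are
hypothetical helper names, and overflow of the multiplications is
ignored.

```c
#include <stddef.h>

/* Same layout as the struct used by the new tests.  */
struct dynarray_struct
{
  long a;
  char c[16];
  int b;
};

/* Bytes left in an object of WHOLE bytes when pointing BYTE_OFF bytes
   into it; zero once the pointer is at or past the end.  */
static size_t
bytes_left (size_t whole, size_t byte_off)
{
  return byte_off < whole ? whole - byte_off : 0;
}

/* Expected mode-0 size of &bin[off].c for an array
   `struct dynarray_struct bin[n]'.  */
static size_t
expected_member_size (size_t n, size_t off)
{
  size_t whole = n * sizeof (struct dynarray_struct);
  size_t byte_off = off * sizeof (struct dynarray_struct)
		    + offsetof (struct dynarray_struct, c);
  return bytes_left (whole, byte_off);
}
```

With n = 42 and off = 4 this gives 38 whole elements minus the offset
of c, and with off = 48 it gives 0, which is what main checks for
test_dynarray_struct.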

gcc/ChangeLog:

* tree-object-size.c (size_valid_p): New function.
(size_for_offset): Remove OFFSET constness assertion.
(addr_object_size): Build dynamic expressions for object
sizes and use size_valid_p to decide if it is valid for the
given OBJECT_SIZE_TYPE.
(compute_builtin_object_size): Allow dynamic offsets when
computing size at O0.
(call_object_size): Call size_valid_p.
(plus_stmt_object_size): Allow non-constant offset and use
size_valid_p to decide if it is valid for the given
OBJECT_SIZE_TYPE.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c: Add new tests.
* gcc.dg/builtin-object-size-1.c (test1)
[__builtin_object_size]: Adjust expected output for dynamic
object sizes.
* gcc.dg/builtin-object-size-2.c (test1)
[__builtin_object_size]: Likewise.
* gcc.dg/builtin-object-size-3.c (test1)
[__builtin_object_size]: Likewise.
* gcc.dg/builtin-object-size-4.c (test1)
[__builtin_object_size]: Likewise.

Signed-off-by: Siddhesh Poyarekar 
---
 .../gcc.dg/builtin-dynamic-object-size-0.c| 158 ++
 gcc/testsuite/gcc.dg/builtin-object-size-1.c  |  30 +++-
 gcc/testsuite/gcc.dg/builtin-object-size-2.c  |  43 -
 gcc/testsuite/gcc.dg/builtin-object-size-3.c  |  25 ++-
 gcc/testsuite/gcc.dg/builtin-object-size-4.c  |  17 +-
 gcc/tree-object-size.c|  91 +-
 6 files changed, 300 insertions(+), 64 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 2db0e0d1aa2..4a1f4965ebd 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -219,6 +219,79 @@ test_deploop (size_t sz, size_t cond)
   return __builtin_dynamic_object_size (bin, 0);
 }
 
+/* Address expressions.  */
+
+struct dynarray_struct
+{
+  long a;
+  char c[16];
+  int b;
+};
+
+size_t
+__attribute__ ((noinline))
+test_dynarray_struct (size_t sz, size_t off)
+{
+  struct dynarray_struct bin[sz];
+
+  return __builtin_dynamic_object_size (&bin[off].c, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_dynarray_struct_subobj (size_t sz, size_t off)
+{
+  struct dynarray_struct bin[sz];
+
+  return __builtin_dynamic_object_size (&bin[off].c[4], 1);
+}
+
+size_t
+__attribute__ ((noinline))
+test_dynarray_struct_subobj2 (size_t sz, size_t off, size_t *objsz)
+{
+  struct dynarray_struct2
+{
+  long a;
+  int b;
+  char c[sz];
+};
+
+  struct dynarray_struct2 bin;
+
+  *objsz = sizeof (bin);
+
+  return __builtin_dynamic_object_size (&bin.c[off], 1);
+}
+
+size_t
+__attribute__ ((noinline))
+test_substring (size_t sz, size_t off)
+{
+  char str[sz];
+
+  return __builtin_dynamic_object_size (&str[off], 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_substring_ptrplus (size_t sz, size_t off)
+{
+  int str[sz];
+
+  return __builtin_dynamic_object_size (str + off, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_substring_ptrplus2 (size_t sz, size_t off, size_t off2)
+{
+  int str[sz];
+  int *ptr = &str[off];
+
+  return __builtin_dynamic_object_size (ptr + off2, 0);
+}
+
 size_t
 __attribute__ ((access (__read_write__, 1, 2)))
 __attribute__ ((noinline))
@@ -227,6 +300,40 @@ test_parmsz_simple (void *obj, size_t sz)
   return __builtin_dynamic_object_size (obj, 0);
 }
 
+size_t
+__attribute__ ((noinline))
+__attribute__ ((access (__read_write__, 1, 2)))
+test_parmsz (void *obj, size_t sz, size_t off)
+{
+  return __builtin_dynamic_object_size (obj + off, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+__attribute__ ((access (__read_write__, 1, 2)))
+test_parmsz_scale (int *obj, size_t sz, size_t off)
+{
+  return __builtin_dynamic_object_size (obj + off, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+__attribute__ ((access (__read_write__, 1, 2)))
+test_loop (int *obj, size_t sz, size_t start, size_t end, int incr)
+{
+  int *ptr = obj + start;
+
+  for (int i = start; i != end; i = i + incr)
+{
+  ptr = ptr + incr;
+  if (__builtin_dynamic_object_size (ptr, 0) == 0)
+   return 0;
+}
+
+  return __builtin_dynamic_object_size (ptr, 0);
+}
+
+
 unsigned nfails = 0;
 
 #define FAIL() ({ \
@@ -287,6 +394,31 @@ main (int argc, char **argv)
 FAIL ();
   if (test_dynarray (__builtin_strlen (argv[0])) != __builtin_strlen (argv[0]))
 FAIL ();
+  if (test_dynarray_struct (42, 4) !=
+  ((42 - 4) * sizeof (struct dynarray_struct)
+   - __builtin_offsetof (struct dynarray_struct, c)))
+FAIL ();
+  if (test_dynarray_struct (42, 48) != 0)
+FAIL ();
+  if (test_substring (128, 4) != 128 - 4)
+FAIL ();
+  if (test_substring (128, 142) != 0)
+FAIL ();
+  if (test_dynarray_struct_subobj (42, 4) != 16 - 4)
+

[PATCH v3 7/8] tree-object-size: Handle GIMPLE_CALL

2021-11-25 Thread Siddhesh Poyarekar
Handle non-constant expressions in GIMPLE_CALL arguments.  Also handle
alloca.

gcc/ChangeLog:

* tree-object-size.c (alloc_object_size): Make and return
non-constant size expression.
(call_object_size): Return expression or unknown based on
whether dynamic object size is requested.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c: Add new tests.
* gcc.dg/builtin-object-size-1.c (test1)
[__builtin_object_size]: Alter expected result for dynamic
object size.
* gcc.dg/builtin-object-size-2.c (test1)
[__builtin_object_size]: Likewise.
* gcc.dg/builtin-object-size-3.c (test1)
[__builtin_object_size]: Likewise.
* gcc.dg/builtin-object-size-4.c (test1)
[__builtin_object_size]: Likewise.

Signed-off-by: Siddhesh Poyarekar 
---
 .../gcc.dg/builtin-dynamic-object-size-0.c| 227 +-
 gcc/testsuite/gcc.dg/builtin-object-size-1.c  |   7 +
 gcc/testsuite/gcc.dg/builtin-object-size-2.c  |  14 ++
 gcc/testsuite/gcc.dg/builtin-object-size-3.c  |   7 +
 gcc/testsuite/gcc.dg/builtin-object-size-4.c  |  14 ++
 gcc/tree-object-size.c|  22 +-
 6 files changed, 282 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index ce0f4eb17f3..2db0e0d1aa2 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -4,12 +4,71 @@
 typedef __SIZE_TYPE__ size_t;
 #define abort __builtin_abort
 
+void *
+__attribute__ ((alloc_size (1)))
+__attribute__ ((__nothrow__ , __leaf__))
+__attribute__ ((noinline))
+alloc_func (size_t sz)
+{
+  return __builtin_malloc (sz);
+}
+
+void *
+__attribute__ ((alloc_size (1, 2)))
+__attribute__ ((__nothrow__ , __leaf__))
+__attribute__ ((noinline))
+calloc_func (size_t cnt, size_t sz)
+{
+  return __builtin_calloc (cnt, sz);
+}
+
+void *
+__attribute__ ((noinline))
+unknown_allocator (size_t cnt, size_t sz)
+{
+  return __builtin_calloc (cnt, sz);
+}
+
+size_t
+__attribute__ ((noinline))
+test_unknown (size_t cnt, size_t sz)
+{
+  void *ret = unknown_allocator (cnt, sz);
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
+/* Malloc-like allocator.  */
+
+size_t
+__attribute__ ((noinline))
+test_malloc (size_t sz)
+{
+  void *ret = alloc_func (sz);
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc (size_t sz)
+{
+  void *ret = __builtin_malloc (sz);
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_cond (int cond)
+{
+  void *ret = __builtin_malloc (cond ? 32 : 64);
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
 size_t
 __attribute__ ((noinline))
 test_builtin_malloc_condphi (int cond)
 {
   void *ret;
- 
+
   if (cond)
 ret = __builtin_malloc (32);
   else
@@ -18,6 +77,79 @@ test_builtin_malloc_condphi (int cond)
   return __builtin_dynamic_object_size (ret, 0);
 }
 
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_condphi2 (int cond, size_t in)
+{
+  void *ret;
+
+  if (cond)
+ret = __builtin_malloc (in);
+  else
+ret = __builtin_malloc (64);
+
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_condphi3 (int cond, size_t in, size_t in2)
+{
+  void *ret;
+
+  if (cond)
+ret = __builtin_malloc (in);
+  else
+ret = __builtin_malloc (in2);
+
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_condphi4 (size_t sz, int cond)
+{
+  char *a = __builtin_malloc (sz);
+  char b[sz / 2];
+
+  return __builtin_dynamic_object_size (cond ? b : (void *) &a, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_condphi5 (size_t sz, int cond, char *c)
+{
+  char *a = __builtin_malloc (sz);
+
+  return __builtin_dynamic_object_size (cond ? c : (void *) &a, 0);
+}
+
+/* Calloc-like allocator.  */
+
+size_t
+__attribute__ ((noinline))
+test_calloc (size_t cnt, size_t sz)
+{
+  void *ret = calloc_func (cnt, sz);
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_builtin_calloc (size_t cnt, size_t sz)
+{
+  void *ret = __builtin_calloc (cnt, sz);
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_builtin_calloc_cond (int cond1, int cond2)
+{
+  void *ret = __builtin_calloc (cond1 ? 32 : 64, cond2 ? 1024 : 16);
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
 size_t
 __attribute__ ((noinline))
 test_builtin_calloc_condphi (size_t cnt, size_t sz, int cond)
@@ -33,6 +165,47 @@ test_builtin_calloc_condphi (size_t cnt, size_t sz, int 
cond)
   return __builtin_dynamic_object_size (cond ? ch : (void *) , 0);
 }
 
+/* Passthrough functions.  */
+
+size_t
+__attribute__ ((noinline))
+test_passthrough (size_t sz, 

[PATCH v3 5/8] tree-object-size: Support dynamic sizes in conditions

2021-11-25 Thread Siddhesh Poyarekar
Handle GIMPLE_PHI and conditionals specially for dynamic objects,
returning PHI/conditional expressions instead of just a MIN/MAX
estimate.

This makes the returned object size variable for loops and conditionals,
so tests need to be adjusted to look for the precise size in some cases.
builtin-dynamic-object-size-5.c had to be modified to only look for
success in the maximum object size case and to skip the minimum object
size tests, because the result is no longer a compile-time constant.

I also added some simple tests to exercise conditionals with dynamic
object sizes.
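
As an illustration of the difference, take a pointer that is either A
or B bytes long depending on a condition, as in
test_builtin_malloc_condphi.  The sketch below only models the
reported values; static_estimate and dynamic_estimate are made-up
names, not functions in the pass.

```c
#include <stddef.h>

/* What a static estimate can report for a pointer that refers to
   either A or B bytes: the larger value in the maximum modes, the
   smaller one in the minimum modes.  */
static size_t
static_estimate (size_t a, size_t b, int minimum)
{
  if (minimum)
    return a < b ? a : b;
  return a > b ? a : b;
}

/* What a dynamic size expression evaluates to at run time: the size
   of whichever object was actually selected by COND.  */
static size_t
dynamic_estimate (int cond, size_t a, size_t b)
{
  return cond ? a : b;
}
```

For the malloc (32) / malloc (64) pair, the static estimates are 64
(maximum) and 32 (minimum), while the dynamic expression yields 32
when the condition is true and 64 otherwise.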

gcc/ChangeLog:

* builtins.c (fold_builtin_object_size): Adjust for dynamic size
expressions.
* tree-object-size.c: Include gimplify-me.h.
(struct object_size_info): New member UNKNOWNS.
(size_initval_p, object_sizes_get_raw): New functions.
(object_sizes_get): Return suitable gimple variable for
object size.
(object_sizes_initialize): Reuse existing object size TREE_VEC
during gimplification.
(bundle_sizes): New function.
(object_sizes_set): Use it and handle dynamic object size
expressions.
(object_sizes_set_temp): New function.
(size_for_offset): Adjust for dynamic size expressions.
(emit_phi_nodes, propagate_unknowns, gimplify_size_expressions):
New functions.
(compute_builtin_object_size): Call gimplify_size_expressions
for OST_DYNAMIC.
(dynamic_object_size): New function.
(cond_expr_object_size): Use it.
(phi_dynamic_object_size): New function.
(collect_object_sizes_for): Call it for OST_DYNAMIC.  Adjust to
accommodate dynamic object sizes.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c: New tests.
* gcc.dg/builtin-dynamic-object-size-10.c: Add comment.
* gcc.dg/builtin-object-size-1.c [__builtin_object_size]: Expect
exact size expressions for __builtin_dynamic_object_size.
* gcc.dg/builtin-object-size-2.c [__builtin_object_size]:
Likewise.
* gcc.dg/builtin-object-size-3.c [__builtin_object_size]:
Likewise.
* gcc.dg/builtin-object-size-4.c [__builtin_object_size]:
Likewise.
* gcc.dg/builtin-object-size-5.c [__builtin_object_size]:
Likewise.

Signed-off-by: Siddhesh Poyarekar 
---
Changes from v2:

- Incorporated review suggestions.
- Delay generating PHI nodes until gimplification so that they don't
  have to be undone if the size is found to be unknown.
- Adapt to retaining the multipass approach for static object sizes.

 gcc/builtins.c|   6 +-
 .../gcc.dg/builtin-dynamic-object-size-0.c|  72 +++
 .../gcc.dg/builtin-dynamic-object-size-10.c   |   2 +
 gcc/testsuite/gcc.dg/builtin-object-size-1.c  | 119 -
 gcc/testsuite/gcc.dg/builtin-object-size-2.c  |  92 
 gcc/testsuite/gcc.dg/builtin-object-size-3.c  | 121 +
 gcc/testsuite/gcc.dg/builtin-object-size-4.c  |  78 +++
 gcc/testsuite/gcc.dg/builtin-object-size-5.c  |  12 +
 gcc/tree-object-size.c| 494 +-
 9 files changed, 966 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 573f7e9b9df..9770e13353d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -10256,7 +10256,8 @@ fold_builtin_object_size (tree ptr, tree ost, enum 
built_in_function fcode)
   if (TREE_CODE (ptr) == ADDR_EXPR)
 {
   compute_builtin_object_size (ptr, object_size_type, &bytes);
-  if (int_fits_type_p (bytes, size_type_node))
+  if ((object_size_type & OST_DYNAMIC)
+ || int_fits_type_p (bytes, size_type_node))
return fold_convert (size_type_node, bytes);
 }
   else if (TREE_CODE (ptr) == SSA_NAME)
@@ -10265,7 +10266,8 @@ fold_builtin_object_size (tree ptr, tree ost, enum 
built_in_function fcode)
later.  Maybe subsequent passes will help determining
it.  */
   if (compute_builtin_object_size (ptr, object_size_type, &bytes)
- && int_fits_type_p (bytes, size_type_node))
+ && ((object_size_type & OST_DYNAMIC)
+ || int_fits_type_p (bytes, size_type_node)))
return fold_convert (size_type_node, bytes);
 }
 
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
new file mode 100644
index 000..ddedf6a49bd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+typedef __SIZE_TYPE__ size_t;
+#define abort __builtin_abort
+
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_condphi (int cond)
+{
+  void *ret;
+ 
+  if (cond)
+ret = __builtin_malloc (32);
+  else
+ret = __builtin_malloc (64);
+
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
+size_t
+__attribute__ ((noinline))

[PATCH v3 6/8] tree-object-size: Handle function parameters

2021-11-25 Thread Siddhesh Poyarekar
Handle hints provided by __attribute__ ((access (...))) to compute
dynamic sizes for objects.
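
The size derived from the attribute is the value of the size argument
scaled by the size of the pointed-to type, and left unscaled when the
pointee has no size (void).  A rough model of that arithmetic follows.
The name parm_size_in_bytes is hypothetical, and treating an
overflowing product as SIZE_MAX is an assumption of this sketch, not
something the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of an object described by access (..., ptr, size):
   COUNT elements of ELEM_SIZE bytes each.  ELEM_SIZE of zero stands
   for a pointee without a size (void), where COUNT is used as is.  */
static size_t
parm_size_in_bytes (size_t count, size_t elem_size)
{
  if (elem_size == 0)
    return count;
  /* Assumption of this sketch: report an overflowing product as
     SIZE_MAX rather than wrapping around.  */
  if (count > SIZE_MAX / elem_size)
    return SIZE_MAX;
  return count * elem_size;
}
```

For test_parmsz_simple, which takes a void pointer, the size argument
is therefore expected back unchanged.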

gcc/ChangeLog:

* tree-object-size.c: Include tree-dfa.h.
(parm_object_size): New function.
(collect_object_sizes_for): Call it.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_parmsz_simple):
New function.
(main): Call it.

Signed-off-by: Siddhesh Poyarekar 
---
 .../gcc.dg/builtin-dynamic-object-size-0.c| 11 
 gcc/tree-object-size.c| 50 ++-
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index ddedf6a49bd..ce0f4eb17f3 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -46,6 +46,14 @@ test_deploop (size_t sz, size_t cond)
   return __builtin_dynamic_object_size (bin, 0);
 }
 
+size_t
+__attribute__ ((access (__read_write__, 1, 2)))
+__attribute__ ((noinline))
+test_parmsz_simple (void *obj, size_t sz)
+{
+  return __builtin_dynamic_object_size (obj, 0);
+}
+
 unsigned nfails = 0;
 
 #define FAIL() ({ \
@@ -64,6 +72,9 @@ main (int argc, char **argv)
 FAIL ();
   if (test_deploop (128, 129) != 32)
 FAIL ();
+  if (test_parmsz_simple (argv[0], __builtin_strlen (argv[0]) + 1)
+  != __builtin_strlen (argv[0]) + 1)
+FAIL ();
 
   if (nfails > 0)
 __builtin_abort ();
diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 5b4dcb619cd..48b1ec6e26a 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimple-iterator.h"
 #include "tree-cfg.h"
+#include "tree-dfa.h"
 #include "stringpool.h"
 #include "attribs.h"
 #include "builtins.h"
@@ -1446,6 +1447,53 @@ cond_expr_object_size (struct object_size_info *osi, 
tree var, gimple *stmt)
   return reexamine;
 }
 
+/* Find size of an object passed as a parameter to the function.  */
+
+static void
+parm_object_size (struct object_size_info *osi, tree var)
+{
+  int object_size_type = osi->object_size_type;
+  tree parm = SSA_NAME_VAR (var);
+
+  if (!(object_size_type & OST_DYNAMIC) || !POINTER_TYPE_P (TREE_TYPE (parm)))
+expr_object_size (osi, var, parm);
+
+  /* Look for access attribute.  */
+  rdwr_map rdwr_idx;
+
+  tree fndecl = cfun->decl;
+  const attr_access *access = get_parm_access (rdwr_idx, parm, fndecl);
+  tree typesize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (parm)));
+  tree sz = NULL_TREE;
+
+  if (access && access->sizarg != UINT_MAX)
+{
+  tree fnargs = DECL_ARGUMENTS (fndecl);
+  tree arg = NULL_TREE;
+  unsigned argpos = 0;
+
+  /* Walk through the parameters to pick the size parameter and safely
+scale it by the type size.  */
+  for (arg = fnargs; argpos != access->sizarg && arg;
+  arg = TREE_CHAIN (arg), ++argpos);
+
+  if (arg != NULL_TREE && INTEGRAL_TYPE_P (TREE_TYPE (arg)))
+   {
+ sz = get_or_create_ssa_default_def (cfun, arg);
+ if (sz != NULL_TREE)
+   {
+ sz = fold_convert (sizetype, sz);
+ if (typesize)
+   sz = size_binop (MULT_EXPR, sz, typesize);
+   }
+   }
+}
+  if (!sz)
+sz = size_unknown (object_size_type);
+
+  object_sizes_set (osi, SSA_NAME_VERSION (var), sz, sz);
+}
+
 /* Compute an object size expression for VAR, which is the result of a PHI
node.  */
 
@@ -1603,7 +1651,7 @@ collect_object_sizes_for (struct object_size_info *osi, 
tree var)
 case GIMPLE_NOP:
   if (SSA_NAME_VAR (var)
  && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL)
-   expr_object_size (osi, var, SSA_NAME_VAR (var));
+   parm_object_size (osi, var);
   else
/* Uninitialized SSA names point nowhere.  */
unknown_object_size (osi, var);
-- 
2.31.1



[PATCH v3 4/8] __builtin_dynamic_object_size: Recognize builtin

2021-11-25 Thread Siddhesh Poyarekar
Recognize the __builtin_dynamic_object_size builtin and add code paths in
the object size pass to deal with it, but treat it like
__builtin_object_size for now.  Also add tests to provide the same
testing coverage for the new builtin name.

gcc/ChangeLog:

* builtins.def (BUILT_IN_DYNAMIC_OBJECT_SIZE): New builtin.
* tree-object-size.h: Move object size type bits enum from
tree-object-size.c and add new value OST_DYNAMIC.
* builtins.c (expand_builtin, fold_builtin_2): Handle it.
(fold_builtin_object_size): Handle new builtin and adjust for
change to compute_builtin_object_size.
* tree-object-size.c: Include builtins.h.
(compute_builtin_object_size): Adjust.
(early_object_sizes_execute_one,
dynamic_object_sizes_execute_one): New functions.
(object_sizes_execute): Rename insert_min_max_p argument to
early.  Handle BUILT_IN_DYNAMIC_OBJECT_SIZE and call the new
functions.
* doc/extend.texi (__builtin_dynamic_object_size): Document new
builtin.

gcc/testsuite/ChangeLog:

* g++.dg/ext/builtin-dynamic-object-size1.C: New test.
* g++.dg/ext/builtin-dynamic-object-size2.C: Likewise.
* gcc.dg/builtin-dynamic-alloc-size.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-1.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-10.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-11.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-12.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-13.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-14.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-15.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-16.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-17.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-18.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-19.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-5.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-6.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-7.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-8.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-9.c: Likewise.
* gcc.dg/builtin-object-size-16.c: Adjust to allow inclusion
from builtin-dynamic-object-size-16.c.
* gcc.dg/builtin-object-size-17.c: Likewise.

Signed-off-by: Siddhesh Poyarekar 
---
Changes from v2:

- Incorporated review suggestions.

 gcc/builtins.c|  11 +-
 gcc/builtins.def  |   1 +
 gcc/doc/extend.texi   |  13 ++
 .../g++.dg/ext/builtin-dynamic-object-size1.C |   5 +
 .../g++.dg/ext/builtin-dynamic-object-size2.C |   5 +
 .../gcc.dg/builtin-dynamic-alloc-size.c   |   7 +
 .../gcc.dg/builtin-dynamic-object-size-1.c|   6 +
 .../gcc.dg/builtin-dynamic-object-size-10.c   |   9 ++
 .../gcc.dg/builtin-dynamic-object-size-11.c   |   7 +
 .../gcc.dg/builtin-dynamic-object-size-12.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-13.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-14.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-15.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-16.c   |   6 +
 .../gcc.dg/builtin-dynamic-object-size-17.c   |   7 +
 .../gcc.dg/builtin-dynamic-object-size-18.c   |   8 +
 .../gcc.dg/builtin-dynamic-object-size-19.c   | 104 
 .../gcc.dg/builtin-dynamic-object-size-2.c|   6 +
 .../gcc.dg/builtin-dynamic-object-size-3.c|   6 +
 .../gcc.dg/builtin-dynamic-object-size-4.c|   6 +
 .../gcc.dg/builtin-dynamic-object-size-5.c|   7 +
 .../gcc.dg/builtin-dynamic-object-size-6.c|   5 +
 .../gcc.dg/builtin-dynamic-object-size-7.c|   5 +
 .../gcc.dg/builtin-dynamic-object-size-8.c|   5 +
 .../gcc.dg/builtin-dynamic-object-size-9.c|   5 +
 gcc/testsuite/gcc.dg/builtin-object-size-16.c |   2 +
 gcc/testsuite/gcc.dg/builtin-object-size-17.c |   2 +
 gcc/tree-object-size.c| 152 +-
 gcc/tree-object-size.h|  10 ++
 29 files changed, 378 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size2.C
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-alloc-size.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-11.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-12.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-13.c
 create mode 100644 

[PATCH v3 3/8] tree-object-size: Save sizes as trees and support negative offsets

2021-11-25 Thread Siddhesh Poyarekar
Transform tree-object-size to operate on tree objects instead of host
wide integers.  This makes it easier to extend to dynamic expressions
for object sizes.

The compute_builtin_object_size interface also now returns a tree
expression instead of HOST_WIDE_INT, so callers have been adjusted to
account for that.

The trees in object_sizes are each a TREE_VEC with the first element
being the bytes from the pointer to the end of the object and the
second, the size of the whole object.  This allows analysis of negative
offsets, which can now be allowed to the extent of the object bounds.
Tests have been added to verify that it actually works.
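
Why keeping both values makes negative offsets tractable can be shown
with a simplified model.  This is a sketch of the idea only, not the
size_for_offset implementation; the struct and function names are
invented for this note, and it assumes remaining <= whole.

```c
#include <stddef.h>

/* The two values tracked per pointer: bytes from the pointer to the
   end of the object, and the size of the whole object.  */
struct objsize
{
  size_t remaining;
  size_t whole;
};

/* Bytes left after moving the pointer by OFF bytes.  A move that
   leaves the object, in either direction, yields zero.  */
static size_t
remaining_after_offset (struct objsize s, ptrdiff_t off)
{
  /* Current position of the pointer inside the whole object.  */
  size_t pos = s.whole - s.remaining;

  if (off < 0)
    {
      /* Magnitude of OFF, computed without overflowing ptrdiff_t.  */
      size_t back = (size_t) -(off + 1) + 1;
      if (back > pos)
	return 0;
      return s.remaining + back;
    }

  if ((size_t) off > s.remaining)
    return 0;
  return s.remaining - (size_t) off;
}
```

With only `remaining' available there would be no way to tell how far
back a negative offset may go; `whole' supplies that bound.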

gcc/ChangeLog:

* tree-object-size.h (compute_builtin_object_size): Return tree
instead of HOST_WIDE_INT.
* builtins.c (fold_builtin_object_size): Adjust.
* gimple-fold.c (gimple_fold_builtin_strncat): Likewise.
* ubsan.c (instrument_object_size): Likewise.
* tree-object-size.c (object_sizes): Change type to vec<tree>.
(initval): New function.
(unknown): Use it.
(size_unknown_p, size_initval, size_unknown): New functions.
(object_sizes_unknown_p): Use it.
(object_sizes_get): Return tree.
(object_sizes_initialize): Rename from object_sizes_set_force
and set VAL parameter type as tree.  Add new parameter WHOLEVAL.
(object_sizes_set): Set VAL parameter type as tree and adjust
implementation.  Add new parameter WHOLEVAL.
(size_for_offset): New function.
(decl_init_size): Adjust comment.
(addr_object_size): Change PSIZE parameter to tree and adjust
implementation.  Add new parameter PWHOLESIZE.
(alloc_object_size): Return tree.
(compute_builtin_object_size): Return tree in PSIZE.
(expr_object_size, call_object_size, unknown_object_size):
Adjust for object_sizes_set change.
(merge_object_sizes): Drop OFFSET parameter and adjust
implementation for tree change.
(plus_stmt_object_size): Call collect_object_sizes_for directly
instead of merge_object_sizes and call size_for_offset to get net
size.
(cond_expr_object_size, collect_object_sizes_for,
object_sizes_execute): Adjust for change of type from
HOST_WIDE_INT to tree.
(check_for_plus_in_loops_1): Likewise and skip non-positive
offsets.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-1.c (test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-2.c (test8): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c (test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-4.c (test8): New test.
(main): Call it.
* gcc.dg/builtin-object-size-5.c (test5, test6, test7): New
tests.

Signed-off-by: Siddhesh Poyarekar 
---
Changes from v2:

- Incorporated review suggestions.
- Added support for negative offsets.

 gcc/builtins.c   |  10 +-
 gcc/gimple-fold.c|  11 +-
 gcc/testsuite/gcc.dg/builtin-object-size-1.c |  30 ++
 gcc/testsuite/gcc.dg/builtin-object-size-2.c |  30 ++
 gcc/testsuite/gcc.dg/builtin-object-size-3.c |  31 ++
 gcc/testsuite/gcc.dg/builtin-object-size-4.c |  30 ++
 gcc/testsuite/gcc.dg/builtin-object-size-5.c |  25 ++
 gcc/tree-object-size.c   | 388 ---
 gcc/tree-object-size.h   |   2 +-
 gcc/ubsan.c  |   5 +-
 10 files changed, 403 insertions(+), 159 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 384864bfb3a..50e66692775 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -10226,7 +10226,7 @@ maybe_emit_sprintf_chk_warning (tree exp, enum 
built_in_function fcode)
 static tree
 fold_builtin_object_size (tree ptr, tree ost)
 {
-  unsigned HOST_WIDE_INT bytes;
+  tree bytes;
   int object_size_type;
 
   if (!validate_arg (ptr, POINTER_TYPE)
@@ -10251,8 +10251,8 @@ fold_builtin_object_size (tree ptr, tree ost)
   if (TREE_CODE (ptr) == ADDR_EXPR)
 {
   compute_builtin_object_size (ptr, object_size_type, &bytes);
-  if (wi::fits_to_tree_p (bytes, size_type_node))
-   return build_int_cstu (size_type_node, bytes);
+  if (int_fits_type_p (bytes, size_type_node))
+   return fold_convert (size_type_node, bytes);
 }
   else if (TREE_CODE (ptr) == SSA_NAME)
 {
@@ -10260,8 +10260,8 @@ fold_builtin_object_size (tree ptr, tree ost)
later.  Maybe subsequent passes will help determining
it.  */
   if (compute_builtin_object_size (ptr, object_size_type, &bytes)
- && wi::fits_to_tree_p (bytes, size_type_node))
-   return build_int_cstu (size_type_node, bytes);
+ && int_fits_type_p (bytes, size_type_node))
+   return fold_convert (size_type_node, bytes);
 }
 
   return NULL_TREE;
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 

[PATCH v3 2/8] tree-object-size: Abstract object_sizes array

2021-11-25 Thread Siddhesh Poyarekar
Put all accesses to object_sizes behind functions so that we can add
dynamic capability more easily.

gcc/ChangeLog:

* tree-object-size.c (object_sizes_grow, object_sizes_release,
object_sizes_unknown_p, object_sizes_get, object_sizes_set_force,
object_sizes_set): New functions.
(addr_object_size, compute_builtin_object_size,
expr_object_size, call_object_size, unknown_object_size,
merge_object_sizes, plus_stmt_object_size,
cond_expr_object_size, collect_object_sizes_for,
check_for_plus_in_loops_1, init_object_sizes,
fini_object_sizes): Adjust.

Signed-off-by: Siddhesh Poyarekar 
---
Changes from v2:

- Incorporated review suggestions.

 gcc/tree-object-size.c | 177 +++--
 1 file changed, 98 insertions(+), 79 deletions(-)

diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 5e93bb74f92..3780437ff91 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -88,6 +88,71 @@ unknown (int object_size_type)
   return ((unsigned HOST_WIDE_INT) -((object_size_type >> 1) ^ 1));
 }
 
+/* Grow object_sizes[OBJECT_SIZE_TYPE] to num_ssa_names.  */
+
+static inline void
+object_sizes_grow (int object_size_type)
+{
+  if (num_ssa_names > object_sizes[object_size_type].length ())
+object_sizes[object_size_type].safe_grow (num_ssa_names, true);
+}
+
+/* Release object_sizes[OBJECT_SIZE_TYPE].  */
+
+static inline void
+object_sizes_release (int object_size_type)
+{
+  object_sizes[object_size_type].release ();
+}
+
+/* Return true if object_sizes[OBJECT_SIZE_TYPE][VARNO] is unknown.  */
+
+static inline bool
+object_sizes_unknown_p (int object_size_type, unsigned varno)
+{
+  return (object_sizes[object_size_type][varno]
+ == unknown (object_size_type));
+}
+
+/* Return size for VARNO corresponding to OSI.  */
+
+static inline unsigned HOST_WIDE_INT
+object_sizes_get (struct object_size_info *osi, unsigned varno)
+{
+  return object_sizes[osi->object_size_type][varno];
+}
+
+/* Set size for VARNO corresponding to OSI to VAL.  */
+
+static inline bool
+object_sizes_set_force (struct object_size_info *osi, unsigned varno,
+   unsigned HOST_WIDE_INT val)
+{
+  object_sizes[osi->object_size_type][varno] = val;
+  return true;
+}
+
+/* Set size for VARNO corresponding to OSI to VAL if it is the new minimum or
+   maximum.  */
+
+static inline bool
+object_sizes_set (struct object_size_info *osi, unsigned varno,
+ unsigned HOST_WIDE_INT val)
+{
+  int object_size_type = osi->object_size_type;
+  if ((object_size_type & OST_MINIMUM) == 0)
+{
+  if (object_sizes[object_size_type][varno] < val)
+   return object_sizes_set_force (osi, varno, val);
+}
+  else
+{
+  if (object_sizes[object_size_type][varno] > val)
+   return object_sizes_set_force (osi, varno, val);
+}
+  return false;
+}
+
 /* Initialize OFFSET_LIMIT variable.  */
 static void
 init_offset_limit (void)
@@ -247,7 +312,7 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
collect_object_sizes_for (osi, var);
  if (bitmap_bit_p (computed[object_size_type],
SSA_NAME_VERSION (var)))
-   sz = object_sizes[object_size_type][SSA_NAME_VERSION (var)];
+   sz = object_sizes_get (osi, SSA_NAME_VERSION (var));
  else
sz = unknown (object_size_type);
}
@@ -582,14 +647,14 @@ compute_builtin_object_size (tree ptr, int 
object_size_type,
   return false;
 }
 
+  struct object_size_info osi;
+  osi.object_size_type = object_size_type;
   if (!bitmap_bit_p (computed[object_size_type], SSA_NAME_VERSION (ptr)))
 {
-  struct object_size_info osi;
   bitmap_iterator bi;
   unsigned int i;
 
-  if (num_ssa_names > object_sizes[object_size_type].length ())
-   object_sizes[object_size_type].safe_grow (num_ssa_names, true);
+  object_sizes_grow (object_size_type);
   if (dump_file)
{
  fprintf (dump_file, "Computing %s %sobject size for ",
@@ -601,7 +666,6 @@ compute_builtin_object_size (tree ptr, int object_size_type,
 
   osi.visited = BITMAP_ALLOC (NULL);
   osi.reexamine = BITMAP_ALLOC (NULL);
-  osi.object_size_type = object_size_type;
   osi.depths = NULL;
   osi.stack = NULL;
   osi.tos = NULL;
@@ -678,8 +742,7 @@ compute_builtin_object_size (tree ptr, int object_size_type,
   if (dump_file)
{
  EXECUTE_IF_SET_IN_BITMAP (osi.visited, 0, i, bi)
-   if (object_sizes[object_size_type][i]
-   != unknown (object_size_type))
+   if (!object_sizes_unknown_p (object_size_type, i))
  {
print_generic_expr (dump_file, ssa_name (i),
dump_flags);
@@ -689,7 +752,7 @@ compute_builtin_object_size (tree ptr, int object_size_type,
 ((object_size_type & OST_MINIMUM) ? 

[PATCH v3 1/8] tree-object-size: Replace magic numbers with enums

2021-11-25 Thread Siddhesh Poyarekar
A simple cleanup to allow inserting dynamic size code more easily.

gcc/ChangeLog:

* tree-object-size.c: New enum.
(object_sizes, computed, addr_object_size,
compute_builtin_object_size, expr_object_size, call_object_size,
merge_object_sizes, plus_stmt_object_size,
collect_object_sizes_for, init_object_sizes, fini_object_sizes,
object_sizes_execute): Replace magic numbers with enums.

Signed-off-by: Siddhesh Poyarekar 
---

Changes from v2:

- Incorporated review suggestions.
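
As a rough illustration of what the named bits buy, the sketch below decodes
an object size type into the words used in the dump output and into the value
reported when the size is unknown.  The enum values are taken from the patch;
the helper names ost_bound_name, ost_scope_prefix and ost_unknown_value are
hypothetical and do not exist in tree-object-size.c.

```c
#include <assert.h>
#include <stddef.h>

/* Bit layout of the object size type argument, as named in the patch.  */
enum
{
  OST_SUBOBJECT = 1,
  OST_MINIMUM = 2,
  OST_END = 4,
};

/* "minimum" for types 2 and 3, "maximum" for types 0 and 1.  */
const char *
ost_bound_name (int object_size_type)
{
  assert (object_size_type >= 0 && object_size_type < OST_END);
  return (object_size_type & OST_MINIMUM) ? "minimum" : "maximum";
}

/* "sub" for types 1 and 3, empty for whole-object types 0 and 2.  */
const char *
ost_scope_prefix (int object_size_type)
{
  assert (object_size_type >= 0 && object_size_type < OST_END);
  return (object_size_type & OST_SUBOBJECT) ? "sub" : "";
}

/* Value meaning "size unknown": all ones for maximum types, zero for
   minimum types.  */
size_t
ost_unknown_value (int object_size_type)
{
  assert (object_size_type >= 0 && object_size_type < OST_END);
  return (object_size_type & OST_MINIMUM) ? (size_t) 0 : (size_t) -1;
}
```

Testing a named bit rather than a literal 1 or 2 keeps each call site
readable, and OST_END gives the array bounds and the range assert one shared
limit.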

 gcc/tree-object-size.c | 59 --
 1 file changed, 34 insertions(+), 25 deletions(-)

diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 4334e05ef70..5e93bb74f92 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -45,6 +45,13 @@ struct object_size_info
   unsigned int *stack, *tos;
 };
 
+enum
+{
+  OST_SUBOBJECT = 1,
+  OST_MINIMUM = 2,
+  OST_END = 4,
+};
+
 static tree compute_object_offset (const_tree, const_tree);
 static bool addr_object_size (struct object_size_info *,
  const_tree, int, unsigned HOST_WIDE_INT *);
@@ -67,10 +74,10 @@ static void check_for_plus_in_loops_1 (struct 
object_size_info *, tree,
the subobject (innermost array or field with address taken).
object_sizes[2] is lower bound for number of bytes till the end of
the object and object_sizes[3] lower bound for subobject.  */
-static vec object_sizes[4];
+static vec object_sizes[OST_END];
 
 /* Bitmaps what object sizes have been computed already.  */
-static bitmap computed[4];
+static bitmap computed[OST_END];
 
 /* Maximum value of offset we consider to be addition.  */
 static unsigned HOST_WIDE_INT offset_limit;
@@ -227,11 +234,11 @@ addr_object_size (struct object_size_info *osi, 
const_tree ptr,
 {
   unsigned HOST_WIDE_INT sz;
 
-  if (!osi || (object_size_type & 1) != 0
+  if (!osi || (object_size_type & OST_SUBOBJECT) != 0
  || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME)
{
  compute_builtin_object_size (TREE_OPERAND (pt_var, 0),
-  object_size_type & ~1, );
+  object_size_type & ~OST_SUBOBJECT, );
}
   else
{
@@ -266,7 +273,7 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
 }
   else if (DECL_P (pt_var))
 {
-  pt_var_size = decl_init_size (pt_var, object_size_type & 2);
+  pt_var_size = decl_init_size (pt_var, object_size_type & OST_MINIMUM);
   if (!pt_var_size)
return false;
 }
@@ -287,7 +294,7 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
 {
   tree var;
 
-  if (object_size_type & 1)
+  if (object_size_type & OST_SUBOBJECT)
{
  var = TREE_OPERAND (ptr, 0);
 
@@ -528,7 +535,7 @@ bool
 compute_builtin_object_size (tree ptr, int object_size_type,
 unsigned HOST_WIDE_INT *psize)
 {
-  gcc_assert (object_size_type >= 0 && object_size_type <= 3);
+  gcc_assert (object_size_type >= 0 && object_size_type < OST_END);
 
   /* Set to unknown and overwrite just before returning if the size
  could be determined.  */
@@ -546,7 +553,7 @@ compute_builtin_object_size (tree ptr, int object_size_type,
 
   if (computed[object_size_type] == NULL)
 {
-  if (optimize || object_size_type & 1)
+  if (optimize || object_size_type & OST_SUBOBJECT)
return false;
 
   /* When not optimizing, rather than failing, make a small effort
@@ -586,8 +593,8 @@ compute_builtin_object_size (tree ptr, int object_size_type,
   if (dump_file)
{
  fprintf (dump_file, "Computing %s %sobject size for ",
-  (object_size_type & 2) ? "minimum" : "maximum",
-  (object_size_type & 1) ? "sub" : "");
+  (object_size_type & OST_MINIMUM) ? "minimum" : "maximum",
+  (object_size_type & OST_SUBOBJECT) ? "sub" : "");
  print_generic_expr (dump_file, ptr, dump_flags);
  fprintf (dump_file, ":\n");
}
@@ -620,7 +627,7 @@ compute_builtin_object_size (tree ptr, int object_size_type,
 terminate, it could take a long time.  If a pointer is
 increasing this way, we need to assume 0 object size.
 E.g. p = [0]; while (cond) p = p + 4;  */
- if (object_size_type & 2)
+ if (object_size_type & OST_MINIMUM)
{
  osi.depths = XCNEWVEC (unsigned int, num_ssa_names);
  osi.stack = XNEWVEC (unsigned int, num_ssa_names);
@@ -679,8 +686,9 @@ compute_builtin_object_size (tree ptr, int object_size_type,
fprintf (dump_file,
 ": %s %sobject size "
 HOST_WIDE_INT_PRINT_UNSIGNED "\n",
-(object_size_type & 2) ? "minimum" : "maximum",
-(object_size_type & 1) ? "sub" : "",
+

[PATCH v3 0/8] __builtin_dynamic_object_size

2021-11-25 Thread Siddhesh Poyarekar
This patchset implements the __builtin_dynamic_object_size builtin for
gcc.  The primary motivation to have this builtin in gcc is to enable
_FORTIFY_SOURCE=3 support with gcc, thus allowing greater fortification
in use cases where the potential performance tradeoff is acceptable.

Semantics:
--

__builtin_dynamic_object_size has the same signature as
__builtin_object_size; it accepts a pointer and type ranging from 0 to 3
and it returns an object size estimate for the pointer based on an
analysis of which objects the pointer could point to.  The actual
properties of the object size estimate are different:

- In the best case __builtin_dynamic_object_size evaluates to an
  expression that represents a precise size of the object being pointed
  to.

- In case a precise object size expression cannot be evaluated,
  __builtin_dynamic_object_size attempts to evaluate an estimate size
  expression based on the object size type.

- In what situations the builtin returns an estimate vs a precise
  expression is an implementation detail and may change in future.
  Users must always assume, as in the case of __builtin_object_size, that
  the returned value is the maximum or minimum based on the object size
  type they have provided.

- In the worst case of failure, __builtin_dynamic_object_size returns a
  constant (size_t)-1 or (size_t)0.
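
A minimal usage sketch of the semantics above, assuming a compiler that may or
may not provide the new builtin.  The OBJSZ macro and the heap_object_size
function are hypothetical names for this example only.  Whether the result is
the precise size or the failure constant depends on the compiler version and
optimization level, so callers should accept either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the dynamic builtin where the compiler advertises it, otherwise
   fall back to __builtin_object_size, which has the same signature.  */
#if defined(__has_builtin)
# if __has_builtin(__builtin_dynamic_object_size)
#  define OBJSZ(ptr, type) __builtin_dynamic_object_size (ptr, type)
# endif
#endif
#ifndef OBJSZ
# define OBJSZ(ptr, type) __builtin_object_size (ptr, type)
#endif

/* Ask for the maximum (type 0) size estimate of an N-byte allocation.
   N is only known at run time, so a static estimate cannot be precise;
   the dynamic builtin may evaluate to N itself.  */
size_t
heap_object_size (size_t n)
{
  assert (n > 0);
  char *buf = malloc (n);
  size_t sz = OBJSZ (buf, 0);
  free (buf);
  return sz;
}
```

With type 0 the only portable guarantee is that the result is an upper bound:
either the allocation size or (size_t)-1.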

Implementation:
---

- The __builtin_dynamic_object_size support is implemented in
  tree-object-size.  In most cases, in the first pass (early_objsz), the
  builtin is treated like __builtin_object_size to preserve subobject
  bounds.

- Each element of the object_sizes vector is now a TREE_VEC of size 2
  holding bytes to the end of the object and the full size of the
  object.  This allows proper handling of negative offsets, allowing
  them to the extent of the whole object bounds.  This improves
  __builtin_object_size usage too with negative offsets, consistently
  returning valid results for pointer decrementing loops too.

- The patchset begins with structural modification of the
  tree-object-size pass, followed by enhancement to return size
  expressions.  I have split the implementation into one feature per
  patch (calls, function parameters, PHI, etc.) to hopefully ease
  review.
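
The two-size bookkeeping described above can be modelled roughly as follows.
This is a sketch under stated assumptions, not the pass's actual code: the
struct objsz_pair type and objsz_add_offset are hypothetical names, and
reporting zero remaining bytes for an out-of-bounds move is a choice made for
this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two sizes tracked per pointer.  */
struct objsz_pair
{
  size_t to_end;  /* bytes from the pointer to the end of the object */
  size_t whole;   /* full size of the object */
};

/* Move the pointer by OFF bytes; OFF may be negative.  A move that stays
   within the whole object adjusts TO_END.  A move that leaves the object
   is reported as zero remaining bytes (an assumption of this sketch).  */
struct objsz_pair
objsz_add_offset (struct objsz_pair p, ptrdiff_t off)
{
  assert (p.to_end <= p.whole);
  size_t pos = p.whole - p.to_end;   /* current offset from the start */

  if (off < 0)
    {
      size_t back = (size_t) 0 - (size_t) off;   /* magnitude of OFF */
      p.to_end = back <= pos ? p.to_end + back : 0;
    }
  else
    {
      size_t fwd = (size_t) off;
      p.to_end = fwd <= p.to_end ? p.to_end - fwd : 0;
    }
  return p;
}
```

Keeping the whole size next to the remaining size is what lets a decrement be
validated: stepping back 4 bytes is only acceptable if the pointer is at least
4 bytes past the start of the object.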

Performance:


Expressions generated by this pass in theory could be arbitrarily
complex.  I have not made an attempt to limit nesting of objects since
it seemed too early to do that.  In practice based on the few
applications I built, most of the complexity of the expressions got
folded away.  Even so, the performance overhead is likely to be
non-zero.  If we find performance degradation to be significant, we
could later add nesting limits to bail out if a size expression gets too
complex.

I have implemented simplification of __*_chk to their normal
variants if we can determine at compile time that it is safe.  This
should limit the performance overhead of the expressions in valid cases.
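
To make the __*_chk simplification concrete, here is a small model of a
checked copy and of the condition under which its check is redundant.  The
names copy_chk and chk_is_redundant are hypothetical; this is not glibc's
__memcpy_chk nor the folding code in gcc, only a sketch of the idea that a
known upper bound on the length can prove the check dead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Model of a checked copy: abort if N bytes cannot fit in DSTSIZE.
   DSTSIZE == (size_t) -1 means the destination size is unknown.  */
void *
copy_chk (void *dst, const void *src, size_t n, size_t dstsize)
{
  assert (dst != NULL && src != NULL);
  if (dstsize != (size_t) -1 && n > dstsize)
    abort ();
  return memcpy (dst, src, n);
}

/* Nonzero if the check in copy_chk can never fire, given N_MAX as an
   upper bound on the length.  In that case a plain memcpy is enough.  */
int
chk_is_redundant (size_t n_max, size_t dstsize)
{
  return dstsize == (size_t) -1 || n_max <= dstsize;
}
```

When the length is a constant, n_max is the length itself; with range
information, it is the upper end of the range.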

Build time performance doesn't seem to be affected much based on an
unscientific check to time
`make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`.  It only increases by
about a couple of seconds when the dynamic tests are added and remains
more or less in the same ballpark otherwise.

Testing:


I have added tests for dynamic object sizes as well as wrappers for all
__builtin_object_size tests to provide wide coverage.  I have also done
a full bootstrap build and test run on x86_64.

I have also built bash, cmake, wpa_supplicant and systemtap with
_FORTIFY_SOURCE=2 and _FORTIFY_SOURCE=3 (with a hacked up glibc to make
sure it works) and saw no issues in any of those builds.  I did some
rudimentary analysis of the generated binaries using fortify-metrics[1]
to confirm that there was a difference in coverage between the two
fortification levels.

Here is a summary of coverage in the above packages:

F = number of fortified calls
T = Total number of calls to fortifiable functions (fortified as well as
unfortified)
C = F * 100/ T

Package          F(2)    T(2)    F(3)    T(3)    C(2)     C(3)
bash             428     1220    1005    1196    35.08%   84.03%
wpa_supplicant   1635    3232    2350    3408    50.59%   68.96%
systemtap        324     1990    343     1994    16.28%   17.20%
cmake            830     14181   958     14196   5.85%    6.75%
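
The coverage column is plain arithmetic on the two counts.  A minimal sketch
follows; the function name fortify_coverage is hypothetical and is not part of
fortify-metrics.

```c
#include <assert.h>

/* C = F * 100 / T, where F is the number of fortified calls and T the
   total number of calls to fortifiable functions.  Returns 0 when there
   are no such calls at all.  */
double
fortify_coverage (unsigned long fortified, unsigned long total)
{
  assert (fortified <= total);
  if (total == 0)
    return 0.0;
  return (double) fortified * 100.0 / (double) total;
}
```

For bash at _FORTIFY_SOURCE=2 this gives 428 * 100 / 1220, which rounds to the
35.08% shown in the summary.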

The numbers are slightly lower than the previous patch series because in
the interim I pushed an improvement to folding of the _chk builtins so
that they can use ranges to simplify the calls to their regular
variants.  Also note that even _FORTIFY_SOURCE=2 coverage should be
improved due to negative offset handling.

Additional testing plans (i.e. I've already started to do some of this):

- Build packages to compare values returned by __builtin_object_size
  with the older pass and this new one.  Also compare with
  __builtin_dynamic_object_size.

- Expand the list of packages to get more coverage metrics.

- Explore performance impact on 

[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64

2021-11-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271

--- Comment #6 from Jim Wilson  ---
See also bug 103302 which can also be fixed by adding a movti pattern.

[Bug target/103302] wrong code with -fharden-compares

2021-11-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302

--- Comment #4 from Jim Wilson  ---
See also bug 103271 which can also be fixed by adding a movti pattern.

[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64

2021-11-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #5 from Jim Wilson  ---
SiFive doesn't support -mno-strict-align so I've never tested it.  I doubt that
it works correctly, i.e. I doubt that it optimizes as intended.  I've mentioned
this to other RVI members, but there hasn't been anyone other than SiFive
actively working on upstream gcc so I don't think anyone ever looked at it.  It
shouldn't give an ICE though.

Looking at this, it appears to be another "if only we had a movti pattern"
issue.

In expand_DEFERRED_INIT in internal-fn.c, in the reg_lhs == TRUE case, there is
a test
  && have_insn_for (SET, var_mode))
which fails because var_mode is TImode and we don't have a movti pattern.  The
code calls build_zero_cst which returns a constructor with an array type.  We
then call expand_assignment which gets confused as it doesn't know the size of
the array it is copying.

However, if we had a movti pattern, then the code computes the size of the
array, and creates a VIEW_CONVERT_EXPR to document the array size before
calling expand_assignment.  So it looks like it would work if we had a movti
pattern.

I verified that adding a dummy movti pattern makes the ICE go away.

[Bug target/103302] wrong code with -fharden-compares

2021-11-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302

--- Comment #3 from Jim Wilson  ---
Maybe the register allocator should remove clobbers of pseudos, instead of
turning them into clobbers of hard register pairs.  That would eliminate the
ambiguity after register allocation.  It is also true that we don't need hard
reg clobbers.  The clobbers are only there for tracking pseudo reg subregs.

[r12-5531 Regression] FAIL: gcc.dg/ipa/inline-9.c scan-ipa-dump inline "Inlined 1 calls" on Linux/x86_64

2021-11-25 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

1b0acc4b800b589a39d637d7312da5cf969a5765 is the first bad commit
commit 1b0acc4b800b589a39d637d7312da5cf969a5765
Author: Jan Hubicka 
Date:   Thu Nov 25 23:58:48 2021 +0100

Remove forgotten early return in ipa_value_range_from_jfunc

caused

FAIL: gcc.dg/ipa/inline-9.c scan-ipa-dump inline "Inlined 1 calls"

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5531/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at skpgkp2 at gmail dot com)


[Bug target/103433] ICE in convert_move, at expr.c:219

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103433

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Keywords||ice-on-valid-code
   Host|x86_64-linux|
   Last reconfirmed||2021-11-26
 Ever confirmed|0   |1
 CC||pinskia at gcc dot gnu.org
  Component|c   |target
 Target|aarch64-none-elf|aarch64*-*-*

--- Comment #1 from Andrew Pinski  ---
Confirmed on the trunk.

[Bug c/103433] New: ICE in convert_move, at expr.c:219

2021-11-25 Thread ilyply2006 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103433

Bug ID: 103433
   Summary: ICE in convert_move, at expr.c:219
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ilyply2006 at hotmail dot com
  Target Milestone: ---

$ cat test.c
#include "arm_sve.h"
__attribute__((noinline)) void
test_ldst_1 (svfloat32_t op0, svfloat32x2_t *op1)
{
*op1 = *(svfloat32x2_t*)
}

$ ./aarch64-none-elf-gcc -v -save-temps -march=armv8.2-a+sve test.c -O3 -S
Using built-in specs.
COLLECT_GCC=./aarch64-none-elf-gcc
Target: aarch64-none-elf
Configured with:
/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/snapshots/gcc/configure
SHELL=/bin/sh
--with-mpc=/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu
--with-mpfr=/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu
--with-gmp=/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu
--with-gnu-as --with-gnu-ld --disable-libmudflap --enable-lto --enable-shared
--without-included-gettext --enable-nls --with-system-zlib
--disable-sjlj-exceptions --enable-gnu-unique-object --enable-linker-build-id
--disable-libstdcxx-pch --enable-c99 --enable-clocale=gnu
--enable-libstdcxx-debug --enable-long-long --with-cloog=no --with-ppl=no
--with-isl=no --enable-multilib --enable-fix-cortex-a53-835769
--enable-fix-cortex-a53-843419 --with-arch=armv8-a --enable-threads=no
--disable-multiarch --with-newlib --with-build-sysroot=
--with-sysroot=/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/aarch64-none-elf/libc
--enable-checking=release --disable-bootstrap --enable-languages=c,c++,lto
--prefix=/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=aarch64-none-elf
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.2.1 20201103 (GCC)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-march=armv8.2-a+sve' '-O3' '-S'
'-mlittle-endian' '-mabi=lp64'

/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/libexec/gcc/aarch64-none-elf/10.2.1/cc1
-E -quiet -v test.c -march=armv8.2-a+sve -mlittle-endian -mabi=lp64 -O3
-fpch-preprocess -o test.i
ignoring nonexistent directory
"/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/aarch64-none-elf/libc/usr/local/include"
ignoring nonexistent directory
"/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/aarch64-none-elf/libc/usr/include"
#include "..." search starts here:
#include <...> search starts here:

/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/lib/gcc/aarch64-none-elf/10.2.1/include

/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/lib/gcc/aarch64-none-elf/10.2.1/include-fixed

/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/lib/gcc/aarch64-none-elf/10.2.1/../../../../aarch64-none-elf/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-march=armv8.2-a+sve' '-O3' '-S'
'-mlittle-endian' '-mabi=lp64'

/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/libexec/gcc/aarch64-none-elf/10.2.1/cc1
-fpreprocessed test.i -quiet -dumpbase test.c -march=armv8.2-a+sve
-mlittle-endian -mabi=lp64 -auxbase test -O3 -version -o test.s
GNU C17 (GCC) version 10.2.1 20201103 (aarch64-none-elf)
compiled by GNU C version 7.5.0, GMP version 4.3.2, MPFR version 3.1.6,
MPC version 1.0.3, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C17 (GCC) version 10.2.1 20201103 (aarch64-none-elf)
compiled by GNU C version 7.5.0, GMP version 4.3.2, MPFR version 3.1.6,
MPC version 1.0.3, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 2cefa28229609aee36b21907b2deb066
during RTL pass: expand
test.c: In function ‘test_ldst_1’:
test.c:5:10: internal compiler error: in convert_move, at expr.c:219
5 | *op1 = *(svfloat32x2_t*)
  | ~^~~
0x8606f3 convert_move(rtx_def*, rtx_def*, int)
   
/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/snapshots/gcc/gcc/expr.c:219
0x86773d store_expr(tree_node*, rtx_def*, int, bool, bool)
   
/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/snapshots/gcc/gcc/expr.c:5832
0x867c55 expand_assignment(tree_node*, tree_node*, bool)
   
/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/snapshots/gcc/gcc/expr.c:5516
0x75aed8 expand_gimple_stmt_1
   
/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/snapshots/gcc/gcc/cfgexpand.c:3753
0x75aed8 expand_gimple_stmt
   

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-11-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #25 from Kewen Lin  ---
Status update:

> 
> The fusion related flags have been considered in the posted patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html. 
> 

It has been pinged for review regularly since it was posted on Sep. 01.

> One RFC/Patch
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578555.html is also
> posted to see if we can avoid to change implicit option behavior for
> Power8/9.

The patch v3
(https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579658.html) was
approved with some additional required adjustments. But the test cases were
written and tested on top of the above fusion-related patch, so I am holding
off on committing it.

[Bug target/102347] "fatal error: target specific builtin not available" with MMA and LTO

2021-11-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347

Kewen Lin  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org,
   ||wschmidt at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2021-11-26

--- Comment #11 from Kewen Lin  ---
Status update: one proposed fix was posted to gcc-patches@ on Sep 28
(https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580357.html).  There
was some discussion following that, and we eventually agreed that the proposed
fix is safe.  There are no newer versions of it, so the original one is still
being pinged for review.

[Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0
 Resolution|--- |FIXED

--- Comment #5 from Andrew Pinski  ---
.

[Bug c++/98360] sizeof in template difference between g++/icc and clang++

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98360

--- Comment #3 from Andrew Pinski  ---
GCC, ICC and MSVC all agree that this is valid code and all produce 4.
clang is the only one which rejects it.

Here is an even more reduced testcase:
template 
struct uintset
{
  T values[1];
  struct traits  {  };
  struct hash : traits
  {
int foo () {
  return sizeof (uintset::values);
}
  };
  hash h;
};
uintset s;
int x = s.h.foo ();

If you remove the base class or change it to a non-dependent type, the code is
accepted.

The defect reports in this area:
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#613
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#198

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2253.html is the paper
which resolves 613.


I suspect GCC is correct if I go by this paper.

[Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c

2021-11-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #4 from Hongtao.liu  ---
Fixed in GCC12.

[Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c

2021-11-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #3 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:90cb088ece8d8cc1019d25629d1585e5b0234179

commit r12-5536-g90cb088ece8d8cc1019d25629d1585e5b0234179
Author: konglin1 
Date:   Wed Nov 10 09:37:32 2021 +0800

i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode
with -mf16c [PR 102811]

Add define_insn extendhfsf2 and truncsfhf2 for target_f16c.

gcc/ChangeLog:

PR target/102811
* config/i386/i386.c (ix86_can_change_mode_class): Allow 16 bit
data in XMM register
for TARGET_SSE2.
* config/i386/i386.md (extendhfsf2): Add extendhfsf2 for
TARGET_F16C.
(extendhfdf2): Restrict extendhfdf for TARGET_AVX512FP16 only.
(*extendhf2): Rename from extendhf2.
(truncsfhf2): Likewise.
(truncdfhf2): Likewise.
(*trunc2): Likewise.

gcc/testsuite/ChangeLog:

PR target/102811
* gcc.target/i386/pr90773-21.c: Allow pextrw instead of movw.
* gcc.target/i386/pr90773-23.c: Ditto.
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.

[Bug middle-end/103419] FAIL: gcc.target/i386/pr102566-10b.c with -mx32

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103419

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |12.0

--- Comment #6 from Andrew Pinski  ---
.

[Bug middle-end/103419] FAIL: gcc.target/i386/pr102566-10b.c with -mx32

2021-11-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103419

--- Comment #5 from Hongtao.liu  ---
Fixed in GCC12.

[Bug middle-end/103419] FAIL: gcc.target/i386/pr102566-10b.c with -mx32

2021-11-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103419

--- Comment #4 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:379be00f45f65e0e8de72a50553dd9d2bab6cc08

commit r12-5535-g379be00f45f65e0e8de72a50553dd9d2bab6cc08
Author: liuhongt 
Date:   Thu Nov 25 13:51:57 2021 +0800

Fix typo in r12-5486.

gcc/ChangeLog:

PR middle-end/103419
* match.pd: Fix typo, use the type of second parameter, not
first one.

[Bug testsuite/103335] [12 Regression] new test case gcc.dg/tree-ssa/modref-dse-4.c fails

2021-11-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103335
Bug 103335 depends on bug 103282, which changed state.

Bug 103282 Summary: New test case gcc.dg/tree-ssa/modref-dse-5.c in r12-5292 
fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103282

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

[Bug testsuite/103282] New test case gcc.dg/tree-ssa/modref-dse-5.c in r12-5292 fails

2021-11-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103282

Jan Hubicka  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #11 from Jan Hubicka  ---
Fixed.

[Bug ipa/103432] [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

[Bug ipa/103432] [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||needs-reduction, wrong-code
   Last reconfirmed|2021-11-26 00:00:00 |
   Target Milestone|--- |12.0
 Status|ASSIGNED|NEW
   Assignee|hubicka at gcc dot gnu.org |unassigned at gcc dot 
gnu.org
  Known to work||11.2.0
  Component|tree-optimization   |ipa

--- Comment #2 from Andrew Pinski  ---
Confirmed, I have not reduced it but here is what is happening.
  outD.25694 = {};
...
  MEM[(struct DCTToD.21174 *) clique 3 base 1].data_D.21196 =


...
 
_ZN12_GLOBAL__N_121GenericTransposeBlockILm1ELm4ENS_7DCTFromENS_5DCTToEEEvRKT1_RKT2_.constprop.0D.25466
(, );
...
 
_ZN12_GLOBAL__N_113IDCT1DWrapperILm4ELm1ENS_7DCTFromENS_5DCTToEEEvRKT1_RKT2_.constprop.0D.25467
(, );

...
  _3 = outD.25694[2];

FRE thinks
_ZN12_GLOBAL__N_113IDCT1DWrapperILm4ELm1ENS_7DCTFromENS_5DCTToEEEvRKT1_RKT2_.constprop.0
does not touch out even though D.25700 is passed to it 

[Bug tree-optimization/103432] [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref

2021-11-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432

Jan Hubicka  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Target Milestone|12.0|---
   Keywords|wrong-code  |
  Known to work|11.2.0  |
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2021-11-26
   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot 
gnu.org
 CC||hubicka at gcc dot gnu.org
  Component|ipa |tree-optimization

--- Comment #1 from Jan Hubicka  ---
It fails with 
./xgcc -B ./ -O2 d.ii -fdbg-cnt=ipa_mod_ref_pta:189  -fdump-tree-all-details
-fdump-ipa-all-details
and works
./xgcc -B ./ -O2 d.ii -fdbg-cnt=ipa_mod_ref_pta:188  -fdump-tree-all-details
-fdump-ipa-all-details

The difference in optimized dump is:

 int main ()
 {
   struct DCTFrom D.11418;
@@ -2805,12 +2810,7 @@
   float x[4];
   struct DCTTo D.11356;
   struct DCTFrom D.11355;
-  float _3;
-  float _4;
-  double _6;
   struct FILE * stderr.3_8;
-  double _9;
-  struct FILE * stderr.4_11;
   float _12;
   float _13;
   double _15;
@@ -2996,30 +2996,10 @@
   {anonymous}::IDCT1DWrapper.constprop<4, 1, {anonymous}::DCTFrom,
{anonymous}::DCTTo> (, );
   D.11400 ={v} {CLOBBER};
   D.11356 ={v} {CLOBBER};
-  _3 = out[2];
-  _4 = _3 - 1.0e+0;
-  actual_accuracy_5 = ABS_EXPR <_4>;
-  if (actual_accuracy_5 > 9.99974752427078783512115478515625e-7)
-goto ; [0.04%]
-  else
-goto ; [99.96%]
-
-   [local count: 429325]:
-  _6 = (double) actual_accuracy_5;
   stderr.3_8 = stderr;
-  fprintf (stderr.3_8, "ERROR: Too low accuracy: exp=%f act=%f\n",
9.99974752427078783512115478515625e-7, _6);
+  fprintf (stderr.3_8, "ERROR: Too low accuracy: exp=%f act=%f\n",
9.99974752427078783512115478515625e-7, 1.0e+0);
   exit (1);

-   [local count: 1072883004]:
-  _9 = (double) actual_accuracy_5;
-  stderr.4_11 = stderr;
-  fprintf (stderr.4_11, "OK: Good accuracy: exp=%f act=%f\n",
9.99974752427078783512115478515625e-7, _9);
-  x ={v} {CLOBBER};
-  out ={v} {CLOBBER};
-  coeffs ={v} {CLOBBER};
-  scratch_space ={v} {CLOBBER};
-  return 0;
-
 }

And I suppose we are not expected to optimize out the "Good accuracy" message
:)
So it looks like out is modified by  {anonymous}::IDCT1DWrapper.constprop<4, 1,
{anonymous}::DCTFrom, {anonymous}::DCTTo> (, );
but for some reason ipa propagation gets no_indirect_clobber for param1.  This
seems wrong since to->data is written to, so it is an indirect clobber.
I may be able to look more into this only tomorrow - it is a bit late.
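
The shape of the problem can be sketched in a few lines of C.  This is not a
reduction of the libjxl testcase and is not claimed to trigger the bug; the
names dct_to, fill_ones and read_after_call are hypothetical.  It only shows
why a store through to->data counts as an indirect clobber that the caller's
later load must observe.

```c
#include <assert.h>
#include <stddef.h>

/* A wrapper that carries a pointer to the caller's buffer.  */
struct dct_to
{
  float *data;
};

/* Stores through TO->data.  TO itself is only read, but the memory it
   points to is written: an indirect clobber via the parameter.  */
void
fill_ones (const struct dct_to *to, int n)
{
  assert (to != NULL && to->data != NULL);
  for (int i = 0; i < n; i++)
    to->data[i] = 1.0f;
}

/* The load of out[2] must see the store made inside fill_ones.  If the
   call were wrongly summarized as not clobbering out, the load could be
   replaced by the initial value 0.  */
float
read_after_call (void)
{
  float out[4] = { 0 };
  struct dct_to to = { out };
  fill_ones (&to, 4);
  return out[2];
}
```

A correct compiler returns 1.0f here.  Folding the load to the zero
initializer would be the kind of wrong code described in this report.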

[Bug target/103302] wrong code with -fharden-compares

2021-11-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson  ---
It is the second reversed comparison that is wrong.  This is the
  u32_0 <= (...)
on the first line of foo0.  In the assembly file, this ends up as
        mv      a0,a6
        mv      a1,a7
        xor     a6,a0,a6
        xor     a7,a1,a7
        or      a6,a6,a7
        seqz    a6,a6
and note that it is comparing a value against itself when it should be
comparing two different values.
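
For reference, the xor/xor/or/seqz sequence is the usual way to test two
double-word values for equality on a 64-bit target.  A C sketch of the
intended computation follows, with the hypothetical names struct u128 and
u128_equal standing in for the TImode operands.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value held as two 64-bit halves, like a register pair.  */
struct u128
{
  uint64_t lo;
  uint64_t hi;
};

static_assert (sizeof (struct u128) == 16, "two 64-bit halves");

/* xor each pair of halves, or the two differences together, and test
   the result for zero.  Returns 1 when A and B are equal.  */
int
u128_equal (struct u128 a, struct u128 b)
{
  uint64_t diff_lo = a.lo ^ b.lo;
  uint64_t diff_hi = a.hi ^ b.hi;
  return (diff_lo | diff_hi) == 0;
}
```

In the miscompiled assembly both operands live in a6/a7, so the equivalent of
u128_equal (x, x) is computed and the result is always 1.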

The harden compare pass is generating RTL

insn 156 152 155 6 (set (reg:TI 201)
(asm_operands:TI ("") ("=g") 0 [
(reg:TI 77 [ _8 ])
]
 [
(asm_input:TI ("0"))
]
 [])) -1
 (nil))
(insn 155 156 153 6 (clobber (reg:TI 77 [ _8 ])) -1
 (nil))
(insn 153 155 154 6 (set (subreg:DI (reg:TI 77 [ _8 ]) 0)
(subreg:DI (reg:TI 201) 0)) -1
 (nil))
(insn 154 153 160 6 (set (subreg:DI (reg:TI 77 [ _8 ]) 8)
(subreg:DI (reg:TI 201) 8)) -1
 (nil))

Then the asmcons pass is changing this to

(insn 851 152 849 5 (clobber (reg:TI 201)) -1
 (nil))
(insn 849 851 850 5 (set (subreg:DI (reg:TI 201) 0)
(subreg:DI (reg:TI 77 [ _8 ]) 0)) -1
 (nil))
(insn 850 849 156 5 (set (subreg:DI (reg:TI 201) 8)
(subreg:DI (reg:TI 77 [ _8 ]) 8)) -1
 (nil))
(insn 156 850 155 5 (set (reg:TI 201)
(asm_operands:TI ("") ("=g") 0 [
(reg:TI 201)
]
 [
(asm_input:TI ("0"))
]
 [])) -1
 (expr_list:REG_DEAD (reg:TI 77 [ _8 ])
(nil)))
(insn 155 156 153 5 (clobber (reg:TI 77 [ _8 ])) -1
 (nil))
(insn 153 155 154 5 (set (subreg:DI (reg:TI 77 [ _8 ]) 0)
(subreg:DI (reg:TI 201) 0)) 135 {*movdi_64bit}
 (nil))
(insn 154 153 854 5 (set (subreg:DI (reg:TI 77 [ _8 ]) 8)
(subreg:DI (reg:TI 201) 8)) 135 {*movdi_64bit}
 (expr_list:REG_DEAD (reg:TI 201)
(nil)))

Then the register allocator puts both 77 and 201 in the same register, which
means we are now clobbering values we need.

In the reload dump I see

(insn 851 152 849 5 (clobber (reg:TI 16 a6 [201])) -1
 (nil))
(insn 849 851 850 5 (set (reg:DI 16 a6 [201])
(reg:DI 10 a0 [orig:77 _8 ] [77])) 135 {*movdi_64bit}
 (nil))
(insn 850 849 907 5 (set (reg:DI 17 a7 [+8 ])
(reg:DI 11 a1 [ _8+8 ])) 135 {*movdi_64bit}
 (nil))
(insn 907 850 1014 5 (clobber (reg:TI 16 a6 [201])) -1
 (nil))

so the insns 849 and 850 get optimized away, but we need them.  Also, we have

(insn 854 154 852 5 (clobber (reg:TI 16 a6 [202])) -1
 (nil))
(insn 852 854 853 5 (set (reg:DI 16 a6 [202])
(reg:DI 6 t1 [orig:86 _39 ] [86])) 135 {*movdi_64bit}
 (nil))
(insn 853 852 913 5 (set (reg:DI 17 a7 [+8 ])
(reg:DI 7 t2 [ _39+8 ])) 135 {*movdi_64bit}
 (nil))
(insn 913 853 1010 5 (clobber (reg:TI 16 a6 [202])) -1
 (nil))

and the insns 852 and 853 get optimized away, but we need them.  The comparison
is supposed to be a0/a1 versus t1/t2, but we end up comparing a6/a7
against itself.

asmcons is calling emit_move_insn to copy the asm source to the asm dest so it
can simplify the asm.  Since this is a multiword mode, and the riscv backend
doesn't have a movti pattern, this ends up calling emit_move_multi_word which
emits the extra clobber that causes the problem.

I suppose we could fix this by adding a movti pattern to the riscv backend to
avoid the clobbers but we shouldn't have to.  Though it would be interesting to
see if this maybe results in better code optimization.

It isn't clear exactly where the problem is.  Maybe asmcons shouldn't try to
fix an asm when the mode is larger than the word mode?  This could be left to
the register allocator to fix.  Or maybe harden compare shouldn't generate RTL
like this?  This could be a harden compare issue, or maybe an issue with the
RTL expander to emit the rtl differently.  Looks like the same issue with the
RTL expander calling emit_move_multi_word which generates the clobber.  Or
maybe a movti pattern is actually required now?

I did verify that disabling asmcons fixes the problem for this testcase.  I had
to hack the code in function.c to do that as there is no option to disable it.

[Bug ipa/103432] [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||wrong-code
   Target Milestone|--- |12.0
 CC||marxin at gcc dot gnu.org
  Component|tree-optimization   |ipa
  Known to work||11.2.0

[Bug tree-optimization/103432] New: [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref

2021-11-25 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432

Bug ID: 103432
   Summary: [12 regression] libjxl-0.5 is miscompiled, works fine
with -fno-ipa-modref
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: slyfox at gcc dot gnu.org
  Target Milestone: ---

Created attachment 51875
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51875&action=edit
dct_test.cc

Originally noticed a problem as failed tests on libjxl-0.5.

I extracted a ~10KB self-contained single-file example. It could still be
reduced, but it's quite tangled. Can you see anything obviously wrong with it?

Attached the reproducer as dct_test.cc:

$ g++-12.0.0 -std=c++11 -O2 -fno-tree-vectorize dct_test.cc -o dct_test
$ g++-12.0.0 -std=c++11 -O2 -fno-tree-vectorize -fno-ipa-modref dct_test.cc -o dct_test1

# good:
$ ./dct_test1
OK: Good accuracy: exp=0.01 act=0.00
OK: Good accuracy: exp=0.01 act=0.00

# bad:
$ ./dct_test
OK: Good accuracy: exp=0.01 act=0.00
ERROR: Too low accuracy: exp=0.01 act=1.00

$ g++-12.0.0 -v
Using built-in specs.
COLLECT_GCC=/nix/store/2lxwqh3k88x4jwyfwlsfnwrp78yq2ah2-gcc-12.0.0/bin/g++
COLLECT_LTO_WRAPPER=/nix/store/2lxwqh3k88x4jwyfwlsfnwrp78yq2ah2-gcc-12.0.0/libexec/gcc/x86_64-unknown-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with:
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20211121 (experimental) (GCC)

[Bug c++/92385] extremely long and memory intensive compilation for brace construction of array member

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92385

Andrew Pinski  changed:

   What|Removed |Added

 CC||beyondstandard at gmail dot com

--- Comment #8 from Andrew Pinski  ---
*** Bug 71165 has been marked as a duplicate of this bug. ***

[Bug c++/71165] std::array with aggregate initialization generates huge code

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71165

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|NEW |RESOLVED

--- Comment #4 from Andrew Pinski  ---
Dup of bug 92385.

*** This bug has been marked as a duplicate of bug 92385 ***

[Bug c++/92385] extremely long and memory intensive compilation for brace construction of array member

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92385

Andrew Pinski  changed:

   What|Removed |Added

 CC||hehaochen at hotmail dot com

--- Comment #7 from Andrew Pinski  ---
*** Bug 94957 has been marked as a duplicate of this bug. ***

[Bug c++/94957] Compilation slowww for simple code with big array of structs with constructors

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94957

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #6 from Andrew Pinski  ---
Dup of bug 92385.

*** This bug has been marked as a duplicate of bug 92385 ***

[Bug c++/94957] Compilation slowww for simple code with -O1/2/3 and -g in GCC 8 and 9

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94957

Andrew Pinski  changed:

   What|Removed |Added

 CC||ilord.tiran at yandex dot ru

--- Comment #5 from Andrew Pinski  ---
*** Bug 98547 has been marked as a duplicate of this bug. ***

[Bug c++/98547] GCC spends many minutes instead of seconds building a file with array initialization

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98547

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Andrew Pinski  ---
Yes this is a dup of bug 94957.

*** This bug has been marked as a duplicate of bug 94957 ***

[Bug fortran/103418] random_number() does not accept pointer, intent(in) array argument

2021-11-25 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418

--- Comment #8 from Steve Kargl  ---
On Thu, Nov 25, 2021 at 02:18:46PM -0800, Steve Kargl wrote:
> On Thu, Nov 25, 2021 at 10:10:32PM +, anlauf at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418
> > 
> > --- Comment #6 from anlauf at gcc dot gnu.org ---
> > Unfortunately the patch in comment#5 does not work for me. :-(
> > 
> > Interestingly, the Intel compiler fails on the testcase, too.
> > 
> 
> Hmmm.  I did have a number of other patches in my tree.  I
> wonder if one of those helped.  Unfortunately, I updated
> my git repository, where I cleared out all patches, and it
> takes a long time to rebuild gcc on my laptop.
> 

For the record, 

module test
   implicit none
   contains
   subroutine change_pointer_target(ptr)
  real, pointer, intent(in) :: ptr(:)
  call random_number(ptr)
  ptr(:) = ptr + 1.0
   end subroutine change_pointer_target
end module test

program foo
   use test
   implicit none
   real, pointer :: a(:), b
   allocate(a(4), b)
   call random_number(b)
   call random_number(a)
   print '(5F8.5)', b, a
end program foo

% gfcx -o z a.f90 && ./z
 0.65287 0.82614 0.77541 0.61923 0.52961

[Bug c/98487] ICE: tree check: expected identifier_node, have tree_list in is_attribute_p, at attribs.h:155 [C2X attribute syntax, gnu::format and -Wsuggest-attribute=format]

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98487

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||ice-checking
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-11-25

--- Comment #2 from Andrew Pinski  ---
Confirmed.
Simpler testcase:
#include <stdarg.h>
#include <stdio.h>
#include <time.h>
[[gnu::__format__(__printf__, 1, 2)]]
void
do_printf(const char * const a0, ...)
  {
  va_list ap;
  va_start(ap, a0);
  __builtin_vprintf(a0, ap);
  va_end(ap);
  }

[[gnu::__format__(__scanf__, 1, 2)]]
void
do_scanf(const char * const a0, ...)
  {
  va_list ap;
  va_start(ap, a0);
  __builtin_vscanf(a0, ap);
  va_end(ap);
  }

[[gnu::__format__(__strftime__, 1, 0)]]
void
do_strftime(const char * const a0, struct tm * a1)
  {
  char buff[256];
  __builtin_strftime(buff, sizeof(buff), a0, a1);
  puts(buff);
  }

[committed] libstdc++: Remove dg-error that no longer happens

2021-11-25 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.



There was a c++11_only dg-error in this testcase, for a "body of
constexpr function is not a return statement" diagnostic that was bogus,
but happened because the return statement was ill-formed. A change to
G++ earlier this month means that diagnostic is no longer emitted, so
remove the dg-error.

libstdc++-v3/ChangeLog:

* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
Remove dg-error for C++11_only error.
---
 .../testsuite/20_util/tuple/comparison_operators/overloaded2.cc  | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
index bac16ffd521..6a7a584c71e 100644
--- a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
+++ b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
@@ -52,4 +52,3 @@ auto b = a < a;
 // { dg-error "no match for 'operator<'" "" { target c++20 } 0 }
 // { dg-error "no match for .*_Synth3way|in requirements" "" { target c++20 } 0 }
 // { dg-error "ordered comparison" "" { target c++17_down } 0 }
-// { dg-error "not a return-statement" "" { target c++11_only } 0 }
-- 
2.31.1



[committed] libstdc++: Make std::pointer_traits SFINAE-friendly [PR96416]

2021-11-25 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.



This implements the resolution I'm proposing for LWG 3545, to avoid hard
errors when using std::to_address for types that make pointer_traits
ill-formed.

Consistent with std::iterator_traits, instantiating std::pointer_traits
for a non-pointer type will be well-formed, but give an empty type with
no member types. This avoids the problematic cases for std::to_address.
Additionally, the pointer_to member is now only declared when the
element type is not cv void (and for C++20, when the function body would
be well-formed). The rebind member was already SFINAE-friendly in our
implementation.
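
As a rough illustration of the idea described above, here is a minimal C++11
sketch. It is not the code in this patch: `my_pointer_traits`, `ptr_elem`,
`has_element_type`, `Fancy` and `WithElem` are hypothetical names, and the
sketch omits `pointer_to`, `rebind` and `difference_type`. It only shows how
the traits class can end up empty (instead of ill-formed) when no element type
can be deduced.

```cpp
#include <type_traits>

// C++11-compatible void_t (std::void_t only exists from C++17 on).
template<typename...> struct make_void { using type = void; };
template<typename... Ts> using void_t = typename make_void<Ts...>::type;

struct undefined_t;  // placeholder meaning "no element type found"

// For a specialization Tmpl<T, Rest...> yield T, otherwise undefined_t.
template<typename T>
struct get_first_arg { using type = undefined_t; };

template<template<typename, typename...> class Tmpl, typename T,
         typename... Rest>
struct get_first_arg<Tmpl<T, Rest...>> { using type = T; };

// Prefer Ptr::element_type when it is a valid type, else fall back to the
// first template argument.
template<typename Ptr, typename = void>
struct ptr_elem : get_first_arg<Ptr> { };

template<typename Ptr>
struct ptr_elem<Ptr, void_t<typename Ptr::element_type>>
{ using type = typename Ptr::element_type; };

// Members are only provided when an element type was found.
template<typename Ptr, typename Elt = typename ptr_elem<Ptr>::type,
         bool = std::is_same<Elt, undefined_t>::value>
struct my_pointer_traits_impl
{
  using pointer = Ptr;
  using element_type = Elt;
};

// No element type: an empty class, so using it in SFINAE contexts is safe.
template<typename Ptr, typename Elt>
struct my_pointer_traits_impl<Ptr, Elt, true> { };

template<typename Ptr>
struct my_pointer_traits : my_pointer_traits_impl<Ptr> { };

template<typename T>
struct my_pointer_traits<T*>
{
  using pointer = T*;
  using element_type = T;
};

// Detector used to observe the difference from the outside.
template<typename T, typename = void>
struct has_element_type : std::false_type { };

template<typename T>
struct has_element_type<T, void_t<typename T::element_type>>
: std::true_type { };

// Two hypothetical pointer-like types for experimenting.
template<typename T> struct Fancy { };
struct WithElem { using element_type = long; };

static_assert(!has_element_type<my_pointer_traits<int>>::value,
              "non-pointer type gives an empty traits class");
static_assert(has_element_type<my_pointer_traits<Fancy<char>>>::value,
              "element type deduced from the first template argument");
```

With this shape, merely naming `my_pointer_traits<int>` is harmless, and code
that needs `element_type` can test for it rather than hitting a hard error.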

libstdc++-v3/ChangeLog:

PR libstdc++/96416
* include/bits/ptr_traits.h (pointer_traits): Reimplement to be
SFINAE-friendly (LWG 3545).
* testsuite/20_util/pointer_traits/lwg3545.cc: New test.
* testsuite/20_util/to_address/1_neg.cc: Adjust dg-error line.
* testsuite/20_util/to_address/lwg3545.cc: New test.
---
 libstdc++-v3/include/bits/ptr_traits.h| 167 +-
 .../20_util/pointer_traits/lwg3545.cc | 120 +
 .../testsuite/20_util/to_address/1_neg.cc |   2 +-
 .../testsuite/20_util/to_address/lwg3545.cc   |  12 ++
 4 files changed, 251 insertions(+), 50 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/pointer_traits/lwg3545.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/to_address/lwg3545.cc

diff --git a/libstdc++-v3/include/bits/ptr_traits.h b/libstdc++-v3/include/bits/ptr_traits.h
index 115b86d43e4..4987fa9942f 100644
--- a/libstdc++-v3/include/bits/ptr_traits.h
+++ b/libstdc++-v3/include/bits/ptr_traits.h
@@ -35,6 +35,7 @@
 #include 
 
 #if __cplusplus > 201703L
+#include 
 #define __cpp_lib_constexpr_memory 201811L
 namespace __gnu_debug { struct _Safe_iterator_base; }
 #endif
@@ -45,55 +46,119 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   class __undefined;
 
-  // Given Template return T, otherwise invalid.
+  // For a specialization `SomeTemplate` the member `type` is T,
+  // otherwise `type` is `__undefined`.
   template
 struct __get_first_arg
 { using type = __undefined; };
 
-  template class _Template, typename _Tp,
+  template class _SomeTemplate, typename _Tp,
typename... _Types>
-struct __get_first_arg<_Template<_Tp, _Types...>>
+struct __get_first_arg<_SomeTemplate<_Tp, _Types...>>
 { using type = _Tp; };
 
-  template
-using __get_first_arg_t = typename __get_first_arg<_Tp>::type;
-
-  // Given Template and U return Template, otherwise invalid.
+  // For a specialization `SomeTemplate` and a type `U` the member
+  // `type` is `SomeTemplate`, otherwise there is no member `type`.
   template
 struct __replace_first_arg
 { };
 
-  template class _Template, typename _Up,
+  template class _SomeTemplate, typename _Up,
typename _Tp, typename... _Types>
-struct __replace_first_arg<_Template<_Tp, _Types...>, _Up>
-{ using type = _Template<_Up, _Types...>; };
+struct __replace_first_arg<_SomeTemplate<_Tp, _Types...>, _Up>
+{ using type = _SomeTemplate<_Up, _Types...>; };
 
-  template
-using __replace_first_arg_t = typename __replace_first_arg<_Tp, _Up>::type;
-
-  template
-using __make_not_void
-  = __conditional_t::value, __undefined, _Tp>;
-
-  /**
-   * @brief  Uniform interface to all pointer-like types
-   * @ingroup pointer_abstractions
-  */
+#if __cpp_concepts
+  // When concepts are supported detection of _Ptr::element_type is done
+  // by a requires-clause, so __ptr_traits_elem_t only needs to do this:
   template
-struct pointer_traits
+using __ptr_traits_elem_t = typename __get_first_arg<_Ptr>::type;
+#else
+  // Detect the element type of a pointer-like type.
+  template
+struct __ptr_traits_elem : __get_first_arg<_Ptr>
+{ };
+
+  // Use _Ptr::element_type if is a valid type.
+  template
+struct __ptr_traits_elem<_Ptr, __void_t>
+{ using type = typename _Ptr::element_type; };
+
+  template
+using __ptr_traits_elem_t = typename __ptr_traits_elem<_Ptr>::type;
+#endif
+
+  // Define pointer_traits::pointer_to.
+  template::value>
+struct __ptr_traits_ptr_to
+{
+  using pointer = _Ptr;
+  using element_type = _Elt;
+
+  /**
+   *  @brief  Obtain a pointer to an object
+   *  @param  __r  A reference to an object of type `element_type`
+   *  @return `pointer::pointer_to(__e)`
+   *  @pre `pointer::pointer_to(__e)` is a valid expression.
+  */
+  static pointer
+  pointer_to(element_type& __e)
+#if __cpp_lib_concepts
+  requires requires {
+   { pointer::pointer_to(__e) } -> convertible_to;
+  }
+#endif
+  { return pointer::pointer_to(__e); }
+};
+
+  // Do not define pointer_traits::pointer_to if element type is void.
+  template
+struct __ptr_traits_ptr_to<_Ptr, _Elt, true>
+{ };
+
+  // Partial specialization defining pointer_traits::pointer_to(T&).
+  template
+ 

[Bug libstdc++/96416] [DR 3545] to_address() is broken by static_assert in pointer_traits

2021-11-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96416

--- Comment #21 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:b8018e5c5ec0e9b6948182f13fba47c67b758d8a

commit r12-5532-gb8018e5c5ec0e9b6948182f13fba47c67b758d8a
Author: Jonathan Wakely 
Date:   Thu Nov 25 16:49:45 2021 +

libstdc++: Make std::pointer_traits SFINAE-friendly [PR96416]

This implements the resolution I'm proposing for LWG 3545, to avoid hard
errors when using std::to_address for types that make pointer_traits
ill-formed.

Consistent with std::iterator_traits, instantiating std::pointer_traits
for a non-pointer type will be well-formed, but give an empty type with
no member types. This avoids the problematic cases for std::to_address.
Additionally, the pointer_to member is now only declared when the
element type is not cv void (and for C++20, when the function body would
be well-formed). The rebind member was already SFINAE-friendly in our
implementation.

libstdc++-v3/ChangeLog:

PR libstdc++/96416
* include/bits/ptr_traits.h (pointer_traits): Reimplement to be
SFINAE-friendly (LWG 3545).
* testsuite/20_util/pointer_traits/lwg3545.cc: New test.
* testsuite/20_util/to_address/1_neg.cc: Adjust dg-error line.
* testsuite/20_util/to_address/lwg3545.cc: New test.

[Bug libstdc++/101608] ranges::fill/fill_n missing std::is_constant_evaluated() condition for __builtin_memset

2021-11-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101608

--- Comment #3 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:7ae6e4e3831429d20eea1be285dbc6a4a005930f

commit r11-9314-g7ae6e4e3831429d20eea1be285dbc6a4a005930f
Author: Jonathan Wakely 
Date:   Wed Nov 24 13:17:54 2021 +

libstdc++: Do not use memset in constexpr calls to ranges::fill_n [PR101608]

libstdc++-v3/ChangeLog:

PR libstdc++/101608
* include/bits/ranges_algobase.h (__fill_n_fn): Check for
constant evaluation before using memset.
* testsuite/25_algorithms/fill_n/constrained.cc: Check
byte-sized values as well.

(cherry picked from commit 82c3657dd74896b39937bb0a2aaeba9b8ca105fd)
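
For readers unfamiliar with the pattern, the kind of check this commit adds
can be sketched as follows. This is an illustration only, not the libstdc++
code: `fill_bytes`, `Buf` and `filled_at_compile_time` are hypothetical names,
and the sketch assumes a compiler providing `__builtin_is_constant_evaluated`
and `__builtin_memset` (GCC 9 or later, or a recent Clang) in C++14 mode or
newer.

```cpp
#include <cstddef>

// Fill n bytes starting at first with value.  memset cannot be evaluated in
// a constant expression, so fall back to a plain loop in that case.
constexpr unsigned char*
fill_bytes(unsigned char* first, std::size_t n, unsigned char value)
{
  if (__builtin_is_constant_evaluated())
    {
      // Compile-time path: element-wise stores are allowed in constexpr.
      for (std::size_t i = 0; i < n; ++i)
        first[i] = value;
    }
  else if (n != 0)
    {
      // Run-time path: the byte-sized fill can use memset.
      __builtin_memset(first, value, n);
    }
  return first + n;
}

struct Buf { unsigned char data[4]; };

// Hypothetical helper that forces the compile-time path.
constexpr Buf
filled_at_compile_time(unsigned char v)
{
  Buf b{};
  fill_bytes(b.data, 4, v);
  return b;
}

static_assert(filled_at_compile_time(7).data[3] == 7,
              "the loop branch is usable in constant expressions");
```

Without the constant-evaluation check, the `static_assert` above would be
rejected because the memset call is not a constant expression.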

[Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128

2021-11-25 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #14 from H.J. Lu  ---
(In reply to Richard Earnshaw from comment #13)
> Also, note that the comment in gimple-fold.c prior to this change read:
> 
>   /* If we can perform the copy efficiently with first doing all loads
>  and then all stores inline it that way.  Currently efficiently
>  means that we can load all the memory into a single integer
>  register which is what MOVE_MAX gives us.  */
> 
> Which would imply that the AArch64 definition of MOVE_MAX is the correct one.

The GCC manual has

- Macro: MOVE_MAX
 The maximum number of bytes that a single instruction can move
 quickly between memory and registers or between two memory
 locations.

[PATCH] x86: Add -mmove-max=bits and -mstore-max=bits

2021-11-25 Thread H.J. Lu via Gcc-patches
Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move
and store, independent of -mprefer-vector-width=bits:

1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES
which are enabled for Intel Sapphire Rapids processor.
2. Add -mmove-max=bits to set the maximum number of bits that can be moved from
memory to memory efficiently.  The default value is derived from
X86_TUNE_AVX512_MOVE_BY_PIECES, X86_TUNE_AVX256_MOVE_BY_PIECES, and the
preferred vector width.
3. Add -mstore-max=bits to set the maximum number of bits that can be stored to
memory efficiently.  The default value is derived from
X86_TUNE_AVX512_STORE_BY_PIECES, X86_TUNE_AVX256_STORE_BY_PIECES and the
preferred vector width.

gcc/

PR target/103269
* config/i386/i386-expand.c (ix86_expand_builtin): Pass PVW_NONE
and PVW_NONE to ix86_target_string.
* config/i386/i386-options.c (ix86_target_string): Add arguments
for move_max and store_max.
(ix86_target_string::add_vector_width): New lambda.
(ix86_debug_options): Pass ix86_move_max and ix86_store_max to
ix86_target_string.
(ix86_function_specific_print): Pass ptr->x_ix86_move_max and
ptr->x_ix86_store_max to ix86_target_string.
(ix86_valid_target_attribute_tree): Handle x_ix86_move_max and
x_ix86_store_max.
(ix86_option_override_internal): Set the default x_ix86_move_max
and x_ix86_store_max.
* config/i386/i386-options.h (ix86_target_string): Add
prefer_vector_width and prefer_vector_width.
* config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): Removed.
(TARGET_AVX256_STORE_BY_PIECES): Likewise.
(MOVE_MAX): Use 64 if ix86_move_max or ix86_store_max ==
PVW_AVX512.  Use 32 if ix86_move_max or ix86_store_max >=
PVW_AVX256.
(STORE_MAX_PIECES): Use 64 if ix86_store_max == PVW_AVX512.
Use 32 if ix86_store_max >= PVW_AVX256.
* config/i386/i386.opt: Add -mmove-max=bits and -mstore-max=bits.
* config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES): New.
(X86_TUNE_AVX512_STORE_BY_PIECES): Likewise.
* doc/invoke.texi: Document -mmove-max=bits and -mstore-max=bits.

gcc/testsuite/

PR target/103269
* gcc.target/i386/pieces-memcpy-17.c: New test.
* gcc.target/i386/pieces-memcpy-18.c: Likewise.
* gcc.target/i386/pieces-memcpy-19.c: Likewise.
* gcc.target/i386/pieces-memcpy-20.c: Likewise.
* gcc.target/i386/pieces-memcpy-21.c: Likewise.
* gcc.target/i386/pieces-memset-45.c: Likewise.
* gcc.target/i386/pieces-memset-46.c: Likewise.
* gcc.target/i386/pieces-memset-47.c: Likewise.
* gcc.target/i386/pieces-memset-48.c: Likewise.
* gcc.target/i386/pieces-memset-49.c: Likewise.
---
 gcc/config/i386/i386-expand.c |  1 +
 gcc/config/i386/i386-options.c| 75 +--
 gcc/config/i386/i386-options.h|  6 +-
 gcc/config/i386/i386.h| 18 ++---
 gcc/config/i386/i386.opt  |  8 ++
 gcc/config/i386/x86-tune.def  | 10 +++
 gcc/doc/invoke.texi   | 13 
 .../gcc.target/i386/pieces-memcpy-17.c| 16 
 .../gcc.target/i386/pieces-memcpy-18.c| 16 
 .../gcc.target/i386/pieces-memcpy-19.c| 16 
 .../gcc.target/i386/pieces-memcpy-20.c| 16 
 .../gcc.target/i386/pieces-memcpy-21.c| 16 
 .../gcc.target/i386/pieces-memset-45.c| 16 
 .../gcc.target/i386/pieces-memset-46.c| 17 +
 .../gcc.target/i386/pieces-memset-47.c| 17 +
 .../gcc.target/i386/pieces-memset-48.c| 17 +
 .../gcc.target/i386/pieces-memset-49.c| 16 
 17 files changed, 276 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-45.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-46.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-47.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-48.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-49.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 0d5d1a0e205..7e77ff56ddc 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -12295,6 +12295,7 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
   char *opts = ix86_target_string (bisa, bisa2, 0, 0, NULL, NULL,
   (enum fpmath_unit) 0,
  

[Bug tree-optimization/98304] Failure to optimize bitwise arithmetic pattern

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98304

--- Comment #2 from Andrew Pinski  ---
> @1 == (@2)-1

Should have been:
@1 == -(@2-1)

maybe check that @1 is a mask.

gcc-9-20211125 is now available

2021-11-25 Thread GCC Administrator via Gcc
Snapshot gcc-9-20211125 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/9-20211125/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 9 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-9 
revision 3d1f5e86fb4351a109d45fe441b1b00d6e56c277

You'll find:

 gcc-9-20211125.tar.xz    Complete GCC

  SHA256=8e9f79a98e8fffa14dc98ea6731b2e18fb7016a36f4d21d28d5e81575ffdcee2
  SHA1=6ff9b4788c37ab0ae4640116f7103b5304564f72

Diffs from 9-2028 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug tree-optimization/98304] Failure to optimize bitwise arithmetic pattern

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98304

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2021-11-25
   Severity|normal  |enhancement
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
  _1 = MAX_EXPR ;
  _2 = _1 & -64;
  _4 = n_3(D) - _2;

Something like:

(simplify
 (minus @0 (bit_and (max @0 INTEGER_CST@1) INTEGER_CST@2))
 (if (@1 == (@2)-1)
  (if (TYPE_SIGN (type) == UNSIGNED)
   (bit_and @0 @1)
   (cond (le @0 @1) @0 (bit_and @0 @1))
  )
 )
)

Note LLVM handles the unsigned case already.
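
A quick brute-force sanity check of the proposed rewrite, as a sketch rather
than anything taken from GCC. The mask -64 comes from the dump above; the
MAX_EXPR constant 63 is an assumption on my part, since the operands are not
visible in the dump, and all function names are made up for the check.

```cpp
#include <algorithm>
#include <climits>

// Original form: n - (max(n, 63) & -64).  63 is an assumed constant.
inline int original_signed(int n)
{ return n - (std::max(n, 63) & -64); }

// Proposed signed replacement: n <= 63 ? n : n & 63.
inline int simplified_signed(int n)
{ return n <= 63 ? n : (n & 63); }

// Unsigned variant; ~63u is the unsigned bit pattern of -64.
inline unsigned original_unsigned(unsigned n)
{ return n - (std::max(n, 63u) & ~63u); }

// Proposed unsigned replacement: n & 63.
inline unsigned simplified_unsigned(unsigned n)
{ return n & 63u; }

// Compare both signed forms for every n in [lo, hi].
inline bool signed_forms_agree(int lo, int hi)
{
  for (long long n = lo; n <= hi; ++n)
    if (original_signed(static_cast<int>(n))
        != simplified_signed(static_cast<int>(n)))
      return false;
  return true;
}

// Compare both unsigned forms for every n in [lo, hi].
inline bool unsigned_forms_agree(unsigned lo, unsigned hi)
{
  for (unsigned long long n = lo; n <= hi; ++n)
    if (original_unsigned(static_cast<unsigned>(n))
        != simplified_unsigned(static_cast<unsigned>(n)))
      return false;
  return true;
}
```

Calling `signed_forms_agree` and `unsigned_forms_agree` over ranges around
zero and near the type limits is enough to exercise both arms of the `cond`.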

Also note that even though GCC can handle the loop case for signed, it only
handles it at the RTL level; for gimple GCC produces:
  _3 = n_2(D) + -64;
  _8 = (unsigned int) n_2(D);
  _9 = _8 + 4294967232; // _9 = _3 - 64
  _10 = _9 >> 6; // _10 = _9/64
  _11 = (int) _10;
  _12 = _11 * -64;
  n_1 = _3 + _12;

[Bug tree-optimization/103409] [12 Regression] 18% SPEC2017 WRF compile-time regression with -O2 -flto since r12-3903-g0288527f47cec669

2021-11-25 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

--- Comment #6 from hubicka at kam dot mff.cuni.cz ---
> Started with r12-3903-g0288527f47cec669.
This is a September change (for which we have PR102943); however, the
regression range was g:1ae8edf5f73ca5c3 (or g:264f061997c0a534 on the second
plot) and g:3e09331f6aeaf595, which is the latest regression visible on
the graphs, appearing between Nov 12 and Nov 15.

The September regression is there too, but it is tracked as PR102943

[Bug rtl-optimization/79048] Unnecessary reload for flags setting insn when operands die

2021-11-25 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79048

Roger Sayle  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
 CC||roger at nextmovesoftware dot com

--- Comment #2 from Roger Sayle  ---
This issue appears to be fixed on mainline.
The test case now generates:

f1: orb %dil, %sil
jne .L4
ret

[Bug fortran/103418] random_number() does not accept pointer, intent(in) array argument

2021-11-25 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418

--- Comment #7 from Steve Kargl  ---
On Thu, Nov 25, 2021 at 10:10:32PM +, anlauf at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418
> 
> --- Comment #6 from anlauf at gcc dot gnu.org ---
> Unfortunately the patch in comment#5 does not work for me. :-(
> 
> Interestingly, the Intel compiler fails on the testcase, too.
> 

Hmmm.  I did have a number of other patches in my tree.  I
wonder if one of those helped.  Unfortunately, I updated
my git repository, where I cleared out all patches, and it
takes a long time to rebuild gcc on my laptop.

[Bug tree-optimization/103423] [12 Regression] 19% cpu2006 wrf compile time regression with -flto since r12-3903-g0288527f47cec669

2021-11-25 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103423

--- Comment #1 from hubicka at kam dot mff.cuni.cz ---
Martin,
My original report here was about a regression on July 17 2021 (range
g:0b7a11874d4eb428 and g:704e8a825c78b9a8),
which seems unrelated to g:r12-3903-g0288527f47cec669,
which is from Sep 21 2021.

I think we are mixing up the cpu2006 and cpu2017 wrf's, which seem to
regress at different times.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103423
> 
> Martin Liška  changed:
> 
>What|Removed |Added
> 
>See Also||https://gcc.gnu.org/bugzill
>||a/show_bug.cgi?id=103409
> 
> -- 
> You are receiving this mail because:
> You reported the bug.

[Bug c++/98030] error message for enum definition without ';' could be improved to include a fixit note

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98030

Andrew Pinski  changed:

   What|Removed |Added

Summary|error message for enum  |error message for enum
   |definition without ';'  |definition without ';'
   |could be improved   |could be improved to
   ||include a fixit note
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2021-11-25
   Severity|normal  |enhancement

--- Comment #3 from Andrew Pinski  ---
Confirmed.

[Bug fortran/103418] random_number() does not accept pointer, intent(in) array argument

2021-11-25 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418

--- Comment #6 from anlauf at gcc dot gnu.org ---
Unfortunately the patch in comment#5 does not work for me. :-(

Interestingly, the Intel compiler fails on the testcase, too.

[PATCH, v2] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377

2021-11-25 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

Am 25.11.21 um 22:02 schrieb Mikael Morin:

Le 25/11/2021 à 21:03, Harald Anlauf a écrit :

Hi Mikael,

Am 25.11.21 um 17:46 schrieb Mikael Morin:

Hello,

Le 24/11/2021 à 22:32, Harald Anlauf via Fortran a écrit :

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 5a5aca10ebe..837eb0912c0 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape,
 {
   gfc_constructor *c;
   bool test;
+  gfc_constructor_base b;

+  if (shape->expr_type == EXPR_ARRAY)
+    b = shape->value.constructor;
+  else if (shape->expr_type == EXPR_VARIABLE)
+    b = shape->symtree->n.sym->value->value.constructor;


This misses a check that shape->symtree->n.sym->value is an array, so
that it makes sense to access its constructor.


there are checks further above for the cases
   shape->expr_type == EXPR_ARRAY
and for
   shape->expr_type == EXPR_VARIABLE
which look at the elements of array shape to see if they are
non-negative.

Only in those cases where the full "if ()'s" pass we set
shape_is_const = true; and proceed.  The purpose of the auxiliary
bool shape_is_const is to avoid repeating the lengthy if's again.
Only then the above cited code segment should get executed.

For shape->expr_type == EXPR_ARRAY there is really no change in logic.
For shape->expr_type == EXPR_VARIABLE the above snippet is now executed,
but then we already had

   else if (shape->expr_type == EXPR_VARIABLE && shape->ref
    && shape->ref->u.ar.type == AR_FULL && shape->ref->u.ar.dimen
== 1
    && shape->ref->u.ar.as
    && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT
    && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER
    && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT
    && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER
    && shape->symtree->n.sym->attr.flavor == FL_PARAMETER
    && shape->symtree->n.sym->value)

In which situations do I miss anything new?


Yes, I agree with all of this.
My comment wasn’t about a check on shape->expr_type, but on
shape->value->expr_type if shape->expr_type is a (parameter) variable.


Actually, this only supports the case where the parameter value is
defined by an array; but it could be an intrinsic call, a sum of
parameters, a reference to another parameter, etc.


E.g. the following (still) does get rejected:

   print *, reshape([1,2,3,4,5], a+1)
   print *, reshape([1,2,3,4,5], a+a)
   print *, reshape([1,2,3,4,5], 2*a)
   print *, reshape([1,2,3,4,5], [3,3])
   print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2))

and has been rejected before.




The usual way to handle this is to call gfc_reduce_init_expr which (pray
for it) will make an array out of whatever the shape expression is.


Can you give an example where it fails?

I think the current code would almost certainly fail, too.


Probably, I was just trying to avoid followup bugs. ;-)

I have checked the following:

   integer, parameter :: a(2) = [1,1]
   integer, parameter :: b(2) = a + 1
   print *, reshape([1,2,3,4], b)
end

and it doesn’t fail as I thought it would.


well, that one is actually better valid, since b=[2,2].


So yes, I was wrong; b has been expanded to an array before.


Motivated by your reasoning I tried gfc_reduce_init_expr.  That attempt
failed miserably (many regressions), and I think it is not right.

Then I found that array sections posed a problem that wasn't detected
before.  gfc_simplify_expr seemed to be a better choice that makes more
sense for the present situations and seems to work here.  And it even
detects many more invalid cases now than e.g. Intel ;-)

I've updated the patch and testcase accordingly.


Can you add an assert or a comment saying that the parameter value has
been expanded to a constant array?

Ok with that change.



Given the above discussion, I'll give you another day or two to have a
further look.  Otherwise Gerhard will... ;-)

Cheers,
Harald
From 56fd0d23ac0a5bda802e5cce3024b947e497555a Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 25 Nov 2021 22:39:44 +0100
Subject: [PATCH] Fortran: improve check of arguments to the RESHAPE intrinsic

gcc/fortran/ChangeLog:

	PR fortran/103411
	* check.c (gfc_check_reshape): Improve check of size of source
	array for the RESHAPE intrinsic against the given shape when pad
	is not given, and shape is a parameter.  Try other simplifications
	of shape.

gcc/testsuite/ChangeLog:

	PR fortran/103411
	* gfortran.dg/pr68153.f90: Adjust test to improved check.
	* gfortran.dg/reshape_7.f90: Likewise.
	* gfortran.dg/reshape_9.f90: New test.
---
 gcc/fortran/check.c | 22 +-
 gcc/testsuite/gfortran.dg/pr68153.f90   |  2 +-
 gcc/testsuite/gfortran.dg/reshape_7.f90 |  2 +-
 gcc/testsuite/gfortran.dg/reshape_9.f90 | 24 
 4 files changed, 43 insertions(+), 7 deletions(-)
 create mode 100644 

Re: [EXTERNAL] Re: Question about match.pd

2021-11-25 Thread Navid Rahimi via Gcc
> (A << B) eq/ne 0
Yes, that is correct. But to detect such a pattern you have to detect B
and make sure B is boolean.  GIMPLE converts that Boolean to integer before
shifting.

After many hours of debugging, I think I managed to find out what is going on.

+/* cmp : ==, != */
+/* ((B0 << x) cmp 0) -> B0 cmp 0 */
+(for cmp (eq ne)
+ (simplify
+  (cmp (lshift (convert@3 boolean_valued_p@0) @1) integer_zerop@2)
+   (if (TREE_CODE (TREE_TYPE (@3)) == INTEGER_TYPE
+   && (GIMPLE || !TREE_SIDE_EFFECTS (@1)))
+(cmp @0 @2

So when I am transforming something like above pattern to (cmp @0 @2) there is 
a type mismatch between @0 and @2.
@0 is boolean and @2 is integer. That type mismatch does cause a lot of 
headache when going through resimplification.
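
As a sanity check on the source-level equivalence (not on the match.pd pattern itself), the two sides can be written as plain C. This is a minimal sketch assuming a 32-bit int; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Left-hand side: the bool is promoted to int before the shift.  */
static bool
shifted_ne_zero (bool c, int i)
{
  /* Assumes a 32-bit int; keeps the shift well-defined.  */
  assert (i >= 0 && i < 31);
  return (c << i) != 0;
}

/* Right-hand side: compare the bool directly.  */
static bool
plain_ne_zero (bool c)
{
  return c != 0;
}
```

For any shift count in the asserted range both functions return the same value, since shifting 0 gives 0 and shifting 1 by fewer than 31 bits gives a nonzero int.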



Best wishes,
Navid.


From: Jeff Law 
Sent: Wednesday, November 24, 2021 15:11
To: Navid Rahimi; gcc@gcc.gnu.org
Subject: [EXTERNAL] Re: Question about match.pd



On 11/24/2021 2:19 PM, Navid Rahimi via Gcc wrote:
> Hi GCC community,
>
> I have a question about pattern matching in match.pd.
>
> So I have a pattern like this [1]:
> #define CMP !=
> bool f(bool c, int i) { return (c << i) CMP 0; }
> bool g(bool c, int i) { return c CMP 0;}
>
> It is verifiably correct to transform f into g [2]. Although this pattern looks
> simple, the problem arises because GIMPLE converts booleans to int before the
> "<<" operation.
> So at the end you have boolean->integer->boolean conversion and the shift 
> will happen on the integer in the middle.
>
> For example, for something like:
>
> bool g(bool c){return (c << 22);}
>
> The GIMPLE is:
> _Bool g (_Bool c)
> {
>int _1;
>int _2;
>_Bool _4;
>
> [local count: 1073741824]:
>_1 = (int) c_3(D);
>_2 = _1 << 22;
>_4 = _2 != 0;
>return _4;
> }
>
> I wrote a patch to fix this problem in match.pd:
>
> +(match boolean_valued_p
> + @0
> + (if (TREE_CODE (type) == BOOLEAN_TYPE
> +  && TYPE_PRECISION (type) == 1)))
> +(for op (tcc_comparison truth_and truth_andif truth_or truth_orif truth_xor)
> + (match boolean_valued_p
> +  (op @0 @1)))
> +(match boolean_valued_p
> +  (truth_not @0))
>
> +/* cmp : ==, != */
> +/* ((B0 << x) cmp 0) -> B0 cmp 0 */
> +(for cmp (eq ne)
> + (simplify
> +  (cmp (lshift (convert@3 boolean_valued_p@0) @1) integer_zerop@2)
> +   (if (TREE_CODE (TREE_TYPE (@3)) == INTEGER_TYPE
> +   && (GIMPLE || !TREE_SIDE_EFFECTS (@1)))
> +(cmp @0 @2
>
>
> But the problem is I am not able to restrict to the cases I am interested in. 
> There are many hits in other libraries I have tried compiling with 
> trunk+patch.
>
> Any feedback?
>
> 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98956
> 2) https://alive2.llvm.org/ce/z/UUTJ_v
It would help to also see the cases you're triggering that you do not
want to trigger.

Could we think of the optimization opportunity in a different way?


(A << B) eq/ne 0  -> A eq/ne (0U >> B)

And I would expect the 0U >> B to get simplified to 0.

Would looking at things that way help?

jeff


[Bug middle-end/103431] [12 Regression] wrong code with -O -fno-tree-bit-ccp -fno-tree-dominator-opts

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103431

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
          Component|rtl-optimization            |middle-end
   Last reconfirmed|                            |2021-11-25

--- Comment #1 from Andrew Pinski  ---
Confirmed.
reduced testcase (removing the globals):
typedef unsigned __int128 B;

__attribute__((noipa))
void f(unsigned short a)
{
  B b = 5;
  int size = (sizeof(b)*8)-1;
  a /= 0xfffd;
  B b1 = (b << (a & size) | b >> (-(a & size) & size));
  if (b1 != 5)
    __builtin_abort ();
}
int
main (void)
{
  f(0);
}

- CUT ---
The gimple level does not change. In GCC 11 and the trunk, we have:
  _1 = (unsigned intD.9) a_8(D);
  _2 = _1 / 4294967293;
  a_9 = (short unsigned intD.18) _2;
  _13 = a_9 & 127;
  _3 = (intD.6) _13;
  b1_10 = 5 r<< _3;
  if (b1_10 != 5)

It looks like the expansion from gimple to RTL of the rotate is different
between the two versions.
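
For comparison, here is the same masked rotate idiom on a 64-bit type, which avoids the __int128 dependency. This is only a sketch for experimenting with the expected semantics; rotl64 is a made-up name and is not taken from the testcase.

```c
#include <stdint.h>

/* Rotate X left by N bits using the masked-shift idiom from the testcase,
   scaled down to 64 bits.  Both shift counts are masked, so a rotate by 0
   (or by any multiple of 64) returns X unchanged.  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  const unsigned mask = 63;	/* sizeof (x) * 8 - 1 */
  return (x << (n & mask)) | (x >> (-(n & mask) & mask));
}
```

A rotate count of 0 must leave the value unchanged, which is the property the abort in the testcase checks for b1.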

Re: libstdc++: Make atomic::wait() const [PR102994]

2021-11-25 Thread Jonathan Wakely via Gcc-patches
On Wed, 24 Nov 2021 at 01:27, Thomas Rodgers wrote:
>
> const qualification was also missing in the free functions for 
> wait/wait_explicit/notify_one/notify_all. Revised patch attached.

Please tweak the whitespace in the new test:

> +test1(const std::atomic , char*p)

The '&' should be on the type not the variable, and there should be a
space before 'p':

> +test1(const std::atomic& a, char* p)

OK for trunk and gcc-11 with that tweak, thanks!



Re: [PATCH v7] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2021-11-25 Thread Segher Boessenkool
Hi!

On Wed, Nov 24, 2021 at 08:48:47PM -0300, Raoni Fassina Firmino wrote:
> gcc/ChangeLog:
> * builtins.c (expand_builtin_fegetround): New function.
> (expand_builtin_feclear_feraise_except): New function.
> (expand_builtin): Add cases for BUILT_IN_FEGETROUND,
> BUILT_IN_FECLEAREXCEPT and BUILT_IN_FERAISEEXCEPT

Something is missing here (maybe just a full stop?)

> * config/rs6000/rs6000.md (fegetroundsi): New pattern.
> (feclearexceptsi): New Pattern.
> (feraiseexceptsi): New Pattern.
> * doc/extend.texi: Add a new introductory paragraph about the
> new builtins.

Pet peeve: please don't break lines early, we have only 72 columns per
line and we have many long symbol names.  Trying to make many lines very
short only results in everything looking very irregular, which is harder
to read.

> * doc/md.texi: (fegetround@var{m}): Document new optab.
> (feclearexcept@var{m}): Document new optab.
> (feraiseexcept@var{m}): Document new optab.
> * optabs.def (fegetround_optab): New optab.
> (feclearexcept_optab): New optab.
> (feraiseexcept_optab): New optab.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c: New test.
> * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c: New test.
> * gcc.target/powerpc/builtin-fegetround.c: New test.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6860,6 +6860,117 @@
>[(set_attr "type" "fpload")
> (set_attr "length" "8")
> (set_attr "isa" "*,p8v,p8v")])
> +
> +;; int fegetround(void)
> +;;
> +;; This expansion for the C99 function only expands for compatible
> +;; target libcs. Because it needs to return one of FE_DOWNWARD,
> +;; FE_TONEAREST, FE_TOWARDZERO or FE_UPWARD with the values as defined
> +;; by the target libc, and since they are free to
> +;; choose the values and the expand needs to know then beforehand,
> +;; this expand only expands for target libcs that it can handle the
> +;; values is knows.
> +;; Because of these restriction, this only expands on the desired
> +;; case and fallback to a call to libc on any otherwise.
> +(define_expand "fegetroundsi"

(This needs some wordsmithing.)

> +;; int feclearexcept(int excepts)
> +;;
> +;; This expansion for the C99 function only works when EXCEPTS is a
> +;; constant known at compile time and specifies any one of
> +;; FE_INEXACT, FE_DIVBYZERO, FE_UNDERFLOW and FE_OVERFLOW flags.
> +;; It doesn't handle values out of range, and always returns 0.

It FAILs the expansion if a parameter is bad?  Is this comment out of
date?

> +;; Note that FE_INVALID is unsupported because it maps to more than
> +;; one bit of the FPSCR register.

It could be implemented, now that you check for the libc used.  It is a
fixed part of the ABI :-)

> +;; The FE_* are defined in the targed libc, and since they are free to
> +;; choose the values and the expand needs to know then beforehand,

s/then/them/

> +;; this expand only expands for target libcs that it can handle the

(this expander)

> +;; values is knows.

s/is/it/

> +/* This testcase ensures that the builtins expand with the matching arguments
> + * or otherwise fallback gracefully to a function call, and don't ICE during
> + * compilation.
> + * "-fno-builtin" option is used to enable calls to libc implementation of the
> + * gcc builtins tested when not using __builtin_ prefix. */

Don't use leading * in comments, btw.  This is a testcase so anything
goes, but FYI :-)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtin-fegetround.c

> +  int i, rounding, expected;
> +  const int rm[] = {FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD};
> +  for (i = 0; i < sizeof(rm); i++)

That should be   sizeof rm / sizeof rm[0]   ?  It accesses out of bounds
as it is.
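
To spell out the bound: sizeof applied to an array yields its size in bytes, so the element count has to be derived by dividing by the size of one element. A minimal sketch, with placeholder values standing in for the FE_* constants (the names RM_COUNT and sum_rm are made up here):

```c
#include <stddef.h>

/* Placeholder values standing in for the FE_* rounding-mode constants.  */
static const int rm[] = { 0, 1, 2, 3 };

/* Number of elements in rm, not its size in bytes.  */
#define RM_COUNT (sizeof rm / sizeof rm[0])

/* Walk the table; every index stays within bounds.  */
static int
sum_rm (void)
{
  int total = 0;
  for (size_t i = 0; i < RM_COUNT; i++)
    total += rm[i];
  return total;
}
```

Using plain sizeof rm as the loop bound would instead iterate once per byte of the array.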

Maybe test more values?  At least 0, but also combinations of these FE_
bits, and maybe even FE_INVALID?

With such changes the rs6000 parts are okay for trunk.  Thanks!

I looked at the generic changes as well, and they all look fine to me.


Segher


[Bug rtl-optimization/103431] New: [12 Regression] wrong code with -O -fno-tree-bit-ccp -fno-tree-dominator-opts

2021-11-25 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103431

Bug ID: 103431
   Summary: [12 Regression] wrong code with -O -fno-tree-bit-ccp
-fno-tree-dominator-opts
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu

Created attachment 51874
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51874&action=edit
reduced testcase

Output:
$ x86_64-pc-linux-gnu-gcc -O -fno-tree-bit-ccp -fno-tree-dominator-opts
testcase.c
$ ./a.out
Aborted

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64//bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r12-5528-20211125184355-g9488d242066-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--disable-bootstrap --with-cloog --with-ppl --with-isl
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r12-5528-20211125184355-g9488d242066-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.0 20211125 (experimental) (GCC)

[Bug fortran/103418] random_number() does not accept pointer, intent(in) array argument

2021-11-25 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418

--- Comment #5 from Steve Kargl  ---
On Thu, Nov 25, 2021 at 09:02:34PM +, anlauf at gcc dot gnu.org wrote:
> (In reply to kargl from comment #3)
> > (In reply to anlauf from comment #2)
> > > The nearly obvious fix:
> > > 
> > > diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
> > > index 837eb0912c0..3859e18c6c3 100644
> > > --- a/gcc/fortran/check.c
> > > +++ b/gcc/fortran/check.c
> > > @@ -1031,7 +1031,7 @@ variable_check (gfc_expr *e, int n, bool allow_proc)
> > > break;
> > > }
> > >  
> > > -  if (!ref)
> > > +  if (!ref && !pointer)
> > > {
> > >   gfc_error ("%qs argument of %qs intrinsic at %L cannot be "
> > >  "INTENT(IN)", gfc_current_intrinsic_arg[n]->name,
> > > 
> > > regresses for gfortran.dg/move_alloc_8.f90, thus needs additional
> > > investigation.
> > 
> > Did you try the patch posted in Fortran Discourse?
> 
> No.
> 
> I'm afraid I also missed it on the usual channels where patches for gcc
> are posted.
> 

As explained on FD, I don't report problems found by other
people who post them in FD, stackoverflow, or c.l.f.  I
encourage those people to report the problems themselves.

That said, you found the right location to patch.  The
code looks convoluted to deal with CLASS, which messes
up an array with the pointer attribute.


diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 6ea6e136d4f..e96bcdb1b44 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -1031,7 +1031,7 @@ variable_check (gfc_expr *e, int n, bool allow_proc)
break;
}

-  if (!ref)
+  if (!ref && !(pointer && e->ref && e->ref->type == REF_ARRAY))
{
  gfc_error ("%qs argument of %qs intrinsic at %L cannot be "
 "INTENT(IN)", gfc_current_intrinsic_arg[n]->name,
@@ -1062,7 +1062,8 @@ variable_check (gfc_expr *e, int n, bool allow_proc)
 return true;

   gfc_error ("%qs argument of %qs intrinsic at %L must be a variable",
-gfc_current_intrinsic_arg[n]->name, gfc_current_intrinsic, &e->where);
+gfc_current_intrinsic_arg[n]->name, gfc_current_intrinsic,
+&e->where);

   return false;
 }

[Bug fortran/103418] random_number() does not accept pointer, intent(in) array argument

2021-11-25 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418

--- Comment #4 from anlauf at gcc dot gnu.org ---
(In reply to kargl from comment #3)
> (In reply to anlauf from comment #2)
> > The nearly obvious fix:
> > 
> > diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
> > index 837eb0912c0..3859e18c6c3 100644
> > --- a/gcc/fortran/check.c
> > +++ b/gcc/fortran/check.c
> > @@ -1031,7 +1031,7 @@ variable_check (gfc_expr *e, int n, bool allow_proc)
> > break;
> > }
> >  
> > -  if (!ref)
> > +  if (!ref && !pointer)
> > {
> >   gfc_error ("%qs argument of %qs intrinsic at %L cannot be "
> >  "INTENT(IN)", gfc_current_intrinsic_arg[n]->name,
> > 
> > regresses for gfortran.dg/move_alloc_8.f90, thus needs additional
> > investigation.
> 
> Did you try the patch posted in Fortran Discourse?

No.

I'm afraid I also missed it on the usual channels where patches for gcc
are posted.

Re: [PATCH] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377

2021-11-25 Thread Mikael Morin

Le 25/11/2021 à 21:03, Harald Anlauf a écrit :

Hi Mikael,

Am 25.11.21 um 17:46 schrieb Mikael Morin:

Hello,

Le 24/11/2021 à 22:32, Harald Anlauf via Fortran a écrit :

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 5a5aca10ebe..837eb0912c0 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr
*shape,
 {
   gfc_constructor *c;
   bool test;
+  gfc_constructor_base b;

+  if (shape->expr_type == EXPR_ARRAY)
+    b = shape->value.constructor;
+  else if (shape->expr_type == EXPR_VARIABLE)
+    b = shape->symtree->n.sym->value->value.constructor;


This misses a check that shape->symtree->n.sym->value is an array, so
that it makes sense to access its constructor.


there are checks further above for the cases
   shape->expr_type == EXPR_ARRAY
and for
   shape->expr_type == EXPR_VARIABLE
which look at the elements of array shape to see if they are
non-negative.

Only in those cases where the full "if ()'s" pass we set
shape_is_const = true; and proceed.  The purpose of the auxiliary
bool shape_is_const is to avoid repeating the lengthy if's again.
Only then the above cited code segment should get executed.

For shape->expr_type == EXPR_ARRAY there is really no change in logic.
For shape->expr_type == EXPR_VARIABLE the above snippet is now executed,
but then we already had

   else if (shape->expr_type == EXPR_VARIABLE && shape->ref
    && shape->ref->u.ar.type == AR_FULL && shape->ref->u.ar.dimen == 1
    && shape->ref->u.ar.as
    && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT
    && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER
    && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT
    && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER
    && shape->symtree->n.sym->attr.flavor == FL_PARAMETER
    && shape->symtree->n.sym->value)

In which situations do I miss anything new?


Yes, I agree with all of this.
My comment wasn’t about a check on shape->expr_type, but on 
shape->value->expr_type if shape->expr_type is a (parameter) variable.



Actually, this only supports the case where the parameter value is
defined by an array; but it could be an intrinsic call, a sum of
parameters, a reference to another parameter, etc.


E.g. the following (still) does get rejected:

   print *, reshape([1,2,3,4,5], a+1)
   print *, reshape([1,2,3,4,5], a+a)
   print *, reshape([1,2,3,4,5], 2*a)
   print *, reshape([1,2,3,4,5], [3,3])
   print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2))

and has been rejected before.




The usual way to handle this is to call gfc_reduce_init_expr which (pray
for it) will make an array out of whatever the shape expression is.


Can you give an example where it fails?

I think the current code would almost certainly fail, too.


Probably, I was just trying to avoid followup bugs. ;-)

I have checked the following:

  integer, parameter :: a(2) = [1,1]
  integer, parameter :: b(2) = a + 1
  print *, reshape([1,2,3,4], b)
end

and it doesn’t fail as I thought it would.
So yes, I was wrong; b has been expanded to an array before.

Can you add an assert or a comment saying that the parameter value has 
been expanded to a constant array?


Ok with that change.




[Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128

2021-11-25 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #13 from Richard Earnshaw  ---
Also, note that the comment in gimple-fold.c prior to this change read:

  /* If we can perform the copy efficiently with first doing all loads
 and then all stores inline it that way.  Currently efficiently
 means that we can load all the memory into a single integer
 register which is what MOVE_MAX gives us.  */

Which would imply that the AArch64 definition of MOVE_MAX is the correct one.

[Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128

2021-11-25 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #12 from Richard Earnshaw  ---
(In reply to Jakub Jelinek from comment #10)
> Alternatively, couldn't we check next to that new
>  && have_insn_for (SET, mode)
> also that
>  && known_le (GET_MODE_SIZE (mode), MOVE_MAX)
> ?

No, that would limit us to MOVE_MAX again, so what would be the point in having
a more relaxed test earlier.

I do wonder if MOVE_MAX * MOVE_RATIO should be replaced with the MOVE_BY_PIECES
infrastructure, I just haven't had time to cook up a patch to try that, though.

Re: [PATCH 4/4] libgcc: Use _dl_find_eh_frame in _Unwind_Find_FDE

2021-11-25 Thread Florian Weimer via Gcc-patches
* Jakub Jelinek:

>> +/* Fallback declaration for old glibc headers.  DL_FIND_EH_FRAME_DBASE is used
>> +   as a proxy to determine if  declares _dl_find_eh_frame.  */
>> +#if defined __GLIBC__ && !defined DL_FIND_EH_FRAME_DBASE
>> +#if NEED_DBASE_MEMBER
>> +void *_dl_find_eh_frame (void *__pc, void **__dbase) __attribute__ ((weak));
>> +#else
>> +void *_dl_find_eh_frame (void *__pc) __attribute__ ((weak));
>> +#endif
>> +#define USE_DL_FIND_EH_FRAME 1
>> +#define DL_FIND_EH_FRAME_CONDITION (_dl_find_eh_frame != NULL)
>> +#endif
>
> I'd prefer not to do this.  If we find glibc with the support in the
> headers, let's use it, otherwise let's keep using what we were doing before.

I've included a simplified version below, based on the _dl_find_object
patch for glibc.

This is a bit difficult to test, but I ran a full toolchain bootstrap
with GCC + glibc on all glibc-supported architectures (except Hurd and
one m68k variant; they do not presently build, see Joseph's testers).

I also tested this by copying the respective GCC-built libgcc_s into a
glibc build tree for run-time testing on i686-linux-gnu and
x86_64-linux-gnu.  There weren't any issues.  There are a bunch of
unwinder tests in glibc, giving at least some coverage.

Thanks,
Florian

Subject: libgcc: Use _dl_find_object in _Unwind_Find_FDE

libgcc/ChangeLog:

* unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Call _dl_find_object
if available.

---
 libgcc/unwind-dw2-fde-dip.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index fbb0fbdebb9..b837d8e4904 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -504,6 +504,24 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   if (ret != NULL)
 return ret;
 
+  /* Use DLFO_STRUCT_HAS_EH_DBASE as a proxy for the existence of a glibc-style
+ _dl_find_object function.  */
+#ifdef DLFO_STRUCT_HAS_EH_DBASE
+  {
+struct dl_find_object dlfo;
+if (_dl_find_object (pc, &dlfo) == 0)
+  return find_fde_tail ((_Unwind_Ptr) pc, dlfo.dlfo_eh_frame,
+# if DLFO_STRUCT_HAS_EH_DBASE
+   (_Unwind_Ptr) dlfo.dlfo_eh_dbase,
+# else
+   NULL,
+# endif
+   bases);
+else
+  return NULL;
+}
+#endif /* DLFO_STRUCT_HAS_EH_DBASE */
+
   data.pc = (_Unwind_Ptr) pc;
 #if NEED_DBASE_MEMBER
   data.dbase = NULL;



[PATCH v2] elf: Add _dl_find_object function

2021-11-25 Thread Florian Weimer via Gcc-patches
I have reworked the previous patch to make the interface more generally
useful.  Since there are now four words in the core arrays, I did away
with the separate base address array.  (We can bring it back in the
future if necessary.)  I fixed a bug in the handling of proxy maps (by
not copying proxy maps during the dlopen update).  The placement of the
function is also different, as explained in the commit message.

The performance seems unchanged.

I haven't included the obvious future performance enhancements in this
patch, and also did not update Arm's __gnu_Unwind_Find_exidx to use
the new interface.  I think this work can be done in follow-up patches.

Thanks,
Florian

Subject: elf: Add _dl_find_object function

It can be used to speed up the libgcc unwinder, and the internal
_dl_find_dso_for_object function (which is used for caller
identification in dlopen and related functions, and in dladdr).

_dl_find_object is in the internal namespace due to bug 28503.
If libgcc switches to _dl_find_object, this namespace issue will
be fixed.  It is located in libc for two reasons: it is necessary
to forward the call to the static libc after static dlopen, and
there is a link ordering issue with -static-libgcc and libgcc_eh.a
because libc.so is not a linker script that includes ld.so in the
glibc build tree (so that GCC's internal -lc after libgcc_eh.a does
not pick up ld.so).

It is necessary to do the i386 customization in the
sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because
otherwise, multilib installations are broken.

The implementation uses software transactional memory, as suggested
by Torvald Riegel.  Two copies of the supporting data structures are
used, also achieving full async-signal-safety.

---
 NEWS   |   4 +
 bits/dl_find_object.h  |  32 +
 dlfcn/Makefile |   2 +-
 dlfcn/dlfcn.h  |  22 +
 elf/Makefile   |  47 +-
 elf/Versions   |   3 +
 elf/dl-close.c |   4 +
 elf/dl-find_object.c   | 841 +
 elf/dl-find_object.h   | 115 +++
 elf/dl-libc_freeres.c  |   2 +
 elf/dl-open.c  |   5 +
 elf/dl-support.c   |   3 +
 elf/libc-dl_find_object.c  |  26 +
 elf/rtld.c |  11 +
 elf/rtld_static_init.c |   1 +
 elf/tst-dl_find_object-mod1.c  |  10 +
 elf/tst-dl_find_object-mod2.c  |  15 +
 elf/tst-dl_find_object-mod3.c  |  10 +
 elf/tst-dl_find_object-mod4.c  |  10 +
 elf/tst-dl_find_object-mod5.c  |  11 +
 elf/tst-dl_find_object-mod6.c  |  11 +
 elf/tst-dl_find_object-mod7.c  |  10 +
 elf/tst-dl_find_object-mod8.c  |  10 +
 elf/tst-dl_find_object-mod9.c  |  10 +
 elf/tst-dl_find_object-static.c|  22 +
 elf/tst-dl_find_object-threads.c   | 275 +++
 elf/tst-dl_find_object.c   | 240 ++
 include/atomic_wide_counter.h  |  14 +
 include/bits/dl_find_object.h  |   1 +
 include/dlfcn.h|   2 +
 include/link.h |   3 +
 manual/Makefile|   2 +-
 manual/dynlink.texi| 137 
 manual/libdl.texi  |  10 -
 manual/probes.texi |   2 +-
 manual/threads.texi|   2 +-
 sysdeps/arm/bits/dl_find_object.h  |  25 +
 sysdeps/generic/ldsodefs.h |   5 +
 sysdeps/mach/hurd/i386/libc.abilist|   1 +
 sysdeps/nios2/bits/dl_find_object.h|  25 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist|   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist|   1 +
 sysdeps/unix/sysv/linux/csky/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/microblaze/be/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/microblaze/le/libc.abilist |   1 +
 

[Bug middle-end/103406] gcc -O0 behaves differently on "DBL_MAX related operations" than gcc -O1 and above

2021-11-25 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103406

--- Comment #14 from joseph at codesourcery dot com  ---
There is no reasonable definition of how operands of binary + map to 
particular operands of a particular instruction and so no -f or -m option 
could sensibly be defined for that.  When the result is a NaN, there is no 
requirement at all on what (quiet) NaN it is (beyond a preference for 
preservation of the payload of a NaN operand if there is at least one NaN 
operand).

[Bug tree-optimization/99520] Failure to detect bswap pattern

2021-11-25 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99520

Roger Sayle  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
                 CC|                            |roger at nextmovesoftware dot com
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Roger Sayle  ---
This PR is now fixed on mainline.  Thanks to Jakub (my apologies; if I'd seen
comment #2 I wouldn't have accidentally broken things, aka PR
tree-optimization/103376; fortunately Jakub was able to quickly correct my
oversight).

[Bug tree-optimization/98953] Failure to optimize two reads from adjacent addresses into one

2021-11-25 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98953

Roger Sayle  changed:

           What    |Removed                           |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                          |NEW
           Assignee|roger at nextmovesoftware dot com |unassigned at gcc dot gnu.org

--- Comment #4 from Roger Sayle  ---
The MULT_EXPR and PLUS_EXPR aspects of this PR are now resolved (i.e. the case
in comment #1), but unfortunately the abs-based indexing used in the original
report still causes problems.  The bswap pass doesn't yet handle memory
accesses of the form read[abs]/read[abs+1] (but does handle read[0]/read[1]).

[committed] libstdc++: Do not use memset in constexpr calls to ranges::fill_n [PR101608]

2021-11-25 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.


libstdc++-v3/ChangeLog:

PR libstdc++/101608
* include/bits/ranges_algobase.h (__fill_n_fn): Check for
constant evaluation before using memset.
* testsuite/25_algorithms/fill_n/constrained.cc: Check
byte-sized values as well.
---
 libstdc++-v3/include/bits/ranges_algobase.h   | 28 ---
 .../25_algorithms/fill_n/constrained.cc   |  6 ++--
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algobase.h b/libstdc++-v3/include/bits/ranges_algobase.h
index c8c4d032983..9929e5e828b 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -527,17 +527,25 @@ namespace ranges
if (__n <= 0)
  return __first;
 
-   // TODO: Generalize this optimization to contiguous iterators.
-   if constexpr (is_pointer_v<_Out>
- // Note that __is_byte already implies !is_volatile.
- && __is_byte>::__value
- && integral<_Tp>)
- {
-   __builtin_memset(__first, static_cast(__value), __n);
-   return __first + __n;
- }
-   else if constexpr (is_scalar_v<_Tp>)
+   if constexpr (is_scalar_v<_Tp>)
  {
+   // TODO: Generalize this optimization to contiguous iterators.
+   if constexpr (is_pointer_v<_Out>
+ // Note that __is_byte already implies !is_volatile.
+ && __is_byte>::__value
+ && integral<_Tp>)
+ {
+#ifdef __cpp_lib_is_constant_evaluated
+   if (!std::is_constant_evaluated())
+#endif
+ {
+   __builtin_memset(__first,
+static_cast(__value),
+__n);
+   return __first + __n;
+ }
+ }
+
const auto __tmp = __value;
for (; __n > 0; --__n, (void)++__first)
  *__first = __tmp;
diff --git a/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc b/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc
index 6a015d34a89..1d1e1c104d4 100644
--- a/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc
@@ -73,11 +73,12 @@ test01()
 }
 }
 
+template
 constexpr bool
 test02()
 {
   bool ok = true;
-  int x[6] = { 1, 2, 3, 4, 5, 6 };
+  T x[6] = { 1, 2, 3, 4, 5, 6 };
   const int y[6] = { 1, 2, 3, 4, 5, 6 };
   const int z[6] = { 17, 17, 17, 4, 5, 6 };
 
@@ -94,5 +95,6 @@ int
 main()
 {
   test01();
-  static_assert(test02());
+  static_assert(test02());
+  static_assert(test02()); // PR libstdc++/101608
 }
-- 
2.31.1



[Bug libstdc++/101608] ranges::fill/fill_n missing std::is_constant_evaluated() condition for __builtin_memset

2021-11-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101608

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:82c3657dd74896b39937bb0a2aaeba9b8ca105fd

commit r12-5530-g82c3657dd74896b39937bb0a2aaeba9b8ca105fd
Author: Jonathan Wakely 
Date:   Wed Nov 24 13:17:54 2021 +

libstdc++: Do not use memset in constexpr calls to ranges::fill_n
[PR101608]

libstdc++-v3/ChangeLog:

PR libstdc++/101608
* include/bits/ranges_algobase.h (__fill_n_fn): Check for
constant evaluation before using memset.
* testsuite/25_algorithms/fill_n/constrained.cc: Check
byte-sized values as well.
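
For illustration, the guard described in this commit can be sketched outside libstdc++ roughly as follows. This is a minimal sketch, not the library's code: fill_bytes and check_constexpr are hypothetical names, and the GCC/Clang builtin __builtin_is_constant_evaluated() is used (on the assumption that the compiler provides it) so that the example also builds in C++17 mode.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not the libstdc++ implementation): fill n
// byte-sized elements starting at first with value.
template<typename T>
constexpr T* fill_bytes(T* first, std::size_t n, T value)
{
  static_assert(sizeof(T) == 1 && std::is_integral_v<T>,
                "byte-sized integral types only");

  // memset cannot be evaluated in a constant expression, so the fast
  // path is only taken at run time.
  if (!__builtin_is_constant_evaluated())
    {
      std::memset(first, static_cast<unsigned char>(value), n);
      return first + n;
    }

  // Plain loop: valid during constant evaluation.
  for (; n > 0; --n, (void)++first)
    *first = value;
  return first;
}

// Hypothetical check mirroring the idea of the updated test: fill the
// first three bytes and confirm the fourth is untouched.
constexpr bool check_constexpr()
{
  unsigned char buf[4] = { 1, 2, 3, 4 };
  fill_bytes(buf, 3, static_cast<unsigned char>(17));
  return buf[0] == 17 && buf[1] == 17 && buf[2] == 17 && buf[3] == 4;
}

static_assert(check_constexpr(), "usable in a constant expression");
```

Without the guard, the static_assert would be rejected because the memset call is reached during constant evaluation, which appears to be the failure the PR title describes.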

Re: [PATCH] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377

2021-11-25 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

Am 25.11.21 um 17:46 schrieb Mikael Morin:

Hello,

Le 24/11/2021 à 22:32, Harald Anlauf via Fortran a écrit :

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 5a5aca10ebe..837eb0912c0 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape,
 {
   gfc_constructor *c;
   bool test;
+  gfc_constructor_base b;

+  if (shape->expr_type == EXPR_ARRAY)
+    b = shape->value.constructor;
+  else if (shape->expr_type == EXPR_VARIABLE)
+    b = shape->symtree->n.sym->value->value.constructor;


This misses a check that shape->symtree->n.sym->value is an array, so
that it makes sense to access its constructor.


there are checks further above for the cases
  shape->expr_type == EXPR_ARRAY
and for
  shape->expr_type == EXPR_VARIABLE
which look at the elements of array shape to see if they are
non-negative.

Only in those cases where the full "if ()'s" pass we set
shape_is_const = true; and proceed.  The purpose of the auxiliary
bool shape_is_const is to avoid repeating the lengthy if's again.
Only then the above cited code segment should get executed.

For shape->expr_type == EXPR_ARRAY there is really no change in logic.
For shape->expr_type == EXPR_VARIABLE the above snippet is now executed,
but then we already had

  else if (shape->expr_type == EXPR_VARIABLE && shape->ref
   && shape->ref->u.ar.type == AR_FULL && shape->ref->u.ar.dimen == 1
   && shape->ref->u.ar.as
   && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT
   && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER
   && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT
   && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER
   && shape->symtree->n.sym->attr.flavor == FL_PARAMETER
   && shape->symtree->n.sym->value)

In which situations do I miss anything new?


Actually, this only supports the case where the parameter value is
defined by an array; but it could be an intrinsic call, a sum of
parameters, a reference to another parameter, etc.


E.g. the following (still) does get rejected:

  print *, reshape([1,2,3,4,5], a+1)
  print *, reshape([1,2,3,4,5], a+a)
  print *, reshape([1,2,3,4,5], 2*a)
  print *, reshape([1,2,3,4,5], [3,3])
  print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2))

and has been rejected before.


The usual way to handle this is to call gfc_reduce_init_expr which (pray
for it) will make an array out of whatever the shape expression is.


Can you give an example where it fails?

I think the current code would almost certainly fail, too.


The rest looks good.
In the test, can you add a comment telling what it is testing?
Something like: "This tests that constant shape expressions passed to
the reshape intrinsic are properly simplified before being used to
diagnose invalid values"


Can do.


We also used to put a comment mentioning the person who submitted the
test, but not everybody seems to do it these days.


Can do.


Mikael



Harald



[Bug tree-optimization/103345] missed optimization: add/xor individual bytes to form a word

2021-11-25 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103345

Roger Sayle  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Roger Sayle  ---
This PR should now be fixed (missed optimization implemented) on mainline.

[Bug middle-end/103406] gcc -O0 behaves differently on "DBL_MAX related operations" than gcc -O1 and above

2021-11-25 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103406

Roger Sayle  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|roger at nextmovesoftware dot com|unassigned at gcc dot gnu.org
Summary|[12 Regression] gcc -O0 behaves differently on "DBL_MAX related operations" than gcc -O1 and above|gcc -O0 behaves differently on "DBL_MAX related operations" than gcc -O1 and above
 Target||x86_64

--- Comment #13 from Roger Sayle  ---
The Inf - Inf => 0.0 regression should now be fixed on mainline.

Hmm.  As hinted by Richard Biener's investigation, the underlying problem is
even more pervasive.  It turns out that on x86/IA64 chips, floating point
addition is not commutative, i.e. x+y is not the same as y+x, as demonstrated
by the test program below:

#include <stdio.h>

const double pn = __builtin_nan("");
const double mn = -__builtin_nan("");

__attribute__ ((noinline, noclone))
double plus(double x, double y)
{
  return x + y;
}

int main()
{
  printf("%lf\n",plus(pn,mn));
  printf("%lf\n",plus(mn,pn));
  return 0;
}

Output:
nan
-nan

Unfortunately, GCC assumes almost everywhere that FP addition is commutative
and (as per comments #8 and #9) associative with negation/minus.  This appears
to be a target property, cf. libgcc's _FP_CHOOSENAN, but could in theory be
resolved by a -fstrict-math mode (that implies -ftrapping-math) that disables
commutativity (swapping of operands) throughout the compiler, including
reload/fold-const etc., on affected Intel-like targets.
Perhaps this PR is a duplicate now that the regression has been fixed?

[Bug c++/56119] Allows static member definition of template class in namespace not enclosing this class

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56119

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103426
 CC||fchelnokov at gmail dot com

--- Comment #3 from Andrew Pinski  ---
*** Bug 103426 has been marked as a duplicate of this bug. ***

[Bug c++/103426] Acceptance of invalid template specialization in a namespace not enclosing the specialized template

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103426

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56119
 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
This is a dup of bug 56119.

*** This bug has been marked as a duplicate of bug 56119 ***

[Bug tree-optimization/103427] Alignment of C++ references and 'this' pointer not used by optimizer

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103427

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2021-11-25
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Severity|normal  |enhancement

--- Comment #11 from Andrew Pinski  ---
Confirmed.
I had thought there was another bug about this but I can't find it.

[Bug tree-optimization/103332] Spurious -Wstringop-overflow warnings in libstdc++ tests

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103332

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2021-11-25
 Status|UNCONFIRMED |NEW

--- Comment #4 from Andrew Pinski  ---
.

[Bug target/102117] s390: Inefficient code for 64x64=128 signed multiply for <= z13

2021-11-25 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102117

Roger Sayle  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |12.0

--- Comment #4 from Roger Sayle  ---
This should now be fixed on mainline.

[Bug tree-optimization/102958] std::u8string suboptimal compared to std::string, triggers warnings

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102958

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2021-11-25
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #3 from Andrew Pinski  ---
Confirmed, interesting we don't detect this as strlen:
   [local count: 8687547547]:
  # __i_155 = PHI <__i_46(3), 0(2)>
  __i_46 = __i_155 + 1;
  _48 = MEM[(const char_type &)"123456789" + __i_46 * 1];
  if (_48 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

I thought there was code to do that detection now?
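
For reference, the loop in the dump above has the shape of a generic, character-by-character length computation. The following is a minimal sketch of such a loop, assuming that is what the optimizer sees for non-char character types; generic_length is a hypothetical name and not the libstdc++ function.

```cpp
#include <cstddef>

// Hypothetical stand-in for a generic character-traits length routine:
// count elements until the terminating null character.
template<typename CharT>
constexpr std::size_t generic_length(const CharT* s)
{
  std::size_t i = 0;
  while (s[i] != CharT())
    ++i;
  return i;
}

// The loop is usable at compile time as well as at run time.
static_assert(generic_length("123456789") == 9, "nine characters");
```

For plain char this is equivalent to strlen, which is why one might expect the same recognition for other byte-sized character types.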

[Bug middle-end/103406] [12 Regression] gcc -O0 behaves differently on "DBL_MAX related operations" than gcc -O1 and above

2021-11-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103406

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:6ea5fb3cc7f3cc9b731d72183c66c23543876f5a

commit r12-5529-g6ea5fb3cc7f3cc9b731d72183c66c23543876f5a
Author: Roger Sayle 
Date:   Thu Nov 25 19:02:06 2021 +

PR middle-end/103406: Check for Inf before simplifying x-x.

This is a simple one line fix to the regression PR middle-end/103406,
where x - x is being folded to 0.0 even when x is +Inf or -Inf.
In GCC 11 and previously, we'd check whether the type honored NaNs
(which implicitly covered the case where the type honors infinities),
but my patch to test whether the operand could potentially be NaN
failed to also check whether the operand could potentially be Inf.

2021-11-25  Roger Sayle  

gcc/ChangeLog
PR middle-end/103406
* match.pd (minus @0 @0): Check tree_expr_maybe_infinite_p.

gcc/testsuite/ChangeLog
PR middle-end/103406
* gcc.dg/pr103406.c: New test case.
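
To illustrate why the fold is invalid for infinities: under IEEE 754 arithmetic Inf - Inf is NaN, so x - x is only 0.0 when x is finite. The following is a minimal sketch, assuming IEEE 754 doubles and no -ffast-math; self_difference is a hypothetical helper and is not the contents of gcc.dg/pr103406.c.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: compute x - x while keeping the compiler from
// folding the subtraction at compile time.
double self_difference(double x)
{
  volatile double v = x;   // forces the operands to be loaded at run time
  const double a = v;
  const double b = v;
  return a - b;
}

// True when x - x is exactly zero, i.e. when folding to 0.0 would have
// been correct for this particular x.
bool fold_to_zero_is_valid(double x)
{
  return self_difference(x) == 0.0;
}
```

With a finite argument fold_to_zero_is_valid returns true; with +Inf or -Inf the difference is NaN and it returns false, which is the case the added tree_expr_maybe_infinite_p check is meant to exclude.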

[Bug tree-optimization/103429] Optimization of Auto-generated condition chain is not giving good lookup tables.

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103429

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug c++/102454] coroutines: ICE in gimplify_var_or_parm_decl, at gimplify.c:2958

2021-11-25 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102454

--- Comment #7 from Iain Sandoe  ---
I was leaving it to check if we needed to back port to 10.x as well.

[Bug c++/102213] Incorrect executable produced from valid input code with virtual consteval

2021-11-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102213

--- Comment #2 from Andrew Pinski  ---
Note that GCC 10 issued a sorry message:
sorry, unimplemented: 'virtual' 'consteval'
