[PATCH] Restore can_be_invalidated_p semantics to before refactoring
This restores the semantics of can_be_invalidated_p to the original semantics of the function it was split out from in tree-ssa-uninit.c. The current code only ever looks at the first predicate chain, which cannot be correct.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-11-26  Richard Biener

	* gimple-predicate-analysis.cc (can_be_invalidated_p): Restore
	semantics to the one before the split from tree-ssa-uninit.c.
---
 gcc/gimple-predicate-analysis.cc | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-predicate-analysis.cc b/gcc/gimple-predicate-analysis.cc
index 6dde0203841..da6adc9a3e2 100644
--- a/gcc/gimple-predicate-analysis.cc
+++ b/gcc/gimple-predicate-analysis.cc
@@ -1199,14 +1199,16 @@ can_be_invalidated_p (const pred_chain_union , const pred_chain )
   for (unsigned i = 0; i < preds.length (); ++i)
     {
       const pred_chain = preds[i];
-      for (unsigned j = 0; j < chain.length (); ++j)
+      unsigned j;
+      for (j = 0; j < chain.length (); ++j)
 	if (can_be_invalidated_p (chain[j], guard))
-	  return true;
+	  break;
 
       /* If we were unable to invalidate any predicate in C, then there
 	 is a viable path from entry to the PHI where the PHI takes
 	 an interesting value and continues to a use of the PHI.  */
-      return false;
+      if (j == chain.length ())
+	return false;
     }
   return true;
 }
-- 
2.31.1
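A small C model of the behavioural difference in the hunk above may help reviewers. It is a sketch only: the `chain` struct, the integer "predicates" and `pred_is_even` (standing in for the per-predicate `can_be_invalidated_p (chain[j], guard)` check) are hypothetical and not GCC types. The old shape returns from inside the outer loop, so only the first chain is ever inspected; the restored shape requires every chain to contain at least one invalidated predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types: a predicate is an int, a chain is an AND of
   predicates, and a union is an OR of chains.  */
struct chain
{
  const int *preds;
  size_t len;
};

/* Stand-in for can_be_invalidated_p (pred, guard): in this model the
   guard invalidates every even predicate.  */
static bool
pred_is_even (int pred)
{
  return pred % 2 == 0;
}

/* Pre-fix shape: the unconditional return ends the outer loop after
   the first chain, so later chains are never examined.  */
static bool
union_invalidated_old (const struct chain *u, size_t n, bool (*inv) (int))
{
  for (size_t i = 0; i < n; ++i)
    {
      for (size_t j = 0; j < u[i].len; ++j)
        if (inv (u[i].preds[j]))
          return true;
      return false;
    }
  return true;
}

/* Restored shape: each chain needs at least one invalidated predicate;
   a chain with none is a viable path, so the union is not invalidated.  */
static bool
union_invalidated_new (const struct chain *u, size_t n, bool (*inv) (int))
{
  for (size_t i = 0; i < n; ++i)
    {
      size_t j;
      for (j = 0; j < u[i].len; ++j)
        if (inv (u[i].preds[j]))
          break;
      if (j == u[i].len)
        return false;
    }
  return true;
}
```

With chains `{2}` and `{1, 3}`, the old shape answers "invalidated" after looking only at `{2}`, while the restored shape notices that `{1, 3}` still provides a viable path.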
Re: [PATCH] Loop unswitching: support gswitch statements.
On Thu, Nov 25, 2021 at 11:38 AM Aldy Hernandez wrote: > > On Wed, Nov 24, 2021 at 9:00 AM Richard Biener > wrote: > > > > On Tue, Nov 23, 2021 at 5:36 PM Martin Liška wrote: > > > > > > On 11/23/21 16:20, Martin Liška wrote: > > > > Sure, so for e.g. case 1 ... 5 we would need to create a new > > > > unswitch_predicate > > > > with 1 <= index && index <= 5 tree predicate (and the corresponding > > > > irange range). > > > > Later once we unswitch on it, we should use a special unreachable_flag > > > > that will > > > > be used for marking dead edges (similarly to how we fold gconds to > > > > boolean_{false/true}_node). > > > > Does it make sense? > > > > > > I have thought about it more and it's not enough. What we really want is > > > having an irange > > > for *each edge* (2 for gconds and multiple for gswitches). Once we select > > > an unswitch_predicate, > > > then we need to fold_range in true/false loop all these iranges. Doing > > > that we can handle situations like: > > > > > > if (index < 1) > > > do_something1 > > > > > > if (index > 2) > > > do_something2 > > > > > > switch (index) > > > case 1 ... 2: > > > do_something; > > > ... > > > > > > as seen, once we unswitch on 'index < 1' and 'index > 2', then the > > > first case will be taken in the false_edge > > > of 'index > 2' loop unswitching. > > > > Hmm. I'm not sure it needs to be this complicated. We're basically > > evaluating ranges/predicates based > > on a fixed set of versioning predicates. Your implementation created > > "predicates" for the to-be-simplified > > conditions but in the end we'd like to evaluate the actual stmt to > > figure out the taken/not taken edges. IIRC > > elsewhere Andrew showed a snippet on how to evaluate a stmt with a > > given range - not sure if that > > was useful enough. So what I think would be nice is if we could somehow > > use ranger's path query > > without an actual CFG.
So we virtually have > > > > if (versioning-predicate1) > > if (versioning-predicate2) > > ; > > else > > for (;;) // our current loop > > { > > ... > > if (condition) > > ; > > ... > > switch (var) > > { > > ... > > } > > } > > > > and versioning-predicate1 and versioning-predicate2 are not in the IL. > > What we'd like > > to do is seed the path query with a "virtual" path through the two > > predicates to the > > entry of the loop and compute_ranges based on those. Then we'd like to > > use range_of_stmt on 'if (condition)' and 'switch (var)' to determine > > not-taken edges. > > Huh, that's an interesting idea. We could definitely adapt > path_range_query to work with an artificial sequence of blocks, but it > would need some surgery. Off the top of my head: > > a) The phi handling code looks for specific edges in the path (both > for intra-path ranges and for relations inherent in PHIs). > b) The exported ranges between blocks in the path probably need some > massaging. > c) compute_outgoing_relations would need some work as you mention below... > > > Looking somewhat at the sources it seems like we "simply" need to do what > > compute_outgoing_relations does - unfortunately the code lacks comments > > so I have no idea what jt_fur_source src (...).register_outgoing_edges does > > ... > > fur_source is an abstraction for operands to the folding mechanism: > > // Source of all operands for fold_using_range and gori_compute. > // It abstracts out the source of an operand so it can come from a stmt or > // an edge or anywhere a derived class of fur_source wants. > // The default simply picks up ranges from the current range_query. > > class fur_source > { > } > > When passed to register_outgoing_edges, it registers outgoing > relations out of a conditional. I pass it the known outgoing edge out > of the conditional, so only the relation on that edge is recorded.
> I have overloaded fur_source into a path-specific jt_fur_source that > uses a path_oracle to register relations as they would occur along a > path. Once register_outgoing_edges is called on each outgoing edge > between blocks in a path, the relations will have been set, and can be > seen by the range_of_stmt: > > path_range_query::range_of_stmt (irange , gimple *stmt, tree) > { > ... > // If resolving unknowns, fold the statement making use of any > // relations along the path. > if (m_resolve) > { > fold_using_range f; > jt_fur_source src (stmt, this, _ranger->gori (), m_path); > if (!f.fold_stmt (r, stmt, src)) > r.set_varying (type); > } > ... > } > > register_outgoing_edges would probably have to be adjusted for your > CFG-less paths, and maybe the path_oracle (Andrew??). So conceptually we'd attach extra predicates to an edge (a single one, and even the "entry" edge of the path in this specific case). That is, instead of explicit \ if (p1) \
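To make Martin's "one range per edge" idea concrete, here is a plain interval model in C. It is a sketch only: `struct interval`, `interval_meet` and `edge_reachable` are hypothetical helpers and not GCC's irange or path_range_query API. Each outgoing edge is labelled with the closed range of `index` values that take it. The range known on entry to one versioned loop copy is the intersection of the chosen predicate edges, and any edge whose range does not overlap it is dead in that copy.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical interval model, not GCC's irange API: the closed range
   of `index' values that take a given edge.  Empty when lo > hi.  */
struct interval
{
  long lo, hi;
};

/* Range of `index' holding in one loop copy after versioning on
   several predicates: the intersection of their ranges.  */
static struct interval
interval_meet (struct interval a, struct interval b)
{
  struct interval r;
  r.lo = a.lo > b.lo ? a.lo : b.lo;
  r.hi = a.hi < b.hi ? a.hi : b.hi;
  return r;
}

/* An edge can still be taken in a loop copy only if its range overlaps
   the range known on entry to that copy.  */
static bool
edge_reachable (struct interval edge, struct interval entry)
{
  struct interval m = interval_meet (edge, entry);
  return m.lo <= m.hi;
}
```

Applied to the example in the thread: the copy reached through the false edges of `index < 1` and `index > 2` has `index` in [1, 2], so `case 1 ... 2` stays reachable while the true edge of `index < 1` becomes dead.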
Re: [PATCH take 3] ivopts: Improve code generated for very simple loops.
On Thu, Nov 25, 2021 at 7:17 PM Roger Sayle wrote: > > > On Tue, Nov 23, 2021 at 12:46PM Richard Biener < richard.guent...@gmail.com> > wrote: > > On Thu, Nov 18, 2021 at 4:18 PM Roger Sayle > > wrote: > > > > The patch doesn't add any testcase. > > > > > > The three new attached tests check that the critical invariants have a > > > simpler form, and hopefully shouldn't be affected by whether the > > > optimizer and/or backend costs actually decide to perform this iv > > > substitution > > or not. > > > > The testcases might depend on lp64 though, did you test them with -m32? > > IMHO it's fine to require lp64 here. > > Great catch. You're right that when the loop index has the same precision as > the > target's pointer, that fold is (already) able to simplify the ((EXPR)-1)+1, > so that with > -m32 my new tests ivopts-[567].c fail. I've added "require lp64" to those > tests, but > I've also added two more tests, using char and unsigned char for the loop > expression, > which are optimized on both ilp32 and lp64. > > For example, with -O2 -m32, we see the following improvements in ivopts-8.c: > diff ivopts-8.old.s ivopts-8.new.s > 14,16c14,15 > < subl $1, %ecx > < movzbl %cl, %ecx > < leal 4(%eax,%ecx,4), %ecx > --- > > movsbl %cl, %ecx > > leal (%eax,%ecx,4), %ecx > > This might also explain why GCC currently generates sub-optimal code. Back > when > ivopts was written, most folks were on i686, so the generated code was > optimal. > But with the transition to x86_64, the code is correct, just slightly less > efficient. > > > I'm a bit unsure about adding this special-casing in cand_value_at in > > general - it > > does seem that we're doing sth wrong elsewhere - either by not simplifying > > even > > though enough knowledge is there or by throwing away knowledge earlier > > (during niter analysis?). > > I agree this approach is a bit ugly.
Conceptually, an alternative might be > to avoid > throwing away knowledge earlier, during niter analysis, by adding an extra > tree field > to the tree_niter_desc structure, so that it returns both niter0 (the > iteration count > at the top of the loop) and niter1 (the iteration count at the bottom of the > loop), > so that later passes (cand_value_at) can use the tree that's relevant. Yes, I also thought of this but I wasn't sure we always have that. I also wouldn't think of it as too ugly, but well ... it would definitely be useful elsewhere. Btw, it's generally the number of executions of the latch vs. the number of executions of the header - currently what niter analysis computes is the number of executions of the latch. There are loops where the number of iterations of the header is not representable in the IV. > Alas, this too is > ugly, and inefficient as we're creating/folding trees that may never be > used/useful. > A compromise might be to add an enum field describing how the niter was > calculated to tree_niter_desc, and this can be inspected/used by > cand_value_at. > The current patch figures this out by examining the other fields already in > tree_niter_desc. > > > > Anyway, the patch does look quite safe - can you show some statistics on how > > many times there's extra simplification this way during, say, bootstrap? > > Certainly. During stage2 and stage3 of a bootstrap on x86_64-pc-linux-gnu, > cand_value_at is called 500657 times. The majority of calls, > 447607 (89.4%), request the value at the end of the loop (after_adjust), > while 53050 (10.6%) request the value at the start of the loop. > 102437 calls (20.5%) are optimized by clause 1 [0..N loops] > 27939 calls (5.6%) are optimized by clause 2 [beg..end loops] Thanks for the detailed data. > Looking for opportunities to improve things further, I see that > 319608 calls (63.8%) have an LT_EXPR exit test. > 160965 calls (32.2%) have an NE_EXPR exit test.
> 20084 calls (4.0%) have a GT_EXPR exit test. > So handling descending loops wouldn’t be a big win. > I'll investigate whether (constant) step sizes other than 1 are > (i) sufficiently common and (ii) benefit from improved folding. > > > This revised patch has been tested on x86_64-pc-linux-gnu with a > make bootstrap and make -k check, both with and without > --target-board=unix{-m32}, with no new failures. > Ok for mainline? OK. Thanks, Richard. > 2021-11-25 Roger Sayle > > gcc/ChangeLog > * tree-ssa-loop-ivopts.c (cand_value_at): Take a class > tree_niter_desc* argument instead of just a tree for NITER. > If we require the iv candidate value at the end of the final > loop iteration, try using the original loop bound as the > NITER for sufficiently simple loops. > (may_eliminate_iv): Update (only) call to cand_value_at. > > gcc/testsuite > * gcc.dg/wrapped-binop-simplify.c: Update expected test result. > * gcc.dg/tree-ssa/ivopts-5.c: New test case. > * gcc.dg/tree-ssa/ivopts-6.c: New test case. >
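Why `((EXPR)-1)+1` cannot always be folded back to `EXPR` in the char / unsigned char tests is easiest to see with a narrow counter. The sketch below assumes an 8-bit unsigned counter, and `header_count_from_latch` is a hypothetical helper, not ivopts code. niter analysis records the latch count, n - 1, in the counter's own type; widening it and adding 1 recovers n only when the subtraction did not wrap. This matches Richard's point that the header count may not be representable in the IV. In a real loop the n == 0 case may already be excluded by niter's may_be_zero condition; the sketch ignores that.

```c
#include <assert.h>

/* Sketch assuming an 8-bit unsigned loop counter.  The latch count
   n - 1 is computed in the counter's type; widening it and adding 1
   gives back n only when n - 1 did not wrap, i.e. when n != 0.  */
static int
header_count_from_latch (unsigned char n)
{
  unsigned char latch = (unsigned char) (n - 1);  /* 255 when n == 0 */
  return (int) latch + 1;                         /* 256 when n == 0 */
}
```

Only when the compiler knows that n != 0, for example from the loop guard, is the fold back to n valid. This is the kind of knowledge the thread suggests may get lost during niter analysis.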
[Bug fortran/103434] New: Pointer subobject does not show to correct memory location
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103434 Bug ID: 103434 Summary: Pointer subobject does not show to correct memory location Product: gcc Version: 10.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: baradi09 at gmail dot com Target Milestone: ---

Based on the discussion on FD (https://fortran-lang.discourse.group/t/is-the-section-of-a-pointer-to-an-array-a-valid-pointer/2331), I'd assume that the following code is standard-conforming. However, the result with gfortran seems to be incorrect.

*** Code:

module test
  implicit none

  type :: pointer_wrapper
    real, pointer :: ptr(:) => null()
  end type pointer_wrapper

contains

  subroutine store_pointer(wrapper, ptr)
    type(pointer_wrapper), intent(out) :: wrapper
    real, pointer, intent(in) :: ptr(:)

    wrapper%ptr => ptr

  end subroutine store_pointer

  subroutine use_pointer(wrapper)
    type(pointer_wrapper), intent(inout) :: wrapper

    wrapper%ptr(:) = wrapper%ptr + 1.0

  end subroutine use_pointer

end module test

program testprog
  use test
  implicit none

  real, allocatable, target :: data(:,:)
  real, pointer :: ptr(:,:)
  type(pointer_wrapper) :: wrapper
  integer :: ii

  allocate(data(4, 2))
  ptr => data(:,:)
  data(:,:) = 0.0
  do ii = 1, size(data, dim=2)
    print *, "#", ii
    print *, "BEFORE ", ii, maxval(ptr(:,ii))
    call store_pointer(wrapper, ptr(:,ii))
    print *, "BETWEEN", ii, maxval(ptr(:,ii))
    call use_pointer(wrapper)
    print *, "AFTER ", ii, maxval(ptr(:,ii))
  end do

end program testprog

*** Output:

 # 1
 BEFORE 1 0.
 BETWEEN 1 0.
 AFTER 1 1.
 # 2
 BEFORE 2 1.
 BETWEEN 2 1.
 AFTER 2 1.

*** Expected output:

 # 1
 BEFORE 1 0.
 BETWEEN 1 0.
 AFTER 1 1.
 # 2
 BEFORE 2 0.
 BETWEEN 2 0.
 AFTER 2 1.

It seems as if store_pointer makes the pointer cover a larger memory region than it should, so that data outside the intended column section is also modified. Intel and NAG deliver the expected output.
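The expected output follows directly from Fortran's column-major layout. The C model below is a sketch that assumes `data(4,2)` is stored contiguously in column-major order. `add_one_to_column` and `column_maxval` are hypothetical helpers that mirror `store_pointer` + `use_pointer` and `maxval(ptr(:,ii))`. Updating the section `ptr(:,ii)` should touch only the four contiguous elements of column ii, so column 2 must still read 0. before its own update.

```c
#include <assert.h>

/* C model of data(4,2) in Fortran column-major order: column ii
   (1-based) occupies the four contiguous elements starting at
   (ii - 1) * ROWS.  */
enum { ROWS = 4, COLS = 2 };

/* Equivalent of store_pointer + use_pointer on the section ptr(:,ii):
   only that column's elements are incremented.  */
static void
add_one_to_column (float data[ROWS * COLS], int ii)
{
  float *col = &data[(ii - 1) * ROWS];
  for (int r = 0; r < ROWS; ++r)
    col[r] += 1.0f;
}

/* Equivalent of maxval(ptr(:,ii)).  */
static float
column_maxval (const float data[ROWS * COLS], int ii)
{
  const float *col = &data[(ii - 1) * ROWS];
  float m = col[0];
  for (int r = 1; r < ROWS; ++r)
    if (col[r] > m)
      m = col[r];
  return m;
}
```

Running the loop from the report through this model reproduces the "Expected output" column: the maximum of column 2 stays at 0 after column 1 has been updated.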
[Bug ipa/103432] [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug middle-end/103431] [12 Regression] wrong code with -O -fno-tree-bit-ccp -fno-tree-dominator-opts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103431 Richard Biener changed: What|Removed |Added Priority|P3 |P1 Keywords||needs-bisection Target Milestone|--- |12.0
[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271 --- Comment #7 from rguenther at suse dot de --- On Fri, 26 Nov 2021, wilson at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271 > > Jim Wilson changed: > >What|Removed |Added > > CC||wilson at gcc dot gnu.org > > --- Comment #5 from Jim Wilson --- > SiFive doesn't support -mno-strict-align so I've never tested it. I doubt > that > it works correctly, i.e. I doubt that it optimizes as intended. I've > mentioned > this to other RVI members, but there hasn't been anyone other than SiFive > actively working on upstream gcc so I don't think anyone ever looked at it. > It > shouldn't give an ICE though. > > Looking at this, it appears to be another "if only we had a movti pattern" > issue. > > In expand_DEFERRED_INIT in internal-fn.c, in the reg_lhs == TRUE case, there > is > a test > && have_insn_for (SET, var_mode)) > which fails because var_mode is TImode and we don't have a movti pattern. The > code calls build_zero_cst which returns a constructor with an array type. We > then call expand_assignment which gets confused as it doesn't know the size of > the array it is copying. That seems to be the bug - in this path we shouldn't ever create an entity with VLA size since we do know the actual size. But it all is a bit awkward.
[Bug target/102768] [feature request] Add compiler support for aarch64 shadow call stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768 --- Comment #6 from ashimida --- RFC,v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585496.html
[PATCH] [RFC, v2, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack
Shadow Call Stack can be used to protect the return address of a function at runtime, and clang already supports this feature[1]. To enable SCS in user mode, other support is required in addition to the compiler (as discussed in [2]). This patch only adds basic support for SCS on the compiler side and makes it convenient for users to enable SCS. For the Linux kernel, only compiler support is required. [1] https://clang.llvm.org/docs/ShadowCallStack.html [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768 Signed-off-by: Dan Li gcc/c-family/ChangeLog: * c-attribs.c (handle_no_sanitize_shadow_call_stack_attribute): New. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_shadow_call_stack_enabled): New decl. * config/aarch64/aarch64.c (aarch64_shadow_call_stack_enabled): New. (aarch64_expand_prologue): Push x30 onto SCS before it is pushed onto the stack. (aarch64_expand_epilogue): Pop x30 from SCS. * config/aarch64/aarch64.h (TARGET_SUPPORT_SHADOW_CALL_STACK): New macro. (TARGET_CHECK_SCS_RESERVED_REGISTER): Likewise. * config/aarch64/aarch64.md (scs_push): New template. (scs_pop): Likewise. * defaults.h (TARGET_SUPPORT_SHADOW_CALL_STACK): New macro. * doc/extend.texi: Document attribute. * doc/invoke.texi: Document -fsanitize=shadow-call-stack. * flag-types.h (enum sanitize_code): Add SANITIZE_SHADOW_CALL_STACK. * opts-global.c (handle_common_deferred_options): Add SCS compile option check. * opts.c (finish_options): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/shadow_call_stack_1.c: New test. * gcc.target/aarch64/shadow_call_stack_2.c: New test. * gcc.target/aarch64/shadow_call_stack_3.c: New test. * gcc.target/aarch64/shadow_call_stack_4.c: New test.
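The ChangeLog describes the core mechanism: the prologue pushes x30 onto the shadow call stack and the epilogue pops it back. The C simulation below illustrates why this protects the return address. It is only a model and not the AArch64 sequence the patch emits. `struct shadow_stack`, `struct frame`, `prologue` and `epilogue` are hypothetical names. Because the return address is taken from the separate shadow stack, overwriting the copy saved in the normal stack frame (for example through a buffer overflow) does not change where the function returns. In clang's AArch64 implementation the shadow stack pointer lives in the reserved register x18.

```c
#include <assert.h>
#include <stdint.h>

/* Simulation only: models the idea, not the instructions the patch
   generates.  */
#define SCS_DEPTH 64

struct shadow_stack
{
  uintptr_t slots[SCS_DEPTH];
  int top;
};

/* Hypothetical stand-in for a stack frame holding the saved x30.  */
struct frame
{
  uintptr_t saved_lr;
};

/* Prologue: save the return address in the frame as usual and also
   push it onto the shadow stack.  */
static void
prologue (struct shadow_stack *s, struct frame *f, uintptr_t lr)
{
  f->saved_lr = lr;
  assert (s->top < SCS_DEPTH);
  s->slots[s->top++] = lr;
}

/* Epilogue: return to the address popped from the shadow stack and
   ignore the possibly corrupted copy in the frame.  */
static uintptr_t
epilogue (struct shadow_stack *s, const struct frame *f)
{
  (void) f;
  assert (s->top > 0);
  return s->slots[--s->top];
}
```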
--- gcc/c-family/c-attribs.c | 21 + gcc/config/aarch64/aarch64-protos.h | 1 + gcc/config/aarch64/aarch64.c | 27 +++ gcc/config/aarch64/aarch64.h | 11 + gcc/config/aarch64/aarch64.md | 18 gcc/defaults.h| 4 ++ gcc/doc/extend.texi | 7 +++ gcc/doc/invoke.texi | 29 gcc/flag-types.h | 2 + gcc/opts-global.c | 6 +++ gcc/opts.c| 12 + .../gcc.target/aarch64/shadow_call_stack_1.c | 6 +++ .../gcc.target/aarch64/shadow_call_stack_2.c | 6 +++ .../gcc.target/aarch64/shadow_call_stack_3.c | 45 +++ .../gcc.target/aarch64/shadow_call_stack_4.c | 18 15 files changed, 213 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c index 007b928c54b..9b3a35c06bf 100644 --- a/gcc/c-family/c-attribs.c +++ b/gcc/c-family/c-attribs.c @@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, bool *); static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *); static tree handle_no_sanitize_address_attribute (tree *, tree, tree, int, bool *); +static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree, + tree, int, bool *); static tree handle_no_sanitize_thread_attribute (tree *, tree, tree, int, bool *); static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree, @@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] = handle_no_sanitize_attribute, NULL }, { "no_sanitize_address",0, 0, true, false, false, false, handle_no_sanitize_address_attribute, NULL }, + { "no_sanitize_shadow_call_stack", + 0, 0, true, false, false, false, + handle_no_sanitize_shadow_call_stack_attribute, + NULL }, { "no_sanitize_thread", 0, 0, true, false, false, false, handle_no_sanitize_thread_attribute, NULL }, { 
"no_sanitize_undefined", 0, 0, true, false, false, false, @@ -1175,6 +1181,21 @@ handle_no_sanitize_address_attribute (tree *node, tree name, tree, int, return NULL_TREE; } +/* Handle a "no_sanitize_shadow_call_stack" attribute; arguments as in + struct attribute_spec.handler. */ +static tree
Re: [EXTERNAL] Re: Question about match.pd
On Thu, Nov 25, 2021 at 10:40 PM Navid Rahimi via Gcc wrote: > > > (A << B) eq/ne 0 > Yes that is correct. But for detecting such a pattern you have to detect B > and make sure B is boolean. GIMPLE converts that boolean to an integer before > shifting. Note it's the C language specification that requires this. > After many hours of debugging, I think I managed to find out what is going on. > > +/* cmp : ==, != */ > +/* ((B0 << x) cmp 0) -> B0 cmp 0 */ > +(for cmp (eq ne) > + (simplify > + (cmp (lshift (convert@3 boolean_valued_p@0) @1) integer_zerop@2) > + (if (TREE_CODE (TREE_TYPE (@3)) == INTEGER_TYPE > + && (GIMPLE || !TREE_SIDE_EFFECTS (@1))) > +(cmp @0 @2)))) > > So when I am transforming something like the above pattern to (cmp @0 @2) there > is a type mismatch between @0 and @2. > @0 is boolean and @2 is integer. That type mismatch causes a lot of > headaches when going through resimplification. Yeah, guess you need (cmp @0 { build_zero_cst (TREE_TYPE (@0)); }) here. > > > Best wishes, > Navid. > > > From: Jeff Law > Sent: Wednesday, November 24, 2021 15:11 > To: Navid Rahimi; gcc@gcc.gnu.org > Subject: [EXTERNAL] Re: Question about match.pd > > > > On 11/24/2021 2:19 PM, Navid Rahimi via Gcc wrote: > > Hi GCC community, > > > > I have a question about pattern matching in match.pd. > > > > So I have a pattern like this [1]: > > #define CMP != > > bool f(bool c, int i) { return (c << i) CMP 0; } > > bool g(bool c, int i) { return c CMP 0;} > > > > It is verifiably correct to transform f into g [2]. Although this pattern > > looks simple, the problem arises because GIMPLE converts booleans to int > > before the "<<" operation. > > So at the end you have boolean->integer->boolean conversion and the shift > > will happen on the integer in the middle.
> > > > For example, for something like: > > > > bool g(bool c){return (c << 22);} > > > > The GIMPLE is: > > _Bool g (_Bool c) > > { > > int _1; > > int _2; > > _Bool _4; > > > > [local count: 1073741824]: > > _1 = (int) c_3(D); > > _2 = _1 << 22; > > _4 = _2 != 0; > > return _4; > > } > > > > I wrote a patch to fix this problem in match.pd: > > > > +(match boolean_valued_p > > + @0 > > + (if (TREE_CODE (type) == BOOLEAN_TYPE > > + && TYPE_PRECISION (type) == 1))) > > +(for op (tcc_comparison truth_and truth_andif truth_or truth_orif > > truth_xor) > > + (match boolean_valued_p > > + (op @0 @1))) > > +(match boolean_valued_p > > + (truth_not @0)) > > > > +/* cmp : ==, != */ > > +/* ((B0 << x) cmp 0) -> B0 cmp 0 */ > > +(for cmp (eq ne) > > + (simplify > > + (cmp (lshift (convert@3 boolean_valued_p@0) @1) integer_zerop@2) > > + (if (TREE_CODE (TREE_TYPE (@3)) == INTEGER_TYPE > > + && (GIMPLE || !TREE_SIDE_EFFECTS (@1))) > > +(cmp @0 @2)))) > > > > > > But the problem is I am not able to restrict it to the cases I am interested > > in. There are many hits in other libraries I have tried compiling with > > trunk+patch. > > > > Any feedback?
> > > > 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98956 > > 2) https://alive2.llvm.org/ce/z/UUTJ_v > It would help to also see the cases you're triggering that you do not > want to trigger. > > Could we think of the optimization opportunity in a different way? > > > (A << B) eq/ne 0 -> A eq/ne (0U >> B) > > And I would expect the 0U >> B to get simplified to 0. > > Would looking at things that way help? > > jeff
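The equivalence that Navid verified with Alive2 can also be checked exhaustively in plain C over every input with defined behaviour. In the sketch below, `f` and `g` are the two functions from the question with CMP fixed to `!=`, and `f_equals_g` is a hypothetical helper. The shift count is limited to 0 <= i < 31, because shifting a 1 into or past the sign bit of `int` is undefined in C. A valid match.pd transform must not rely on those cases either.

```c
#include <assert.h>
#include <stdbool.h>

/* The two functions from the question, with CMP defined as !=.
   c is promoted to int before the shift, as in the GIMPLE above.  */
static bool f (bool c, int i) { return (c << i) != 0; }
static bool g (bool c, int i) { (void) i; return c != 0; }

/* Compare f and g on every input whose behaviour is defined:
   c in {0, 1} and 0 <= i < 31 (1 << 31 overflows int).  */
static bool
f_equals_g (void)
{
  for (int c = 0; c <= 1; ++c)
    for (int i = 0; i < 31; ++i)
      if (f (c, i) != g (c, i))
        return false;
  return true;
}
```

The `==` form is simply the negation of both sides, so it holds over the same domain.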
Re: atomic_load
On Sunday, 07.11.2021 at 10:08 +0100, Martin Uecker wrote: > It would be great if somebody could take a look at > PR96159. > > It seems we do not do atomic accesses correctly > when the alignment is insufficient for a lock-free > access, but I think we should fall back to a > library call in this case (as clang does). > > This is very unfortunate, as being able to do atomic > accesses on non-atomic types is important > functionality, and it seems there is no way > to achieve this. > > Also, the documentation and various descriptions of > the atomic functions imply that this is expected > to work. > > But maybe I am missing something and the generated > code is indeed safe. > > Martin > Could this bug be confirmed, please? This is a silent and dangerous incorrect code generation issue. If these functions are not meant to be used on existing data, then at least the documentation needs to be changed to include a big warning that this only happens to work correctly if the data has sufficient alignment for the specific architecture (which of course makes it impossible to use this in a portable way). I would then propose adding atomic_load_safe, so that such functionality can be used safely on existing data structures, which is an important use case. Martin
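As an illustration of the proposed "atomic_load_safe" idea, here is a minimal C sketch that uses only GCC's `__atomic` builtins. The name `atomic_load_u32_safe`, the global spinlock and the byte-copy fallback are hypothetical. They are not an existing GCC or libatomic interface, and this is not what clang's library fallback does. Naturally aligned 4-byte loads use the lock-free `__atomic_load_n`. Misaligned ones are copied under a lock, which is atomic only with respect to writers that take the same lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fallback lock; __atomic_test_and_set requires a bool
   or char object.  */
static bool fallback_lock;

/* Hypothetical helper, not an existing GCC or libatomic interface.  */
static uint32_t
atomic_load_u32_safe (const void *p)
{
  /* Naturally aligned: the lock-free builtin is safe to use.  */
  if ((uintptr_t) p % sizeof (uint32_t) == 0)
    return __atomic_load_n ((const uint32_t *) p, __ATOMIC_SEQ_CST);

  /* Misaligned: copy the bytes under a spinlock.  Only atomic with
     respect to writers that take the same lock.  */
  uint32_t v;
  while (__atomic_test_and_set (&fallback_lock, __ATOMIC_ACQUIRE))
    ;  /* spin */
  memcpy (&v, p, sizeof v);
  __atomic_clear (&fallback_lock, __ATOMIC_RELEASE);
  return v;
}
```

A real fallback would have to make every store path agree on the same lock for a given address. This is why a library call such as libatomic's generic `__atomic_load` is the more natural place for this logic than inline code.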
[r12-5536 Regression] FAIL: gfortran.dg/widechar_2.f90 -O0 (test for excess errors) on Linux/x86_64
On Linux/x86_64, 90cb088ece8d8cc1019d25629d1585e5b0234179 is the first bad commit commit 90cb088ece8d8cc1019d25629d1585e5b0234179 Author: konglin1 Date: Wed Nov 10 09:37:32 2021 +0800 i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811] caused FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation, -O0 FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation, -Og -g FAIL: gcc.c-torture/execute/builtins/memmove-chk.c compilation, -O0 FAIL: gcc.c-torture/execute/builtins/memmove-chk.c compilation, -Og -g FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c compilation, -O0 FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c compilation, -Og -g FAIL: gcc.dg/guality/vla-1.c -O0 (test for excess errors) FAIL: gcc.dg/guality/vla-1.c -O1 -DPREVENT_OPTIMIZATION (test for excess errors) FAIL: gcc.dg/guality/vla-1.c -O2 -DPREVENT_OPTIMIZATION (test for excess errors) FAIL: gcc.dg/guality/vla-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -DPREVENT_OPTIMIZATION (test for excess errors) FAIL: gcc.dg/guality/vla-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION (test for excess errors) FAIL: gcc.dg/guality/vla-1.c -O3 -g -DPREVENT_OPTIMIZATION (test for excess errors) FAIL: gcc.dg/guality/vla-1.c -Og -DPREVENT_OPTIMIZATION (test for excess errors) FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation, -O1 FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation, -O2 (internal compiler error) FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error) FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation, -O3 -g (internal compiler error) FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation, -Og -g FAIL: gcc.target/x86_64/abi/test_struct_returning.c compilation, -Os (internal compiler error) FAIL: gfortran.dg/fmt_cache_1.f -O0 (test for 
excess errors) FAIL: gfortran.dg/fmt_cache_1.f -O1 (test for excess errors) FAIL: gfortran.dg/fmt_cache_1.f -O2 (test for excess errors) FAIL: gfortran.dg/fmt_cache_1.f -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) FAIL: gfortran.dg/fmt_cache_1.f -O3 -g (test for excess errors) FAIL: gfortran.dg/fmt_cache_1.f -Os (test for excess errors) FAIL: gfortran.dg/g77/7388.f -O0 (test for excess errors) FAIL: gfortran.dg/iomsg_1.f90 -O0 (test for excess errors) FAIL: gfortran.dg/namelist_19.f90 -O0 (test for excess errors) FAIL: gfortran.dg/read_no_eor.f90 -O0 (test for excess errors) FAIL: gfortran.dg/round_4.f90 -O0 (test for excess errors) FAIL: gfortran.dg/widechar_2.f90 -O0 (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5536/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="builtins.exp=gcc.c-torture/execute/builtins/memcpy-chk.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="builtins.exp=gcc.c-torture/execute/builtins/memmove-chk.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="builtins.exp=gcc.c-torture/execute/builtins/mempcpy-chk.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="guality.exp=gcc.dg/guality/vla-1.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="abi-x86_64.exp=gcc.target/x86_64/abi/test_struct_returning.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/fmt_cache_1.f --target_board='unix{-m64\ -march=cascadelake}'" $ cd 
{build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/g77/7388.f --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/iomsg_1.f90 --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/namelist_19.f90 --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/read_no_eor.f90 --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/round_4.f90 --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/widechar_2.f90 --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at
Re: [PATCH v3 0/8] __builtin_dynamic_object_size
On 11/26/21 10:58, Siddhesh Poyarekar wrote: sure it works) and saw no issues in any of those builds. I did some rudimentary analysis of the generated binaries using fortify-metrics[1] to confirm that there was a difference in coverage between the two fortification levels. Here is a summary of coverage in the above packages:

F = number of fortified calls
T = total number of calls to fortifiable functions (fortified as well as unfortified)
C = F * 100 / T

Package          F(2)    T(2)    F(3)    T(3)    C(2)     C(3)
bash              428    1220    1005    1196    35.08%   84.03%
wpa_supplicant   1635    3232    2350    3408    50.59%   68.96%
systemtap         324    1990     343    1994    16.28%   17.20%
cmake             830   14181     958   14196     5.85%    6.75%

The numbers are slightly lower than in the previous patch series because in the interim I pushed an improvement to folding of the _chk builtins so that they can use ranges to simplify the calls to their regular variants. Also note that even _FORTIFY_SOURCE=2 coverage should be improved due to negative offset handling.

[1] https://github.com/siddhesh/fortify-metrics
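The coverage column is C = F * 100 / T. This small C check, with a hypothetical helper `coverage_centipercent`, recomputes it in integer hundredths of a percent with round-to-nearest. Doing so makes it easy to confirm that the C(2) and C(3) columns above are consistent with the F and T counts.

```c
#include <assert.h>

/* C = F * 100 / T, expressed in hundredths of a percent and rounded to
   nearest so that it stays in integer arithmetic (3508 == 35.08%).  */
static long
coverage_centipercent (long fortified, long total)
{
  return (fortified * 10000 + total / 2) / total;
}
```

For example, bash at level 3 gives (1005 * 10000 + 598) / 1196 = 8403, which is the 84.03% in the table.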
[PATCH v3 8/8] tree-object-size: Dynamic sizes for ADDR_EXPR
Allow returning dynamic expressions from ADDR_EXPR for __builtin_dynamic_object_size and also allow offsets to be dynamic. gcc/ChangeLog: * tree-object-size.c (size_valid_p): New function. (size_for_offset): Remove OFFSET constness assertion. (addr_object_size): Build dynamic expressions for object sizes and use size_valid_p to decide if it is valid for the given OBJECT_SIZE_TYPE. (compute_builtin_object_size): Allow dynamic offsets when computing size at O0. (call_object_size): Call size_valid_p. (plus_stmt_object_size): Allow non-constant offset and use size_valid_p to decide if it is valid for the given OBJECT_SIZE_TYPE. gcc/testsuite/ChangeLog: * gcc.dg/builtin-dynamic-object-size-0.c: Add new tests. * gcc.dg/builtin-object-size-1.c (test1) [__builtin_object_size]: Adjust expected output for dynamic object sizes. * gcc.dg/builtin-object-size-2.c (test1) [__builtin_object_size]: Likewise. * gcc.dg/builtin-object-size-3.c (test1) [__builtin_object_size]: Likewise. * gcc.dg/builtin-object-size-4.c (test1) [__builtin_object_size]: Likewise. Signed-off-by: Siddhesh Poyarekar --- .../gcc.dg/builtin-dynamic-object-size-0.c| 158 ++ gcc/testsuite/gcc.dg/builtin-object-size-1.c | 30 +++- gcc/testsuite/gcc.dg/builtin-object-size-2.c | 43 - gcc/testsuite/gcc.dg/builtin-object-size-3.c | 25 ++- gcc/testsuite/gcc.dg/builtin-object-size-4.c | 17 +- gcc/tree-object-size.c| 91 +- 6 files changed, 300 insertions(+), 64 deletions(-) diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c index 2db0e0d1aa2..4a1f4965ebd 100644 --- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c +++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c @@ -219,6 +219,79 @@ test_deploop (size_t sz, size_t cond) return __builtin_dynamic_object_size (bin, 0); } +/* Address expressions. 
*/ + +struct dynarray_struct +{ + long a; + char c[16]; + int b; +}; + +size_t +__attribute__ ((noinline)) +test_dynarray_struct (size_t sz, size_t off) +{ + struct dynarray_struct bin[sz]; + + return __builtin_dynamic_object_size ([off].c, 0); +} + +size_t +__attribute__ ((noinline)) +test_dynarray_struct_subobj (size_t sz, size_t off) +{ + struct dynarray_struct bin[sz]; + + return __builtin_dynamic_object_size ([off].c[4], 1); +} + +size_t +__attribute__ ((noinline)) +test_dynarray_struct_subobj2 (size_t sz, size_t off, size_t *objsz) +{ + struct dynarray_struct2 +{ + long a; + int b; + char c[sz]; +}; + + struct dynarray_struct2 bin; + + *objsz = sizeof (bin); + + return __builtin_dynamic_object_size ([off], 1); +} + +size_t +__attribute__ ((noinline)) +test_substring (size_t sz, size_t off) +{ + char str[sz]; + + return __builtin_dynamic_object_size ([off], 0); +} + +size_t +__attribute__ ((noinline)) +test_substring_ptrplus (size_t sz, size_t off) +{ + int str[sz]; + + return __builtin_dynamic_object_size (str + off, 0); +} + +size_t +__attribute__ ((noinline)) +test_substring_ptrplus2 (size_t sz, size_t off, size_t off2) +{ + int str[sz]; + int *ptr = [off]; + + return __builtin_dynamic_object_size (ptr + off2, 0); +} + size_t __attribute__ ((access (__read_write__, 1, 2))) __attribute__ ((noinline)) @@ -227,6 +300,40 @@ test_parmsz_simple (void *obj, size_t sz) return __builtin_dynamic_object_size (obj, 0); } +size_t +__attribute__ ((noinline)) +__attribute__ ((access (__read_write__, 1, 2))) +test_parmsz (void *obj, size_t sz, size_t off) +{ + return __builtin_dynamic_object_size (obj + off, 0); +} + +size_t +__attribute__ ((noinline)) +__attribute__ ((access (__read_write__, 1, 2))) +test_parmsz_scale (int *obj, size_t sz, size_t off) +{ + return __builtin_dynamic_object_size (obj + off, 0); +} + +size_t +__attribute__ ((noinline)) +__attribute__ ((access (__read_write__, 1, 2))) +test_loop (int *obj, size_t sz, size_t start, size_t end, int incr) +{ + 
int *ptr = obj + start; + + for (int i = start; i != end; i = i + incr) +{ + ptr = ptr + incr; + if (__builtin_dynamic_object_size (ptr, 0) == 0) + return 0; +} + + return __builtin_dynamic_object_size (ptr, 0); +} + + unsigned nfails = 0; #define FAIL() ({ \ @@ -287,6 +394,31 @@ main (int argc, char **argv) FAIL (); if (test_dynarray (__builtin_strlen (argv[0])) != __builtin_strlen (argv[0])) FAIL (); + if (test_dynarray_struct (42, 4) != + ((42 - 4) * sizeof (struct dynarray_struct) + - __builtin_offsetof (struct dynarray_struct, c))) +FAIL (); + if (test_dynarray_struct (42, 48) != 0) +FAIL (); + if (test_substring (128, 4) != 128 - 4) +FAIL (); + if (test_substring (128, 142) != 0) +FAIL (); + if (test_dynarray_struct_subobj (42, 4) != 16 - 4) +
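The main () checks above expect test_dynarray_struct (42, 4) to equal (42 - 4) * sizeof (struct dynarray_struct) - __builtin_offsetof (struct dynarray_struct, c), and 0 once the offset runs past the array.  The following is a minimal sketch, not part of the patch, that reproduces that arithmetic and cross-checks it against plain pointer subtraction on a VLA; expected_dynarray_struct_size and measured_dynarray_struct_size are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as struct dynarray_struct in the test.  */
struct dynarray_struct
{
  long a;
  char c[16];
  int b;
};

/* Whole-object size (type 0) expected for the address of bin[off].c when
   bin is struct dynarray_struct[sz]: the bytes from that address to the end
   of bin, or 0 once OFF is at or past the end of the array.  */
static size_t
expected_dynarray_struct_size (size_t sz, size_t off)
{
  if (off >= sz)
    return 0;
  return (sz - off) * sizeof (struct dynarray_struct)
         - offsetof (struct dynarray_struct, c);
}

/* Cross-check: measure the same distance with pointer arithmetic on a real
   VLA, from bin[off].c to one past the end of bin.  */
static size_t
measured_dynarray_struct_size (size_t sz, size_t off)
{
  assert (off < sz);
  struct dynarray_struct bin[sz];
  return (size_t) ((char *) (bin + sz) - (char *) bin[off].c);
}
```

For the subobject variant (type 1 on element 4 of bin[off].c) the test expects 16 - 4, i.e. only the remainder of the c member, independent of sz.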
[PATCH v3 7/8] tree-object-size: Handle GIMPLE_CALL
Handle non-constant expressions in GIMPLE_CALL arguments. Also handle alloca. gcc/ChangeLog: * tree-object-size.c (alloc_object_size): Make and return non-constant size expression. (call_object_size): Return expression or unknown based on whether dynamic object size is requested. gcc/testsuite/ChangeLog: * gcc.dg/builtin-dynamic-object-size-0.c: Add new tests. * gcc.dg/builtin-object-size-1.c (test1) [__builtin_object_size]: Alter expected result for dynamic object size. * gcc.dg/builtin-object-size-2.c (test1) [__builtin_object_size]: Likewise. * gcc.dg/builtin-object-size-3.c (test1) [__builtin_object_size]: Likewise. * gcc.dg/builtin-object-size-4.c (test1) [__builtin_object_size]: Likewise. Signed-off-by: Siddhesh Poyarekar --- .../gcc.dg/builtin-dynamic-object-size-0.c| 227 +- gcc/testsuite/gcc.dg/builtin-object-size-1.c | 7 + gcc/testsuite/gcc.dg/builtin-object-size-2.c | 14 ++ gcc/testsuite/gcc.dg/builtin-object-size-3.c | 7 + gcc/testsuite/gcc.dg/builtin-object-size-4.c | 14 ++ gcc/tree-object-size.c| 22 +- 6 files changed, 282 insertions(+), 9 deletions(-) diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c index ce0f4eb17f3..2db0e0d1aa2 100644 --- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c +++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c @@ -4,12 +4,71 @@ typedef __SIZE_TYPE__ size_t; #define abort __builtin_abort +void * +__attribute__ ((alloc_size (1))) +__attribute__ ((__nothrow__ , __leaf__)) +__attribute__ ((noinline)) +alloc_func (size_t sz) +{ + return __builtin_malloc (sz); +} + +void * +__attribute__ ((alloc_size (1, 2))) +__attribute__ ((__nothrow__ , __leaf__)) +__attribute__ ((noinline)) +calloc_func (size_t cnt, size_t sz) +{ + return __builtin_calloc (cnt, sz); +} + +void * +__attribute__ ((noinline)) +unknown_allocator (size_t cnt, size_t sz) +{ + return __builtin_calloc (cnt, sz); +} + +size_t +__attribute__ ((noinline)) +test_unknown (size_t cnt, 
size_t sz) +{ + void *ret = unknown_allocator (cnt, sz); + return __builtin_dynamic_object_size (ret, 0); +} + +/* Malloc-like allocator. */ + +size_t +__attribute__ ((noinline)) +test_malloc (size_t sz) +{ + void *ret = alloc_func (sz); + return __builtin_dynamic_object_size (ret, 0); +} + +size_t +__attribute__ ((noinline)) +test_builtin_malloc (size_t sz) +{ + void *ret = __builtin_malloc (sz); + return __builtin_dynamic_object_size (ret, 0); +} + +size_t +__attribute__ ((noinline)) +test_builtin_malloc_cond (int cond) +{ + void *ret = __builtin_malloc (cond ? 32 : 64); + return __builtin_dynamic_object_size (ret, 0); +} + size_t __attribute__ ((noinline)) test_builtin_malloc_condphi (int cond) { void *ret; - + if (cond) ret = __builtin_malloc (32); else @@ -18,6 +77,79 @@ test_builtin_malloc_condphi (int cond) return __builtin_dynamic_object_size (ret, 0); } +size_t +__attribute__ ((noinline)) +test_builtin_malloc_condphi2 (int cond, size_t in) +{ + void *ret; + + if (cond) +ret = __builtin_malloc (in); + else +ret = __builtin_malloc (64); + + return __builtin_dynamic_object_size (ret, 0); +} + +size_t +__attribute__ ((noinline)) +test_builtin_malloc_condphi3 (int cond, size_t in, size_t in2) +{ + void *ret; + + if (cond) +ret = __builtin_malloc (in); + else +ret = __builtin_malloc (in2); + + return __builtin_dynamic_object_size (ret, 0); +} + +size_t +__attribute__ ((noinline)) +test_builtin_malloc_condphi4 (size_t sz, int cond) +{ + char *a = __builtin_malloc (sz); + char b[sz / 2]; + + return __builtin_dynamic_object_size (cond ? b : (void *) , 0); +} + +size_t +__attribute__ ((noinline)) +test_builtin_malloc_condphi5 (size_t sz, int cond, char *c) +{ + char *a = __builtin_malloc (sz); + + return __builtin_dynamic_object_size (cond ? c : (void *) , 0); +} + +/* Calloc-like allocator. 
*/ + +size_t +__attribute__ ((noinline)) +test_calloc (size_t cnt, size_t sz) +{ + void *ret = calloc_func (cnt, sz); + return __builtin_dynamic_object_size (ret, 0); +} + +size_t +__attribute__ ((noinline)) +test_builtin_calloc (size_t cnt, size_t sz) +{ + void *ret = __builtin_calloc (cnt, sz); + return __builtin_dynamic_object_size (ret, 0); +} + +size_t +__attribute__ ((noinline)) +test_builtin_calloc_cond (int cond1, int cond2) +{ + void *ret = __builtin_calloc (cond1 ? 32 : 64, cond2 ? 1024 : 16); + return __builtin_dynamic_object_size (ret, 0); +} + size_t __attribute__ ((noinline)) test_builtin_calloc_condphi (size_t cnt, size_t sz, int cond) @@ -33,6 +165,47 @@ test_builtin_calloc_condphi (size_t cnt, size_t sz, int cond) return __builtin_dynamic_object_size (cond ? ch : (void *) , 0); } +/* Passthrough functions. */ + +size_t +__attribute__ ((noinline)) +test_passthrough (size_t sz,
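alloc_func and calloc_func above are tagged alloc_size (1) and alloc_size (1, 2).  Per GCC's documentation of that attribute, the allocation size is the argument at the first position, or the product of the two named arguments; unknown_allocator carries no attribute, so its size stays unknown.  The sketch below is a hypothetical model of that mapping, not GCC's alloc_object_size, and reporting multiplication overflow as unknown ((size_t) -1) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of alloc_size (POS1[, POS2]).  POS1 and POS2 are the
   1-based argument positions named in the attribute; POS2 == 0 stands for
   the single-argument (malloc-like) form.  ARGS holds the values the call
   was made with.  Overflow is reported as unknown, (size_t) -1.  */
static size_t
alloc_size_from_args (const size_t *args, unsigned pos1, unsigned pos2)
{
  assert (pos1 != 0);
  size_t a = args[pos1 - 1];
  if (pos2 == 0)
    return a;                   /* alloc_size (1), like alloc_func.  */

  size_t b = args[pos2 - 1];    /* alloc_size (1, 2), like calloc_func.  */
  if (b != 0 && a > SIZE_MAX / b)
    return (size_t) -1;
  return a * b;
}
```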
[PATCH v3 5/8] tree-object-size: Support dynamic sizes in conditions
Handle GIMPLE_PHI and conditionals specially for dynamic objects,
returning PHI/conditional expressions instead of just a MIN/MAX
estimate.

This makes the returned object size variable for loops and
conditionals, so tests need to be adjusted to look for precise size in
some cases.  builtin-dynamic-object-size-5.c had to be modified to only
look for success in maximum object size case and skip over the minimum
object size tests because the result is no longer a compile time
constant.

I also added some simple tests to exercise conditionals with dynamic
object sizes.

gcc/ChangeLog:

	* builtins.c (fold_builtin_object_size): Adjust for dynamic size
	expressions.
	* tree-object-size.c: Include gimplify-me.h.
	(struct object_size_info): New member UNKNOWNS.
	(size_initval_p, object_sizes_get_raw): New functions.
	(object_sizes_get): Return suitable gimple variable for
	object size.
	(object_sizes_initialize): Reuse existing object size TREE_VEC
	during gimplification.
	(bundle_sizes): New function.
	(object_sizes_set): Use it and handle dynamic object size
	expressions.
	(object_sizes_set_temp): New function.
	(size_for_offset): Adjust for dynamic size expressions.
	(emit_phi_nodes, propagate_unknowns, gimplify_size_expressions):
	New functions.
	(compute_builtin_object_size): Call gimplify_size_expressions for
	OST_DYNAMIC.
	(dynamic_object_size): New function.
	(cond_expr_object_size): Use it.
	(phi_dynamic_object_size): New function.
	(collect_object_sizes_for): Call it for OST_DYNAMIC.  Adjust to
	accommodate dynamic object sizes.

gcc/testsuite/ChangeLog:

	* gcc.dg/builtin-dynamic-object-size-0.c: New tests.
	* gcc.dg/builtin-dynamic-object-size-10.c: Add comment.
	* gcc.dg/builtin-object-size-1.c [__builtin_object_size]: Expect
	exact size expressions for __builtin_dynamic_object_size.
	* gcc.dg/builtin-object-size-2.c [__builtin_object_size]:
	Likewise.
	* gcc.dg/builtin-object-size-3.c [__builtin_object_size]:
	Likewise.
	* gcc.dg/builtin-object-size-4.c [__builtin_object_size]:
	Likewise.
* gcc.dg/builtin-object-size-5.c [__builtin_object_size]: Likewise. Signed-off-by: Siddhesh Poyarekar --- Changes from v2: - Incorporated review suggestions. - Delay generating PHI nodes until gimplification so that it doesn't have to be undone if it was found to be unknown. - Adapt to retaining the multipass approach for static object sizes. gcc/builtins.c| 6 +- .../gcc.dg/builtin-dynamic-object-size-0.c| 72 +++ .../gcc.dg/builtin-dynamic-object-size-10.c | 2 + gcc/testsuite/gcc.dg/builtin-object-size-1.c | 119 - gcc/testsuite/gcc.dg/builtin-object-size-2.c | 92 gcc/testsuite/gcc.dg/builtin-object-size-3.c | 121 + gcc/testsuite/gcc.dg/builtin-object-size-4.c | 78 +++ gcc/testsuite/gcc.dg/builtin-object-size-5.c | 12 + gcc/tree-object-size.c| 494 +- 9 files changed, 966 insertions(+), 30 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c diff --git a/gcc/builtins.c b/gcc/builtins.c index 573f7e9b9df..9770e13353d 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -10256,7 +10256,8 @@ fold_builtin_object_size (tree ptr, tree ost, enum built_in_function fcode) if (TREE_CODE (ptr) == ADDR_EXPR) { compute_builtin_object_size (ptr, object_size_type, ); - if (int_fits_type_p (bytes, size_type_node)) + if ((object_size_type & OST_DYNAMIC) + || int_fits_type_p (bytes, size_type_node)) return fold_convert (size_type_node, bytes); } else if (TREE_CODE (ptr) == SSA_NAME) @@ -10265,7 +10266,8 @@ fold_builtin_object_size (tree ptr, tree ost, enum built_in_function fcode) later. Maybe subsequent passes will help determining it. 
*/ if (compute_builtin_object_size (ptr, object_size_type, ) - && int_fits_type_p (bytes, size_type_node)) + && ((object_size_type & OST_DYNAMIC) + || int_fits_type_p (bytes, size_type_node))) return fold_convert (size_type_node, bytes); } diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c new file mode 100644 index 000..ddedf6a49bd --- /dev/null +++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +typedef __SIZE_TYPE__ size_t; +#define abort __builtin_abort + +size_t +__attribute__ ((noinline)) +test_builtin_malloc_condphi (int cond) +{ + void *ret; + + if (cond) +ret = __builtin_malloc (32); + else +ret = __builtin_malloc (64); + + return __builtin_dynamic_object_size (ret, 0); +} + +size_t +__attribute__ ((noinline))
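The first new test, test_builtin_malloc_condphi, picks between malloc (32) and malloc (64).  The sketch below models, in plain C, the difference the commit message describes: the static builtin has to fold that PHI to the maximum (types 0/1) or minimum (types 2/3) of the candidates, whereas the dynamic builtin can emit a PHI of sizes that mirrors the PHI of pointers.  It is an illustration only; static_phi_size and dynamic_phi_size are hypothetical names, not GCC functions.

```c
#include <assert.h>
#include <stddef.h>

/* Object size type bits as used in tree-object-size.c.  */
enum { OST_SUBOBJECT = 1, OST_MINIMUM = 2 };

/* Static estimate for ret = cond ? malloc (a) : malloc (b): COND is not
   known at compile time, so only a bound can be returned.  */
static size_t
static_phi_size (size_t a, size_t b, int object_size_type)
{
  if (object_size_type & OST_MINIMUM)
    return a < b ? a : b;
  return a > b ? a : b;
}

/* Dynamic size: the size expression follows the same condition as the
   pointer, so the result is exact for whichever branch was taken.  */
static size_t
dynamic_phi_size (int cond, size_t a, size_t b)
{
  return cond ? a : b;
}
```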
[PATCH v3 6/8] tree-object-size: Handle function parameters
Handle hints provided by __attribute__ ((access (...))) to compute
dynamic sizes for objects.

gcc/ChangeLog:

	* tree-object-size.c: Include tree-dfa.h.
	(parm_object_size): New function.
	(collect_object_sizes_for): Call it.

gcc/testsuite/ChangeLog:

	* gcc.dg/builtin-dynamic-object-size-0.c (test_parmsz_simple):
	New function.
	(main): Call it.

Signed-off-by: Siddhesh Poyarekar
---
 .../gcc.dg/builtin-dynamic-object-size-0.c | 11
 gcc/tree-object-size.c                     | 50 ++-
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index ddedf6a49bd..ce0f4eb17f3 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -46,6 +46,14 @@ test_deploop (size_t sz, size_t cond)
   return __builtin_dynamic_object_size (bin, 0);
 }
 
+size_t
+__attribute__ ((access (__read_write__, 1, 2)))
+__attribute__ ((noinline))
+test_parmsz_simple (void *obj, size_t sz)
+{
+  return __builtin_dynamic_object_size (obj, 0);
+}
+
 unsigned nfails = 0;
 
 #define FAIL() ({ \
@@ -64,6 +72,9 @@ main (int argc, char **argv)
     FAIL ();
   if (test_deploop (128, 129) != 32)
     FAIL ();
+  if (test_parmsz_simple (argv[0], __builtin_strlen (argv[0]) + 1)
+      != __builtin_strlen (argv[0]) + 1)
+    FAIL ();
 
   if (nfails > 0)
     __builtin_abort ();
diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 5b4dcb619cd..48b1ec6e26a 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimple-iterator.h"
 #include "tree-cfg.h"
+#include "tree-dfa.h"
 #include "stringpool.h"
 #include "attribs.h"
 #include "builtins.h"
@@ -1446,6 +1447,53 @@ cond_expr_object_size (struct object_size_info *osi, tree var, gimple *stmt)
   return reexamine;
 }
 
+/* Find size of an object passed as a parameter to the function.  */
+
+static void
+parm_object_size (struct object_size_info *osi, tree var)
+{
+  int object_size_type = osi->object_size_type;
+  tree parm = SSA_NAME_VAR (var);
+
+  if (!(object_size_type & OST_DYNAMIC) || !POINTER_TYPE_P (TREE_TYPE (parm)))
+    expr_object_size (osi, var, parm);
+
+  /* Look for access attribute.  */
+  rdwr_map rdwr_idx;
+
+  tree fndecl = cfun->decl;
+  const attr_access *access = get_parm_access (rdwr_idx, parm, fndecl);
+  tree typesize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (parm)));
+  tree sz = NULL_TREE;
+
+  if (access && access->sizarg != UINT_MAX)
+    {
+      tree fnargs = DECL_ARGUMENTS (fndecl);
+      tree arg = NULL_TREE;
+      unsigned argpos = 0;
+
+      /* Walk through the parameters to pick the size parameter and safely
+         scale it by the type size.  */
+      for (arg = fnargs; argpos != access->sizarg && arg;
+           arg = TREE_CHAIN (arg), ++argpos);
+
+      if (arg != NULL_TREE && INTEGRAL_TYPE_P (TREE_TYPE (arg)))
+        {
+          sz = get_or_create_ssa_default_def (cfun, arg);
+          if (sz != NULL_TREE)
+            {
+              sz = fold_convert (sizetype, sz);
+              if (typesize)
+                sz = size_binop (MULT_EXPR, sz, typesize);
+            }
+        }
+    }
+  if (!sz)
+    sz = size_unknown (object_size_type);
+
+  object_sizes_set (osi, SSA_NAME_VERSION (var), sz, sz);
+}
+
 /* Compute an object size expression for VAR, which is the result of a PHI
    node.  */
 
@@ -1603,7 +1651,7 @@ collect_object_sizes_for (struct object_size_info *osi, tree var)
     case GIMPLE_NOP:
       if (SSA_NAME_VAR (var)
           && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL)
-        expr_object_size (osi, var, SSA_NAME_VAR (var));
+        parm_object_size (osi, var);
       else
         /* Uninitialized SSA names point nowhere.  */
         unknown_object_size (osi, var);
-- 
2.31.1
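parm_object_size above walks DECL_ARGUMENTS to the position recorded in access->sizarg and, when the pointee type has a size, multiplies the size argument by it.  The following is a hypothetical user-space model of that calculation (parm_size_model is not a GCC function): pointer arguments are only placeholders in ARGS, an ELEM_SIZE of 0 stands for a pointee without a size such as void, mirroring the `if (typesize)` check, and (size_t) -1 stands in for size_unknown of a maximum type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the size computed by parm_object_size.  ARGS holds
   the incoming argument values in declaration order, SIZARG is the 0-based
   position of the size argument from access (mode, ref-index, size-index),
   and ELEM_SIZE is the pointee size in bytes, or 0 when the pointee has no
   size.  Returns (size_t) -1 when no size argument is found.  */
static size_t
parm_size_model (const size_t *args, size_t nargs, size_t sizarg,
                 size_t elem_size)
{
  /* Walk the argument list, as the patch walks DECL_ARGUMENTS.  */
  for (size_t pos = 0; pos < nargs; pos++)
    if (pos == sizarg)
      return elem_size ? args[pos] * elem_size : args[pos];
  return (size_t) -1;
}
```

For test_parmsz_simple (void *obj, size_t sz) with access (__read_write__, 1, 2), SIZARG is 1 and the model returns sz unscaled; an int * parameter would give sz * sizeof (int).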
[PATCH v3 4/8] __builtin_dynamic_object_size: Recognize builtin
Recognize the __builtin_dynamic_object_size builtin and add paths in the
object size code to deal with it, but treat it like
__builtin_object_size for now.  Also add tests to provide the same
testing coverage for the new builtin name.

gcc/ChangeLog:

	* builtins.def (BUILT_IN_DYNAMIC_OBJECT_SIZE): New builtin.
	* tree-object-size.h: Move object size type bits enum from
	tree-object-size.c and add new value OST_DYNAMIC.
	* builtins.c (expand_builtin, fold_builtin_2): Handle it.
	(fold_builtin_object_size): Handle new builtin and adjust for
	change to compute_builtin_object_size.
	* tree-object-size.c: Include builtins.h.
	(compute_builtin_object_size): Adjust.
	(early_object_sizes_execute_one,
	dynamic_object_sizes_execute_one): New functions.
	(object_sizes_execute): Rename insert_min_max_p argument to
	early.  Handle BUILT_IN_DYNAMIC_OBJECT_SIZE and call the new
	functions.
	* doc/extend.texi (__builtin_dynamic_object_size): Document new
	builtin.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/builtin-dynamic-object-size1.C: New test.
	* g++.dg/ext/builtin-dynamic-object-size2.C: Likewise.
	* gcc.dg/builtin-dynamic-alloc-size.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-1.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-10.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-11.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-12.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-13.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-14.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-15.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-16.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-17.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-18.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-19.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-5.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-6.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-7.c: Likewise. * gcc.dg/builtin-dynamic-object-size-8.c: Likewise. * gcc.dg/builtin-dynamic-object-size-9.c: Likewise. * gcc.dg/builtin-object-size-16.c: Adjust to allow inclusion from builtin-dynamic-object-size-16.c. * gcc.dg/builtin-object-size-17.c: Likewise. Signed-off-by: Siddhesh Poyarekar --- Changes from v2: - Incorporated review suggestions. gcc/builtins.c| 11 +- gcc/builtins.def | 1 + gcc/doc/extend.texi | 13 ++ .../g++.dg/ext/builtin-dynamic-object-size1.C | 5 + .../g++.dg/ext/builtin-dynamic-object-size2.C | 5 + .../gcc.dg/builtin-dynamic-alloc-size.c | 7 + .../gcc.dg/builtin-dynamic-object-size-1.c| 6 + .../gcc.dg/builtin-dynamic-object-size-10.c | 9 ++ .../gcc.dg/builtin-dynamic-object-size-11.c | 7 + .../gcc.dg/builtin-dynamic-object-size-12.c | 5 + .../gcc.dg/builtin-dynamic-object-size-13.c | 5 + .../gcc.dg/builtin-dynamic-object-size-14.c | 5 + .../gcc.dg/builtin-dynamic-object-size-15.c | 5 + .../gcc.dg/builtin-dynamic-object-size-16.c | 6 + .../gcc.dg/builtin-dynamic-object-size-17.c | 7 + .../gcc.dg/builtin-dynamic-object-size-18.c | 8 + .../gcc.dg/builtin-dynamic-object-size-19.c | 104 .../gcc.dg/builtin-dynamic-object-size-2.c| 6 + .../gcc.dg/builtin-dynamic-object-size-3.c| 6 + .../gcc.dg/builtin-dynamic-object-size-4.c| 6 + .../gcc.dg/builtin-dynamic-object-size-5.c| 7 + .../gcc.dg/builtin-dynamic-object-size-6.c| 5 + .../gcc.dg/builtin-dynamic-object-size-7.c| 5 + .../gcc.dg/builtin-dynamic-object-size-8.c| 5 + .../gcc.dg/builtin-dynamic-object-size-9.c| 5 + gcc/testsuite/gcc.dg/builtin-object-size-16.c | 2 + gcc/testsuite/gcc.dg/builtin-object-size-17.c | 2 + gcc/tree-object-size.c| 152 +- gcc/tree-object-size.h| 10 ++ 29 files changed, 378 insertions(+), 42 deletions(-) create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size1.C create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size2.C create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-alloc-size.c create mode 
100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-11.c create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-12.c create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-13.c create mode 100644
[PATCH v3 3/8] tree-object-size: Save sizes as trees and support negative offsets
Transform tree-object-size to operate on tree objects instead of host
wide integers.  This makes it easier to extend to dynamic expressions
for object sizes.

The compute_builtin_object_size interface also now returns a tree
expression instead of HOST_WIDE_INT, so callers have been adjusted to
account for that.

The trees in object_sizes are each a TREE_VEC with the first element
being the bytes from the pointer to the end of the object and the
second, the size of the whole object.  This allows analysis of negative
offsets, which can now be allowed to the extent of the object bounds.
Tests have been added to verify that it actually works.

gcc/ChangeLog:

	* tree-object-size.h (compute_builtin_object_size): Return tree
	instead of HOST_WIDE_INT.
	* builtins.c (fold_builtin_object_size): Adjust.
	* gimple-fold.c (gimple_fold_builtin_strncat): Likewise.
	* ubsan.c (instrument_object_size): Likewise.
	* tree-object-size.c (object_sizes): Change type to vec.
	(initval): New function.
	(unknown): Use it.
	(size_unknown_p, size_initval, size_unknown): New functions.
	(object_sizes_unknown_p): Use it.
	(object_sizes_get): Return tree.
	(object_sizes_initialize): Rename from object_sizes_set_force
	and set VAL parameter type as tree.  Add new parameter WHOLEVAL.
	(object_sizes_set): Set VAL parameter type as tree and adjust
	implementation.  Add new parameter WHOLEVAL.
	(size_for_offset): New function.
	(decl_init_size): Adjust comment.
	(addr_object_size): Change PSIZE parameter to tree and adjust
	implementation.  Add new parameter PWHOLESIZE.
	(alloc_object_size): Return tree.
	(compute_builtin_object_size): Return tree in PSIZE.
	(expr_object_size, call_object_size, unknown_object_size):
	Adjust for object_sizes_set change.
	(merge_object_sizes): Drop OFFSET parameter and adjust
	implementation for tree change.
	(plus_stmt_object_size): Call collect_object_sizes_for directly
	instead of merge_object_sizes and call size_for_offset to get net
	size.
(cond_expr_object_size, collect_object_sizes_for, object_sizes_execute): Adjust for change of type from HOST_WIDE_INT to tree. (check_for_plus_in_loops_1): Likewise and skip non-positive offsets. gcc/testsuite/ChangeLog: * gcc.dg/builtin-object-size-1.c (test9): New test. (main): Call it. * gcc.dg/builtin-object-size-2.c (test8): New test. (main): Call it. * gcc.dg/builtin-object-size-3.c (test9): New test. (main): Call it. * gcc.dg/builtin-object-size-4.c (test8): New test. (main): Call it. * gcc.dg/builtin-object-size-5.c (test5, test6, test7): New tests. Signed-off-by: Siddhesh Poyarekar --- Changes from v2: - Incorporated review suggestions. - Added support for negative offsets. gcc/builtins.c | 10 +- gcc/gimple-fold.c| 11 +- gcc/testsuite/gcc.dg/builtin-object-size-1.c | 30 ++ gcc/testsuite/gcc.dg/builtin-object-size-2.c | 30 ++ gcc/testsuite/gcc.dg/builtin-object-size-3.c | 31 ++ gcc/testsuite/gcc.dg/builtin-object-size-4.c | 30 ++ gcc/testsuite/gcc.dg/builtin-object-size-5.c | 25 ++ gcc/tree-object-size.c | 388 --- gcc/tree-object-size.h | 2 +- gcc/ubsan.c | 5 +- 10 files changed, 403 insertions(+), 159 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 384864bfb3a..50e66692775 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -10226,7 +10226,7 @@ maybe_emit_sprintf_chk_warning (tree exp, enum built_in_function fcode) static tree fold_builtin_object_size (tree ptr, tree ost) { - unsigned HOST_WIDE_INT bytes; + tree bytes; int object_size_type; if (!validate_arg (ptr, POINTER_TYPE) @@ -10251,8 +10251,8 @@ fold_builtin_object_size (tree ptr, tree ost) if (TREE_CODE (ptr) == ADDR_EXPR) { compute_builtin_object_size (ptr, object_size_type, ); - if (wi::fits_to_tree_p (bytes, size_type_node)) - return build_int_cstu (size_type_node, bytes); + if (int_fits_type_p (bytes, size_type_node)) + return fold_convert (size_type_node, bytes); } else if (TREE_CODE (ptr) == SSA_NAME) { @@ -10260,8 +10260,8 @@ fold_builtin_object_size (tree ptr, tree ost) 
later. Maybe subsequent passes will help determining it. */ if (compute_builtin_object_size (ptr, object_size_type, ) - && wi::fits_to_tree_p (bytes, size_type_node)) - return build_int_cstu (size_type_node, bytes); + && int_fits_type_p (bytes, size_type_node)) + return fold_convert (size_type_node, bytes); } return NULL_TREE; diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index
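The commit message above says each object_sizes entry now keeps two values, the bytes from the pointer to the end of the object and the size of the whole object, and that this lets negative offsets be accepted up to the object bounds.  The sketch below is a hypothetical model of that bookkeeping, not GCC's size_for_offset: returning 0 whenever the moved pointer would leave [start, end] of the whole object is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two values kept per pointer.  */
struct objsz_pair
{
  size_t bytes;  /* Bytes from the pointer to the end of the object.  */
  size_t whole;  /* Size of the whole object.  */
};

/* Bytes left after moving the pointer by OFF.  Negative offsets are
   accepted while the new pointer does not move before the start of the
   whole object; positive ones while it does not move past the end.  */
static size_t
size_for_offset_model (struct objsz_pair sz, ptrdiff_t off)
{
  /* How far the current pointer already is from the start of the object.  */
  size_t pos = sz.whole - sz.bytes;

  if (off < 0)
    {
      /* Magnitude of OFF, computed without overflowing on PTRDIFF_MIN.  */
      size_t back = (size_t) (-(off + 1)) + 1;
      return back > pos ? 0 : sz.bytes + back;
    }
  return (size_t) off > sz.bytes ? 0 : sz.bytes - (size_t) off;
}
```

For example, with char buf[64] and p = buf + 16 the pair is { 48, 64 }: p + 8 leaves 40 bytes, p - 16 is back at the start with all 64 bytes, and p - 17 is out of bounds.  With only the first value, as before this patch, there would be no way to tell how far back p may go.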
[PATCH v3 2/8] tree-object-size: Abstract object_sizes array
Put all accesses to object_sizes behind functions so that we can add dynamic capability more easily. gcc/ChangeLog: * tree-object-size.c (object_sizes_grow, object_sizes_release, object_sizes_unknown_p, object_sizes_get, object_size_set_force, object_sizes_set): New functions. (addr_object_size, compute_builtin_object_size, expr_object_size, call_object_size, unknown_object_size, merge_object_sizes, plus_stmt_object_size, cond_expr_object_size, collect_object_sizes_for, check_for_plus_in_loops_1, init_object_sizes, fini_object_sizes): Adjust. Signed-off-by: Siddhesh Poyarekar --- Changes from v2: - Incorporated review suggestions. gcc/tree-object-size.c | 177 +++-- 1 file changed, 98 insertions(+), 79 deletions(-) diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c index 5e93bb74f92..3780437ff91 100644 --- a/gcc/tree-object-size.c +++ b/gcc/tree-object-size.c @@ -88,6 +88,71 @@ unknown (int object_size_type) return ((unsigned HOST_WIDE_INT) -((object_size_type >> 1) ^ 1)); } +/* Grow object_sizes[OBJECT_SIZE_TYPE] to num_ssa_names. */ + +static inline void +object_sizes_grow (int object_size_type) +{ + if (num_ssa_names > object_sizes[object_size_type].length ()) +object_sizes[object_size_type].safe_grow (num_ssa_names, true); +} + +/* Release object_sizes[OBJECT_SIZE_TYPE]. */ + +static inline void +object_sizes_release (int object_size_type) +{ + object_sizes[object_size_type].release (); +} + +/* Return true if object_sizes[OBJECT_SIZE_TYPE][VARNO] is unknown. */ + +static inline bool +object_sizes_unknown_p (int object_size_type, unsigned varno) +{ + return (object_sizes[object_size_type][varno] + == unknown (object_size_type)); +} + +/* Return size for VARNO corresponding to OSI. */ + +static inline unsigned HOST_WIDE_INT +object_sizes_get (struct object_size_info *osi, unsigned varno) +{ + return object_sizes[osi->object_size_type][varno]; +} + +/* Set size for VARNO corresponding to OSI to VAL. 
*/ + +static inline bool +object_sizes_set_force (struct object_size_info *osi, unsigned varno, + unsigned HOST_WIDE_INT val) +{ + object_sizes[osi->object_size_type][varno] = val; + return true; +} + +/* Set size for VARNO corresponding to OSI to VAL if it is the new minimum or + maximum. */ + +static inline bool +object_sizes_set (struct object_size_info *osi, unsigned varno, + unsigned HOST_WIDE_INT val) +{ + int object_size_type = osi->object_size_type; + if ((object_size_type & OST_MINIMUM) == 0) +{ + if (object_sizes[object_size_type][varno] < val) + return object_sizes_set_force (osi, varno, val); +} + else +{ + if (object_sizes[object_size_type][varno] > val) + return object_sizes_set_force (osi, varno, val); +} + return false; +} + /* Initialize OFFSET_LIMIT variable. */ static void init_offset_limit (void) @@ -247,7 +312,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, collect_object_sizes_for (osi, var); if (bitmap_bit_p (computed[object_size_type], SSA_NAME_VERSION (var))) - sz = object_sizes[object_size_type][SSA_NAME_VERSION (var)]; + sz = object_sizes_get (osi, SSA_NAME_VERSION (var)); else sz = unknown (object_size_type); } @@ -582,14 +647,14 @@ compute_builtin_object_size (tree ptr, int object_size_type, return false; } + struct object_size_info osi; + osi.object_size_type = object_size_type; if (!bitmap_bit_p (computed[object_size_type], SSA_NAME_VERSION (ptr))) { - struct object_size_info osi; bitmap_iterator bi; unsigned int i; - if (num_ssa_names > object_sizes[object_size_type].length ()) - object_sizes[object_size_type].safe_grow (num_ssa_names, true); + object_sizes_grow (object_size_type); if (dump_file) { fprintf (dump_file, "Computing %s %sobject size for ", @@ -601,7 +666,6 @@ compute_builtin_object_size (tree ptr, int object_size_type, osi.visited = BITMAP_ALLOC (NULL); osi.reexamine = BITMAP_ALLOC (NULL); - osi.object_size_type = object_size_type; osi.depths = NULL; osi.stack = NULL; osi.tos = NULL; @@ -678,8 +742,7 
@@ compute_builtin_object_size (tree ptr, int object_size_type, if (dump_file) { EXECUTE_IF_SET_IN_BITMAP (osi.visited, 0, i, bi) - if (object_sizes[object_size_type][i] - != unknown (object_size_type)) + if (!object_sizes_unknown_p (object_size_type, i)) { print_generic_expr (dump_file, ssa_name (i), dump_flags); @@ -689,7 +752,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, ((object_size_type & OST_MINIMUM) ?
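object_sizes_set above overwrites the stored value only when VAL is a new maximum (types 0 and 1) or a new minimum (types 2 and 3, OST_MINIMUM set), so merging several candidate sizes converges on the requested bound.  A minimal stand-alone model follows, with a plain unsigned long slot in place of the object_sizes vector; sizes_set_model and merge_sizes are hypothetical names, and the starting values (0 for maximum, ULONG_MAX for minimum) are an assumption of the sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

enum { OST_SUBOBJECT = 1, OST_MINIMUM = 2, OST_END = 4 };

/* Model of object_sizes_set: update *SLOT only if VAL is a new maximum
   (OST_MINIMUM clear) or a new minimum (OST_MINIMUM set).  Returns true
   when the slot changed.  */
static bool
sizes_set_model (unsigned long *slot, unsigned long val, int object_size_type)
{
  if ((object_size_type & OST_MINIMUM) == 0)
    {
      if (*slot < val)
        {
          *slot = val;
          return true;
        }
    }
  else if (*slot > val)
    {
      *slot = val;
      return true;
    }
  return false;
}

/* Fold candidate sizes into one bound, starting from the value that any
   real size replaces.  */
static unsigned long
merge_sizes (const unsigned long *vals, size_t n, int object_size_type)
{
  unsigned long slot = (object_size_type & OST_MINIMUM) ? ULONG_MAX : 0;
  for (size_t i = 0; i < n; i++)
    sizes_set_model (&slot, vals[i], object_size_type);
  return slot;
}
```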
[PATCH v3 1/8] tree-object-size: Replace magic numbers with enums
A simple cleanup to allow inserting dynamic size code more easily. gcc/ChangeLog: * tree-object-size.c: New enum. (object_sizes, computed, addr_object_size, compute_builtin_object_size, expr_object_size, call_object_size, merge_object_sizes, plus_stmt_object_size, collect_object_sizes_for, init_object_sizes, fini_object_sizes, object_sizes_execute): Replace magic numbers with enums. Signed-off-by: Siddhesh Poyarekar --- Changes from v2: - Incorporated review suggestions. gcc/tree-object-size.c | 59 -- 1 file changed, 34 insertions(+), 25 deletions(-) diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c index 4334e05ef70..5e93bb74f92 100644 --- a/gcc/tree-object-size.c +++ b/gcc/tree-object-size.c @@ -45,6 +45,13 @@ struct object_size_info unsigned int *stack, *tos; }; +enum +{ + OST_SUBOBJECT = 1, + OST_MINIMUM = 2, + OST_END = 4, +}; + static tree compute_object_offset (const_tree, const_tree); static bool addr_object_size (struct object_size_info *, const_tree, int, unsigned HOST_WIDE_INT *); @@ -67,10 +74,10 @@ static void check_for_plus_in_loops_1 (struct object_size_info *, tree, the subobject (innermost array or field with address taken). object_sizes[2] is lower bound for number of bytes till the end of the object and object_sizes[3] lower bound for subobject. */ -static vec object_sizes[4]; +static vec object_sizes[OST_END]; /* Bitmaps what object sizes have been computed already. */ -static bitmap computed[4]; +static bitmap computed[OST_END]; /* Maximum value of offset we consider to be addition. 
*/ static unsigned HOST_WIDE_INT offset_limit; @@ -227,11 +234,11 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, { unsigned HOST_WIDE_INT sz; - if (!osi || (object_size_type & 1) != 0 + if (!osi || (object_size_type & OST_SUBOBJECT) != 0 || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME) { compute_builtin_object_size (TREE_OPERAND (pt_var, 0), - object_size_type & ~1, ); + object_size_type & ~OST_SUBOBJECT, ); } else { @@ -266,7 +273,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, } else if (DECL_P (pt_var)) { - pt_var_size = decl_init_size (pt_var, object_size_type & 2); + pt_var_size = decl_init_size (pt_var, object_size_type & OST_MINIMUM); if (!pt_var_size) return false; } @@ -287,7 +294,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, { tree var; - if (object_size_type & 1) + if (object_size_type & OST_SUBOBJECT) { var = TREE_OPERAND (ptr, 0); @@ -528,7 +535,7 @@ bool compute_builtin_object_size (tree ptr, int object_size_type, unsigned HOST_WIDE_INT *psize) { - gcc_assert (object_size_type >= 0 && object_size_type <= 3); + gcc_assert (object_size_type >= 0 && object_size_type < OST_END); /* Set to unknown and overwrite just before returning if the size could be determined. */ @@ -546,7 +553,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, if (computed[object_size_type] == NULL) { - if (optimize || object_size_type & 1) + if (optimize || object_size_type & OST_SUBOBJECT) return false; /* When not optimizing, rather than failing, make a small effort @@ -586,8 +593,8 @@ compute_builtin_object_size (tree ptr, int object_size_type, if (dump_file) { fprintf (dump_file, "Computing %s %sobject size for ", - (object_size_type & 2) ? "minimum" : "maximum", - (object_size_type & 1) ? "sub" : ""); + (object_size_type & OST_MINIMUM) ? "minimum" : "maximum", + (object_size_type & OST_SUBOBJECT) ? 
"sub" : ""); print_generic_expr (dump_file, ptr, dump_flags); fprintf (dump_file, ":\n"); } @@ -620,7 +627,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, terminate, it could take a long time. If a pointer is increasing this way, we need to assume 0 object size. E.g. p = [0]; while (cond) p = p + 4; */ - if (object_size_type & 2) + if (object_size_type & OST_MINIMUM) { osi.depths = XCNEWVEC (unsigned int, num_ssa_names); osi.stack = XNEWVEC (unsigned int, num_ssa_names); @@ -679,8 +686,9 @@ compute_builtin_object_size (tree ptr, int object_size_type, fprintf (dump_file, ": %s %sobject size " HOST_WIDE_INT_PRINT_UNSIGNED "\n", -(object_size_type & 2) ? "minimum" : "maximum", -(object_size_type & 1) ? "sub" : "", +
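The patch above replaces the magic numbers 1, 2 and 4 with named bits. Below is a minimal sketch of how such a bit-flag enum is read. It assumes only the three enumerators shown in the diff. The helper functions are hypothetical illustrations and are not part of tree-object-size.c.

```c
#include <stdio.h>

/* Same enumerators as in the patch: bit 0 selects the subobject,
   bit 1 selects the minimum, and OST_END bounds the valid types 0..3.  */
enum
{
  OST_SUBOBJECT = 1, /* bit 0: size of the innermost subobject */
  OST_MINIMUM = 2,   /* bit 1: lower bound rather than upper bound */
  OST_END = 4,       /* one past the largest valid type (3) */
};

/* Hypothetical helper: nonzero if TYPE is a valid object size type.  */
static int valid_object_size_type (int type)
{
  return type >= 0 && type < OST_END;
}

/* Hypothetical helper: the whole-object counterpart of TYPE, i.e. TYPE
   with the subobject bit cleared (1 -> 0, 3 -> 2), as in the
   "object_size_type & ~OST_SUBOBJECT" fallback in the diff.  */
static int whole_object_type (int type)
{
  return type & ~OST_SUBOBJECT;
}

/* Hypothetical helper: formats TYPE like the dump message in the patch,
   e.g. "minimum subobject size" for type 3.  */
static void describe_object_size_type (int type, char *buf, size_t len)
{
  snprintf (buf, len, "%s %sobject size",
            (type & OST_MINIMUM) ? "minimum" : "maximum",
            (type & OST_SUBOBJECT) ? "sub" : "");
}
```

Named bits make tests such as `object_size_type & OST_MINIMUM` self-describing. The `< OST_END` range check replaces the hard-coded `<= 3`.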
[PATCH v3 0/8] __builtin_dynamic_object_size
This patchset implements the __builtin_dynamic_object_size builtin for gcc. The primary motivation for having this builtin in gcc is to enable _FORTIFY_SOURCE=3 support. This allows greater fortification in use cases where the potential performance tradeoff is acceptable.

Semantics:

__builtin_dynamic_object_size has the same signature as __builtin_object_size. It accepts a pointer and a type ranging from 0 to 3, and it returns an object size estimate for the pointer based on an analysis of which objects the pointer could point to. The properties of the estimate differ, however:

- In the best case, __builtin_dynamic_object_size evaluates to an expression that represents the precise size of the object being pointed to.
- If a precise object size expression cannot be evaluated, __builtin_dynamic_object_size attempts to evaluate an estimated size expression based on the object size type.
- Whether the builtin returns an estimate or a precise expression is an implementation detail and may change in future. As with __builtin_object_size, users must always assume that the returned value is the maximum or minimum, depending on the object size type they provided.
- In the worst case of failure, __builtin_dynamic_object_size returns a constant (size_t)-1 or (size_t)0.

Implementation:

- The __builtin_dynamic_object_size support is implemented in tree-object-size. In the first pass (early_objsz), the builtin is in most cases treated like __builtin_object_size, to preserve subobject bounds.
- Each element of the object_sizes vector is now a TREE_VEC of size 2. It holds the number of bytes to the end of the object and the full size of the object. This allows proper handling of negative offsets, up to the bounds of the whole object. It also improves __builtin_object_size results with negative offsets, which now consistently returns valid results for pointer-decrementing loops.
- The patchset begins with structural modification of the tree-object-size pass, followed by enhancement to return size expressions. I have split the implementation into one feature per patch (calls, function parameters, PHI, etc.) to hopefully ease review.

Performance:

Expressions generated by this pass could in theory be arbitrarily complex. I have not attempted to limit nesting of objects, since it seemed too early to do that. In practice, based on the few applications I built, most of the complexity of the expressions got folded away. Even so, the performance overhead is likely to be non-zero. If we find the performance degradation to be significant, we could later add nesting limits to bail out when a size expression gets too complex.

I have implemented simplification of __*_chk to their normal variants when we can determine at compile time that it is safe. This should limit the performance overhead of the expressions in valid cases.

Build time performance doesn't seem to be affected much. This is based on an unscientific check timing `make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`. It only increases by a couple of seconds when the dynamic tests are added and remains more or less in the same ballpark otherwise.

Testing:

I have added tests for dynamic object sizes as well as wrappers for all __builtin_object_size tests to provide wide coverage. I have also done a full bootstrap build and test run on x86_64.

I have also built bash, cmake, wpa_supplicant and systemtap with _FORTIFY_SOURCE=2 and _FORTIFY_SOURCE=3 (with a hacked-up glibc to make sure it works) and saw no issues in any of those builds. I did some rudimentary analysis of the generated binaries using fortify-metrics[1] to confirm that there was a difference in coverage between the two fortification levels.
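The semantics described above can be exercised with a small C sketch. It assumes a compiler that either provides __builtin_dynamic_object_size (GCC 12 or later, or Clang) or can fall back to __builtin_object_size, which takes the same arguments. The OBJECT_SIZE macro and the heap_object_size helper are hypothetical. Because whether the result is precise is an implementation detail, the sketch only relies on getting either the true size or the type-0 failure value (size_t)-1.

```c
#include <stddef.h>
#include <stdlib.h>

/* Prefer the dynamic builtin when available; otherwise fall back to the
   static __builtin_object_size, which has the same signature.  */
#if defined __has_builtin
# if __has_builtin (__builtin_dynamic_object_size)
#  define OBJECT_SIZE(p, type) __builtin_dynamic_object_size ((p), (type))
# endif
#endif
#ifndef OBJECT_SIZE
# define OBJECT_SIZE(p, type) __builtin_object_size ((p), (type))
#endif

/* Hypothetical helper: size estimate (type 0, i.e. maximum, whole
   object) for a freshly allocated heap buffer of N bytes.  With the
   dynamic builtin this can be N even when N is only known at run time;
   if the size cannot be determined, type 0 yields (size_t) -1.  */
static size_t heap_object_size (size_t n)
{
  char *p = malloc (n);
  size_t sz = OBJECT_SIZE (p, 0);
  free (p);
  return sz;
}
```

For a local array, `OBJECT_SIZE (buf + 4, 0)` on a 16-byte `buf` counts the bytes from the pointer to the end of the whole object, so it gives 12 when the size is known.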
Here is a summary of coverage in the above packages:

F = number of fortified calls
T = total number of calls to fortifiable functions (fortified as well as unfortified)
C = F * 100 / T

Package          F(2)   T(2)   F(3)   T(3)    C(2)     C(3)
bash              428   1220   1005   1196   35.08%   84.03%
wpa_supplicant   1635   3232   2350   3408   50.59%   68.96%
systemtap         324   1990    343   1994   16.28%   17.20%
cmake             830  14181    958  14196    5.85%    6.75%

The numbers are slightly lower than in the previous patch series. In the interim I pushed an improvement to folding of the _chk builtins so that they can use ranges to simplify the calls to their regular variants. Also note that even _FORTIFY_SOURCE=2 coverage should be improved due to negative offset handling.

Additional testing plans (i.e. I've already started to do some of this):

- Build packages to compare values returned by __builtin_object_size with the older pass and this new one. Also compare with __builtin_dynamic_object_size.
- Expand the list of packages to get more coverage metrics.
- Explore performance impact on
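Some background on what a "fortified call" in the F and T counts looks like. Under _FORTIFY_SOURCE, glibc's headers route calls such as memcpy to the checking builtin __builtin___memcpy_chk. That builtin may end up as a call to __memcpy_chk, and it receives the destination's object size as an extra argument. At level 3, that size comes from __builtin_dynamic_object_size. Below is a minimal hand-written sketch of one such call, assuming GCC or Clang. The FORTIFIED_MEMCPY macro is hypothetical and much simpler than glibc's real wrappers. For portability it uses the static __builtin_object_size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a fortified memcpy.  It has to be a macro:
   the object size must be computed where DST is still known to be a
   local array, not inside a callee that only sees a pointer.  */
#define FORTIFIED_MEMCPY(dst, src, n) \
  __builtin___memcpy_chk ((dst), (src), (n), \
                          __builtin_object_size ((dst), 0))

/* Copies "abc" (4 bytes including the terminator) through an 8-byte
   local buffer into OUT, which must have room for 4 bytes.  */
static void copy_abc (char *out)
{
  char buf[8];
  /* Length 4 and object size 8 are both compile-time constants.  */
  FORTIFIED_MEMCPY (buf, "abc", 4);
  memcpy (out, buf, 4);
}
```

When the size is known and the copy provably fits, as here, the compiler can fold the checking call into an ordinary memcpy. This is the kind of _chk simplification the cover letter mentions. Otherwise the check happens at run time and aborts the program on overflow.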
[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271 --- Comment #6 from Jim Wilson --- See also bug 103302 which can also be fixed by adding a movti pattern.
[Bug target/103302] wrong code with -fharden-compares
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302 --- Comment #4 from Jim Wilson --- See also bug 103271 which can also be fixed by adding a movti pattern.
[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #5 from Jim Wilson --- SiFive doesn't support -mno-strict-align so I've never tested it. I doubt that it works correctly, i.e. I doubt that it optimizes as intended. I've mentioned this to other RVI members, but there hasn't been anyone other than SiFive actively working on upstream gcc so I don't think anyone ever looked at it. It shouldn't give an ICE though. Looking at this, it appears to be another "if only we had a movti pattern" issue. In expand_DEFERRED_INIT in internal-fn.c, in the reg_lhs == TRUE case, there is a test && have_insn_for (SET, var_mode)) which fails because var_mode is TImode and we don't have a movti pattern. The code calls build_zero_cst which returns a constructor with an array type. We then call expand_assignment which gets confused as it doesn't know the size of the array it is copying. However, if we had a movti pattern, then the code computes the size of the array, and creates a VIEW_CONVERT_EXPR to document the array size before calling expand_assignment. So it looks like it would work if we had a movti pattern. I verified that adding a dummy movti pattern makes the ICE go away.
[Bug target/103302] wrong code with -fharden-compares
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302 --- Comment #3 from Jim Wilson --- Maybe the register allocator should remove clobbers of pseudos, instead of turning them into clobbers of hard register pairs. That would eliminate the ambiguity after register allocation. It is also true that we don't need hard reg clobbers. The clobbers are only there for tracking pseudo reg subregs.
[r12-5531 Regression] FAIL: gcc.dg/ipa/inline-9.c scan-ipa-dump inline "Inlined 1 calls" on Linux/x86_64
On Linux/x86_64, 1b0acc4b800b589a39d637d7312da5cf969a5765 is the first bad commit commit 1b0acc4b800b589a39d637d7312da5cf969a5765 Author: Jan Hubicka Date: Thu Nov 25 23:58:48 2021 +0100 Remove forgotten early return in ipa_value_range_from_jfunc caused FAIL: gcc.dg/ipa/inline-9.c scan-ipa-dump inline "Inlined 1 calls" with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5531/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
[Bug target/103433] ICE in convert_move, at expr.c:219
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103433 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Keywords||ice-on-valid-code Host|x86_64-linux| Last reconfirmed||2021-11-26 Ever confirmed|0 |1 CC||pinskia at gcc dot gnu.org Component|c |target Target|aarch64-none-elf|aarch64*-*-* --- Comment #1 from Andrew Pinski --- Confirmed on the trunk.
[Bug c/103433] New: ICE in convert_move, at expr.c:219
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103433 Bug ID: 103433 Summary: ICE in convert_move, at expr.c:219 Product: gcc Version: 10.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: ilyply2006 at hotmail dot com Target Milestone: --- $ cat test.c #include "arm_sve.h" __attribute__((noinline)) void test_ldst_1 (svfloat32_t op0, svfloat32x2_t *op1) { *op1 = *(svfloat32x2_t*) } $ ./aarch64-none-elf-gcc -v -save-temps -march=armv8.2-a+sve test.c -O3 -S Using built-in specs. COLLECT_GCC=./aarch64-none-elf-gcc Target: aarch64-none-elf Configured with: /work/home/xjin/gcc/arm-gnu-toolchain/abe_build/snapshots/gcc/configure SHELL=/bin/sh --with-mpc=/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu --with-mpfr=/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu --with-gmp=/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu --with-gnu-as --with-gnu-ld --disable-libmudflap --enable-lto --enable-shared --without-included-gettext --enable-nls --with-system-zlib --disable-sjlj-exceptions --enable-gnu-unique-object --enable-linker-build-id --disable-libstdcxx-pch --enable-c99 --enable-clocale=gnu --enable-libstdcxx-debug --enable-long-long --with-cloog=no --with-ppl=no --with-isl=no --enable-multilib --enable-fix-cortex-a53-835769 --enable-fix-cortex-a53-843419 --with-arch=armv8-a --enable-threads=no --disable-multiarch --with-newlib --with-build-sysroot= --with-sysroot=/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/aarch64-none-elf/libc --enable-checking=release --disable-bootstrap --enable-languages=c,c++,lto --prefix=/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=aarch64-none-elf Thread model: single Supported LTO compression algorithms: zlib gcc version 10.2.1 20201103 (GCC) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-march=armv8.2-a+sve' '-O3' '-S' '-mlittle-endian' '-mabi=lp64' /work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/libexec/gcc/aarch64-none-elf/10.2.1/cc1 -E -quiet -v test.c -march=armv8.2-a+sve -mlittle-endian -mabi=lp64 -O3 -fpch-preprocess -o test.i ignoring nonexistent directory "/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/aarch64-none-elf/libc/usr/local/include" ignoring nonexistent directory "/work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/aarch64-none-elf/libc/usr/include" #include "..." search starts here: #include <...> search starts here: /work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/lib/gcc/aarch64-none-elf/10.2.1/include /work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/lib/gcc/aarch64-none-elf/10.2.1/include-fixed /work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/lib/gcc/aarch64-none-elf/10.2.1/../../../../aarch64-none-elf/include End of search list. 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-march=armv8.2-a+sve' '-O3' '-S' '-mlittle-endian' '-mabi=lp64' /work/home/xjin/gcc/arm-gnu-toolchain/abe_build/builds/destdir/x86_64-pc-linux-gnu/libexec/gcc/aarch64-none-elf/10.2.1/cc1 -fpreprocessed test.i -quiet -dumpbase test.c -march=armv8.2-a+sve -mlittle-endian -mabi=lp64 -auxbase test -O3 -version -o test.s GNU C17 (GCC) version 10.2.1 20201103 (aarch64-none-elf) compiled by GNU C version 7.5.0, GMP version 4.3.2, MPFR version 3.1.6, MPC version 1.0.3, isl version none GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 GNU C17 (GCC) version 10.2.1 20201103 (aarch64-none-elf) compiled by GNU C version 7.5.0, GMP version 4.3.2, MPFR version 3.1.6, MPC version 1.0.3, isl version none GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 Compiler executable checksum: 2cefa28229609aee36b21907b2deb066 during RTL pass: expand test.c: In function ‘test_ldst_1’: test.c:5:10: internal compiler error: in convert_move, at expr.c:219 5 | *op1 = *(svfloat32x2_t*) | ~^~~ 0x8606f3 convert_move(rtx_def*, rtx_def*, int) /work/home/xjin/gcc/arm-gnu-toolchain/abe_build/snapshots/gcc/gcc/expr.c:219 0x86773d store_expr(tree_node*, rtx_def*, int, bool, bool) /work/home/xjin/gcc/arm-gnu-toolchain/abe_build/snapshots/gcc/gcc/expr.c:5832 0x867c55 expand_assignment(tree_node*, tree_node*, bool) /work/home/xjin/gcc/arm-gnu-toolchain/abe_build/snapshots/gcc/gcc/expr.c:5516 0x75aed8 expand_gimple_stmt_1 /work/home/xjin/gcc/arm-gnu-toolchain/abe_build/snapshots/gcc/gcc/cfgexpand.c:3753 0x75aed8 expand_gimple_stmt
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #25 from Kewen Lin --- Status update:

>
> The fusion related flags have been considered in the posted patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html.
> It's still being ping-ed for review since it's posted on Sep. 01.
> One RFC/Patch
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578555.html is also
> posted to see if we can avoid to change implicit option behavior for
> Power8/9.

Patch v3 (https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579658.html) was approved with some additional required adjustments. However, its test cases were written and tested on top of the fusion-related patch above, so I am holding off on committing it.
[Bug target/102347] "fatal error: target specific builtin not available" with MMA and LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347 Kewen Lin changed: What|Removed |Added CC||segher at gcc dot gnu.org, ||wschmidt at gcc dot gnu.org Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2021-11-26 --- Comment #11 from Kewen Lin --- Status update: a proposed fix was posted to gcc-patches@ on Sep 28 (https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580357.html). There was some discussion following that, and we eventually agreed that the proposed fix is safe. There are no newer versions of it, so the original one is still being pinged for review.
[Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Target Milestone|--- |12.0 Resolution|--- |FIXED --- Comment #5 from Andrew Pinski --- .
[Bug c++/98360] sizeof in template difference between g++/icc and clang++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98360 --- Comment #3 from Andrew Pinski --- GCC, ICC and MSVC all agree that this is valid code and all produce 4. clang is the only one which rejects it. Here is an even more reduced testcase: template struct uintset { T values[1]; struct traits { }; struct hash : traits { int foo () { return sizeof (uintset::values); } }; hash h; }; uintset s; int x = s.h.foo (); If you remove the base class or change it not to dependent type, the code is accepted. The defect reports in this area: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#613 http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#198 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2253.html is the paper which resolves 613. I suspect GCC is correct if I go by this paper.
[Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811 --- Comment #4 from Hongtao.liu --- Fixed in GCC12.
[Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811 --- Comment #3 from CVS Commits --- The master branch has been updated by hongtao Liu : https://gcc.gnu.org/g:90cb088ece8d8cc1019d25629d1585e5b0234179 commit r12-5536-g90cb088ece8d8cc1019d25629d1585e5b0234179 Author: konglin1 Date: Wed Nov 10 09:37:32 2021 +0800 i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811] Add define_insn extendhfsf2 and truncsfhf2 for target_f16c. gcc/ChangeLog: PR target/102811 * config/i386/i386.c (ix86_can_change_mode_class): Allow 16 bit data in XMM register for TARGET_SSE2. * config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for TARGET_F16C. (extendhfdf2): Restrict extendhfdf for TARGET_AVX512FP16 only. (*extendhf2): Rename from extendhf2. (truncsfhf2): Likewise. (truncdfhf2): Likewise. (*trunc2): Likewise. gcc/testsuite/ChangeLog: PR target/102811 * gcc.target/i386/pr90773-21.c: Allow pextrw instead of movw. * gcc.target/i386/pr90773-23.c: Ditto. * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.
[Bug middle-end/103419] FAIL: gcc.target/i386/pr102566-10b.c with -mx32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103419 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Target Milestone|--- |12.0 --- Comment #6 from Andrew Pinski --- .
[Bug middle-end/103419] FAIL: gcc.target/i386/pr102566-10b.c with -mx32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103419 --- Comment #5 from Hongtao.liu --- Fixed in GCC12.
[Bug middle-end/103419] FAIL: gcc.target/i386/pr102566-10b.c with -mx32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103419 --- Comment #4 from CVS Commits --- The master branch has been updated by hongtao Liu : https://gcc.gnu.org/g:379be00f45f65e0e8de72a50553dd9d2bab6cc08 commit r12-5535-g379be00f45f65e0e8de72a50553dd9d2bab6cc08 Author: liuhongt Date: Thu Nov 25 13:51:57 2021 +0800 Fix typo in r12-5486. gcc/ChangeLog: PR middle-end/103419 * match.pd: Fix typo, use the type of second parameter, not first one.
[Bug testsuite/103335] [12 Regression] new test case gcc.dg/tree-ssa/modref-dse-4.c fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103335 Bug 103335 depends on bug 103282, which changed state. Bug 103282 Summary: New test case gcc.dg/tree-ssa/modref-dse-5.c in r12-5292 fails https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103282 What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED
[Bug testsuite/103282] New test case gcc.dg/tree-ssa/modref-dse-5.c in r12-5292 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103282 Jan Hubicka changed: What|Removed |Added Resolution|--- |FIXED Status|REOPENED|RESOLVED --- Comment #11 from Jan Hubicka --- Fixed.
[Bug ipa/103432] [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432 Andrew Pinski changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org Status|NEW |ASSIGNED
[Bug ipa/103432] [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432 Andrew Pinski changed: What|Removed |Added Keywords||needs-reduction, wrong-code Last reconfirmed|2021-11-26 00:00:00 | Target Milestone|--- |12.0 Status|ASSIGNED|NEW Assignee|hubicka at gcc dot gnu.org |unassigned at gcc dot gnu.org Known to work||11.2.0 Component|tree-optimization |ipa --- Comment #2 from Andrew Pinski --- Confirmed, I have not reduced it but here is what is happening. outD.25694 = {}; ... MEM[(struct DCTToD.21174 *) clique 3 base 1].data_D.21196 = ... _ZN12_GLOBAL__N_121GenericTransposeBlockILm1ELm4ENS_7DCTFromENS_5DCTToEEEvRKT1_RKT2_.constprop.0D.25466 (, ); ... _ZN12_GLOBAL__N_113IDCT1DWrapperILm4ELm1ENS_7DCTFromENS_5DCTToEEEvRKT1_RKT2_.constprop.0D.25467 (, ); ... _3 = outD.25694[2]; FRE thinks _ZN12_GLOBAL__N_113IDCT1DWrapperILm4ELm1ENS_7DCTFromENS_5DCTToEEEvRKT1_RKT2_.constprop.0 does not touch out even though D.25700 is passed to it
[Bug tree-optimization/103432] [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432 Jan Hubicka changed: What|Removed |Added Ever confirmed|0 |1 Target Milestone|12.0|--- Keywords|wrong-code | Known to work|11.2.0 | Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2021-11-26 Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org CC||hubicka at gcc dot gnu.org Component|ipa |tree-optimization --- Comment #1 from Jan Hubicka --- It fails with ./xgcc -B ./ -O2 d.ii -fdbg-cnt=ipa_mod_ref_pta:189 -fdump-tree-all-details -fdump-ipa-all-details and works ./xgcc -B ./ -O2 d.ii -fdbg-cnt=ipa_mod_ref_pta:188 -fdump-tree-all-details -fdump-ipa-all-details The difference in optimized dump is: int main () { struct DCTFrom D.11418; @@ -2805,12 +2810,7 @@ float x[4]; struct DCTTo D.11356; struct DCTFrom D.11355; - float _3; - float _4; - double _6; struct FILE * stderr.3_8; - double _9; - struct FILE * stderr.4_11; float _12; float _13; double _15; @@ -2996,30 +2996,10 @@ {anonymous}::IDCT1DWrapper.constprop<4, 1, {anonymous}::DCTFrom, {anonymous}::DCTTo> (, ); D.11400 ={v} {CLOBBER}; D.11356 ={v} {CLOBBER}; - _3 = out[2]; - _4 = _3 - 1.0e+0; - actual_accuracy_5 = ABS_EXPR <_4>; - if (actual_accuracy_5 > 9.99974752427078783512115478515625e-7) -goto ; [0.04%] - else -goto ; [99.96%] - - [local count: 429325]: - _6 = (double) actual_accuracy_5; stderr.3_8 = stderr; - fprintf (stderr.3_8, "ERROR: Too low accuracy: exp=%f act=%f\n", 9.99974752427078783512115478515625e-7, _6); + fprintf (stderr.3_8, "ERROR: Too low accuracy: exp=%f act=%f\n", 9.99974752427078783512115478515625e-7, 1.0e+0); exit (1); - [local count: 1072883004]: - _9 = (double) actual_accuracy_5; - stderr.4_11 = stderr; - fprintf (stderr.4_11, "OK: Good accuracy: exp=%f act=%f\n", 9.99974752427078783512115478515625e-7, _9); - x ={v} {CLOBBER}; - out ={v} {CLOBBER}; - coeffs ={v} {CLOBBER}; - scratch_space ={v} {CLOBBER}; - return 0; - } And I suppose we are not expected to optimize out the "Good accuracy" message :) So it looks like out is
modified by {anonymous}::IDCT1DWrapper.constprop<4, 1, {anonymous}::DCTFrom, {anonymous}::DCTTo> (, ); but for some reason ipa propagation gets no_indirect_clobber for param1. This seems wrong since to->data is written to, so it is an indirect clobber. I may only be able to look into this further tomorrow; it is a bit late.
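A minimal C sketch of the "indirect clobber" point. The names are hypothetical and only loosely modelled on the DCTTo helper. The callee never writes through its parameter directly. Instead it writes through a pointer loaded from the struct that the parameter points to. A correct mod/ref summary has to record that store. If it were missed, the read of out[2] after the call could be folded to its initial value 0, which is the kind of wrong result visible in the dump above (act=1.0). The sketch only shows the code shape; an optimizing compiler may simply inline it.

```c
#include <stddef.h>

/* Hypothetical view of a destination buffer, in the spirit of DCTTo.  */
struct to_view
{
  float *data;
  size_t stride;
};

/* Writes V into row ROW of the buffer described by TO.  The store goes
   through TO->data, a pointer loaded from the parameter: an indirect
   clobber of memory owned by the caller.  */
static void store_row (const struct to_view *to, size_t row, float v)
{
  to->data[row * to->stride] = v;
}

/* Zero-initializes OUT, lets store_row write into it through a view,
   then reads it back.  The load must observe the callee's store.  */
static float demo (void)
{
  float out[4] = { 0 };
  struct to_view view = { out, 1 };
  store_row (&view, 2, 1.0f);
  return out[2];
}
```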
[Bug target/103302] wrong code with -fharden-compares
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #2 from Jim Wilson --- It is the second reversed comparison that is wrong. This is the u32_0 <= (...) on the first line of foo0. In the assembly file, this ends up as

        mv      a0,a6
        mv      a1,a7
        xor     a6,a0,a6
        xor     a7,a1,a7
        or      a6,a6,a7
        seqz    a6,a6

Note that it is comparing a value against itself when it should be comparing two different values.

The harden compare pass is generating RTL

(insn 156 152 155 6 (set (reg:TI 201) (asm_operands:TI ("") ("=g") 0 [ (reg:TI 77 [ _8 ]) ] [ (asm_input:TI ("0")) ] [])) -1 (nil))
(insn 155 156 153 6 (clobber (reg:TI 77 [ _8 ])) -1 (nil))
(insn 153 155 154 6 (set (subreg:DI (reg:TI 77 [ _8 ]) 0) (subreg:DI (reg:TI 201) 0)) -1 (nil))
(insn 154 153 160 6 (set (subreg:DI (reg:TI 77 [ _8 ]) 8) (subreg:DI (reg:TI 201) 8)) -1 (nil))

Then the asmcons pass is changing this to

(insn 851 152 849 5 (clobber (reg:TI 201)) -1 (nil))
(insn 849 851 850 5 (set (subreg:DI (reg:TI 201) 0) (subreg:DI (reg:TI 77 [ _8 ]) 0)) -1 (nil))
(insn 850 849 156 5 (set (subreg:DI (reg:TI 201) 8) (subreg:DI (reg:TI 77 [ _8 ]) 8)) -1 (nil))
(insn 156 850 155 5 (set (reg:TI 201) (asm_operands:TI ("") ("=g") 0 [ (reg:TI 201) ] [ (asm_input:TI ("0")) ] [])) -1 (expr_list:REG_DEAD (reg:TI 77 [ _8 ]) (nil)))
(insn 155 156 153 5 (clobber (reg:TI 77 [ _8 ])) -1 (nil))
(insn 153 155 154 5 (set (subreg:DI (reg:TI 77 [ _8 ]) 0) (subreg:DI (reg:TI 201) 0)) 135 {*movdi_64bit} (nil))
(insn 154 153 854 5 (set (subreg:DI (reg:TI 77 [ _8 ]) 8) (subreg:DI (reg:TI 201) 8)) 135 {*movdi_64bit} (expr_list:REG_DEAD (reg:TI 201) (nil)))

Then the register allocator puts both 77 and 201 in the same register, which means we are now clobbering values we need. In the reload dump I see

(insn 851 152 849 5 (clobber (reg:TI 16 a6 [201])) -1 (nil))
(insn 849 851 850 5 (set (reg:DI 16 a6 [201]) (reg:DI 10 a0 [orig:77 _8 ] [77])) 135 {*movdi_64bit} (nil))
(insn 850 849 907 5 (set (reg:DI 17 a7 [+8 ]) (reg:DI 11 a1 [ _8+8 ])) 135 {*movdi_64bit} (nil))
(insn 907 850 1014 5 (clobber (reg:TI 16 a6 [201])) -1 (nil))

so insns 849 and 850 get optimized away, but we need them. Also, we have

(insn 854 154 852 5 (clobber (reg:TI 16 a6 [202])) -1 (nil))
(insn 852 854 853 5 (set (reg:DI 16 a6 [202]) (reg:DI 6 t1 [orig:86 _39 ] [86])) 135 {*movdi_64bit} (nil))
(insn 853 852 913 5 (set (reg:DI 17 a7 [+8 ]) (reg:DI 7 t2 [ _39+8 ])) 135 {*movdi_64bit} (nil))
(insn 913 853 1010 5 (clobber (reg:TI 16 a6 [202])) -1 (nil))

and insns 852 and 853 get optimized away, but we need them. The comparison is supposed to be a0/a1 versus t1/t2, but we end up comparing a6/a7 against itself.

asmcons calls emit_move_insn to copy the asm source to the asm dest so it can simplify the asm. Since this is a multiword mode, and the riscv backend doesn't have a movti pattern, this ends up calling emit_move_multi_word. That function emits the extra clobber that causes the problem. I suppose we could fix this by adding a movti pattern to the riscv backend to avoid the clobbers, but we shouldn't have to. Though it would be interesting to see whether that results in better optimized code.

It isn't clear exactly where the problem is. Maybe asmcons shouldn't try to fix an asm when the mode is larger than the word mode? This could be left to the register allocator to fix. Or maybe harden compare shouldn't generate RTL like this? This could be a harden compare issue, or maybe the RTL expander should emit the RTL differently. It looks like the same issue: the RTL expander calls emit_move_multi_word, which generates the clobber. Or maybe a movti pattern is actually required now?

I did verify that disabling asmcons fixes the problem for this testcase. I had to hack the code in function.c to do that, as there is no option to disable it.
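Some context on the pass itself. -fharden-compares computes the reversed condition as well and traps if the two results do not match. The empty asm_operands in the dumps above keep the optimizer from seeing through the copied operands. The sketch below shows that idea at source level, assuming GCC or Clang inline asm. The function names are hypothetical, and it uses 64-bit operands rather than the TImode values in this report. It uses an in/out "+r" operand, which GCC represents as an output tied to a "0" input, much like the "=g"/"0" pair in the RTL.

```c
#include <stdint.h>

/* Returns VALUE after passing it through an empty asm with an in/out
   register operand.  The compiler must assume the asm may have changed
   it, so it cannot relate the result to the original value.  */
static inline uint64_t opaque_copy (uint64_t value)
{
  __asm__ ("" : "+r" (value));
  return value;
}

/* Hypothetical model of a hardened "a <= b": compute the condition,
   recompute the reversed condition on opaque copies of the operands,
   and trap if the two results are not complementary.  */
static int hardened_le (uint64_t a, uint64_t b)
{
  int result = a <= b;
  int reversed = opaque_copy (a) > opaque_copy (b);
  if (result == reversed)
    __builtin_trap ();
  return result;
}
```

The bug above corresponds to the opaque copies of `a` and `b` landing in the same registers. The reversed test then compares a value with itself, and the check no longer means what it should.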
[Bug ipa/103432] [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432 Andrew Pinski changed: What|Removed |Added Keywords||wrong-code Target Milestone|--- |12.0 CC||marxin at gcc dot gnu.org Component|tree-optimization |ipa Known to work||11.2.0
[Bug tree-optimization/103432] New: [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103432 Bug ID: 103432 Summary: [12 regression] libjxl-0.5 is miscompiled, works fine with -fno-ipa-modref Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: slyfox at gcc dot gnu.org Target Milestone: --- Created attachment 51875 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51875=edit dct_test.cc
I originally noticed the problem as failing tests in libjxl-0.5. I extracted a ~10KB self-contained single-file example. It could still be reduced, but it's quite tangled. Could you see what is obviously wrong with it?

Attached the reproducer as dct_test.cc:

$ g++-12.0.0 -std=c++11 -O2 -fno-tree-vectorize dct_test.cc -o dct_test
$ g++-12.0.0 -std=c++11 -O2 -fno-tree-vectorize -fno-ipa-modref dct_test.cc -o dct_test1

# good:
$ ./dct_test1
OK: Good accuracy: exp=0.01 act=0.00
OK: Good accuracy: exp=0.01 act=0.00

# bad:
$ ./dct_test
OK: Good accuracy: exp=0.01 act=0.00
ERROR: Too low accuracy: exp=0.01 act=1.00

$ g++-12.0.0 -v
Using built-in specs.
COLLECT_GCC=/nix/store/2lxwqh3k88x4jwyfwlsfnwrp78yq2ah2-gcc-12.0.0/bin/g++
COLLECT_LTO_WRAPPER=/nix/store/2lxwqh3k88x4jwyfwlsfnwrp78yq2ah2-gcc-12.0.0/libexec/gcc/x86_64-unknown-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with:
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20211121 (experimental) (GCC)
[Bug c++/92385] extremely long and memory intensive compilation for brace construction of array member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92385 Andrew Pinski changed: What|Removed |Added CC||beyondstandard at gmail dot com --- Comment #8 from Andrew Pinski --- *** Bug 71165 has been marked as a duplicate of this bug. ***
[Bug c++/71165] std::array with aggregate initialization generates huge code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71165 Andrew Pinski changed: What|Removed |Added Resolution|--- |DUPLICATE Status|NEW |RESOLVED --- Comment #4 from Andrew Pinski --- Dup of bug 92385. *** This bug has been marked as a duplicate of bug 92385 ***
[Bug c++/92385] extremely long and memory intensive compilation for brace construction of array member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92385 Andrew Pinski changed: What|Removed |Added CC||hehaochen at hotmail dot com --- Comment #7 from Andrew Pinski --- *** Bug 94957 has been marked as a duplicate of this bug. ***
[Bug c++/94957] Compilation slowww for simple code with big array of structs with constructors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94957 Andrew Pinski changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |DUPLICATE --- Comment #6 from Andrew Pinski --- Dup of bug 92385. *** This bug has been marked as a duplicate of bug 92385 ***
[Bug c++/94957] Compilation slowww for simple code with -O1/2/3 and -g in GCC 8 and 9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94957 Andrew Pinski changed: What|Removed |Added CC||ilord.tiran at yandex dot ru --- Comment #5 from Andrew Pinski --- *** Bug 98547 has been marked as a duplicate of this bug. ***
[Bug c++/98547] GCC spends many minutes instead of seconds building a file with array initialization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98547 Andrew Pinski changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #2 from Andrew Pinski --- Yes this is a dup of bug 94957. *** This bug has been marked as a duplicate of bug 94957 ***
[Bug fortran/103418] random_number() does not accept pointer, intent(in) array argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418 --- Comment #8 from Steve Kargl ---
On Thu, Nov 25, 2021 at 02:18:46PM -0800, Steve Kargl wrote:
> On Thu, Nov 25, 2021 at 10:10:32PM +, anlauf at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418
> >
> > --- Comment #6 from anlauf at gcc dot gnu.org ---
> > Unfortunately the patch in comment#5 does not work for me. :-(
> >
> > Interestingly, the Intel compiler fails on the testcase, too.
> >
>
> Hmmm. I did have a number of other patches in my tree. I
> wonder if one of those helped. Unfortunately, I updated
> my git repository, where I cleared out all patch, and it
> takes a long time to rebuild gcc on my laptop.
>

For the record,

module test
  implicit none
contains
  subroutine change_pointer_target(ptr)
    real, pointer, intent(in) :: ptr(:)
    call random_number(ptr)
    ptr(:) = ptr + 1.0
  end subroutine change_pointer_target
end module test

program foo
  use test
  implicit none
  real, pointer :: a(:), b
  allocate(a(4), b)
  call random_number(b)
  call random_number(a)
  print '(5F8.5)', b, a
end program foo

% gfcx -o z a.f90 && ./z
 0.65287 0.82614 0.77541 0.61923 0.52961
[Bug c/98487] ICE: tree check: expected identifier_node, have tree_list in is_attribute_p, at attribs.h:155 [C2X attribute syntax, gnu::format and -Wsuggest-attribute=format]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98487 Andrew Pinski changed: What|Removed |Added Keywords||ice-checking Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2021-11-25 --- Comment #2 from Andrew Pinski --- Confirmed. Simpler testcase:

#include
[[gnu::__format__(__printf__, 1, 2)]]
void do_printf(const char * const a0, ...)
{
  va_list ap;
  va_start(ap, a0);
  __builtin_vprintf(a0, ap);
  va_end(ap);
}

[[gnu::__format__(__scanf__, 1, 2)]]
void do_scanf(const char * const a0, ...)
{
  va_list ap;
  va_start(ap, a0);
  __builtin_vscanf(a0, ap);
  va_end(ap);
}

[[gnu::__format__(__strftime__, 1, 0)]]
void do_strftime(const char * const a0, struct tm * a1)
{
  char buff[256];
  __builtin_strftime(buff, sizeof(buff), a0, a1);
  puts(buff);
}
[committed] libstdc++: Remove dg-error that no longer happens
Tested x86_64-linux, pushed to trunk. There was a c++11_only dg-error in this testcase, for a "body of constexpr function is not a return statement" diagnostic that was bogus, but happened because the return statement was ill-formed. A change to G++ earlier this month means that diagnostic is no longer emitted, so remove the dg-error. libstdc++-v3/ChangeLog: * testsuite/20_util/tuple/comparison_operators/overloaded2.cc: Remove dg-error for C++11_only error. --- .../testsuite/20_util/tuple/comparison_operators/overloaded2.cc | 1 - 1 file changed, 1 deletion(-) diff --git a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc index bac16ffd521..6a7a584c71e 100644 --- a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc +++ b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc @@ -52,4 +52,3 @@ auto b = a < a; // { dg-error "no match for 'operator<'" "" { target c++20 } 0 } // { dg-error "no match for .*_Synth3way|in requirements" "" { target c++20 } 0 } // { dg-error "ordered comparison" "" { target c++17_down } 0 } -// { dg-error "not a return-statement" "" { target c++11_only } 0 } -- 2.31.1
[committed] libstdc++: Make std::pointer_traits SFINAE-friendly [PR96416]
Tested x86_64-linux, pushed to trunk. This implements the resolution I'm proposing for LWG 3545, to avoid hard errors when using std::to_address for types that make pointer_traits ill-formed. Consistent with std::iterator_traits, instantiating std::pointer_traits for a non-pointer type will be well-formed, but give an empty type with no member types. This avoids the problematic cases for std::to_address. Additionally, the pointer_to member is now only declared when the element type is not cv void (and for C++20, when the function body would be well-formed). The rebind member was already SFINAE-friendly in our implementation. libstdc++-v3/ChangeLog: PR libstdc++/96416 * include/bits/ptr_traits.h (pointer_traits): Reimplement to be SFINAE-friendly (LWG 3545). * testsuite/20_util/pointer_traits/lwg3545.cc: New test. * testsuite/20_util/to_address/1_neg.cc: Adjust dg-error line. * testsuite/20_util/to_address/lwg3545.cc: New test. --- libstdc++-v3/include/bits/ptr_traits.h| 167 +- .../20_util/pointer_traits/lwg3545.cc | 120 + .../testsuite/20_util/to_address/1_neg.cc | 2 +- .../testsuite/20_util/to_address/lwg3545.cc | 12 ++ 4 files changed, 251 insertions(+), 50 deletions(-) create mode 100644 libstdc++-v3/testsuite/20_util/pointer_traits/lwg3545.cc create mode 100644 libstdc++-v3/testsuite/20_util/to_address/lwg3545.cc diff --git a/libstdc++-v3/include/bits/ptr_traits.h b/libstdc++-v3/include/bits/ptr_traits.h index 115b86d43e4..4987fa9942f 100644 --- a/libstdc++-v3/include/bits/ptr_traits.h +++ b/libstdc++-v3/include/bits/ptr_traits.h @@ -35,6 +35,7 @@ #include #if __cplusplus > 201703L +#include #define __cpp_lib_constexpr_memory 201811L namespace __gnu_debug { struct _Safe_iterator_base; } #endif @@ -45,55 +46,119 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION class __undefined; - // Given Template return T, otherwise invalid. + // For a specialization `SomeTemplate` the member `type` is T, + // otherwise `type` is `__undefined`. 
template struct __get_first_arg { using type = __undefined; }; - template class _Template, typename _Tp, + template class _SomeTemplate, typename _Tp, typename... _Types> -struct __get_first_arg<_Template<_Tp, _Types...>> +struct __get_first_arg<_SomeTemplate<_Tp, _Types...>> { using type = _Tp; }; - template -using __get_first_arg_t = typename __get_first_arg<_Tp>::type; - - // Given Template and U return Template, otherwise invalid. + // For a specialization `SomeTemplate` and a type `U` the member + // `type` is `SomeTemplate`, otherwise there is no member `type`. template struct __replace_first_arg { }; - template class _Template, typename _Up, + template class _SomeTemplate, typename _Up, typename _Tp, typename... _Types> -struct __replace_first_arg<_Template<_Tp, _Types...>, _Up> -{ using type = _Template<_Up, _Types...>; }; +struct __replace_first_arg<_SomeTemplate<_Tp, _Types...>, _Up> +{ using type = _SomeTemplate<_Up, _Types...>; }; - template -using __replace_first_arg_t = typename __replace_first_arg<_Tp, _Up>::type; - - template -using __make_not_void - = __conditional_t::value, __undefined, _Tp>; - - /** - * @brief Uniform interface to all pointer-like types - * @ingroup pointer_abstractions - */ +#if __cpp_concepts + // When concepts are supported detection of _Ptr::element_type is done + // by a requires-clause, so __ptr_traits_elem_t only needs to do this: template -struct pointer_traits +using __ptr_traits_elem_t = typename __get_first_arg<_Ptr>::type; +#else + // Detect the element type of a pointer-like type. + template +struct __ptr_traits_elem : __get_first_arg<_Ptr> +{ }; + + // Use _Ptr::element_type if is a valid type. + template +struct __ptr_traits_elem<_Ptr, __void_t> +{ using type = typename _Ptr::element_type; }; + + template +using __ptr_traits_elem_t = typename __ptr_traits_elem<_Ptr>::type; +#endif + + // Define pointer_traits::pointer_to. 
+ template::value> +struct __ptr_traits_ptr_to +{ + using pointer = _Ptr; + using element_type = _Elt; + + /** + * @brief Obtain a pointer to an object + * @param __r A reference to an object of type `element_type` + * @return `pointer::pointer_to(__e)` + * @pre `pointer::pointer_to(__e)` is a valid expression. + */ + static pointer + pointer_to(element_type& __e) +#if __cpp_lib_concepts + requires requires { + { pointer::pointer_to(__e) } -> convertible_to; + } +#endif + { return pointer::pointer_to(__e); } +}; + + // Do not define pointer_traits::pointer_to if element type is void. + template +struct __ptr_traits_ptr_to<_Ptr, _Elt, true> +{ }; + + // Partial specialization defining pointer_traits::pointer_to(T&). + template +
[Bug libstdc++/96416] [DR 3545] to_address() is broken by static_assert in pointer_traits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96416 --- Comment #21 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:b8018e5c5ec0e9b6948182f13fba47c67b758d8a commit r12-5532-gb8018e5c5ec0e9b6948182f13fba47c67b758d8a Author: Jonathan Wakely Date: Thu Nov 25 16:49:45 2021 + libstdc++: Make std::pointer_traits SFINAE-friendly [PR96416] This implements the resolution I'm proposing for LWG 3545, to avoid hard errors when using std::to_address for types that make pointer_traits ill-formed. Consistent with std::iterator_traits, instantiating std::pointer_traits for a non-pointer type will be well-formed, but give an empty type with no member types. This avoids the problematic cases for std::to_address. Additionally, the pointer_to member is now only declared when the element type is not cv void (and for C++20, when the function body would be well-formed). The rebind member was already SFINAE-friendly in our implementation. libstdc++-v3/ChangeLog: PR libstdc++/96416 * include/bits/ptr_traits.h (pointer_traits): Reimplement to be SFINAE-friendly (LWG 3545). * testsuite/20_util/pointer_traits/lwg3545.cc: New test. * testsuite/20_util/to_address/1_neg.cc: Adjust dg-error line. * testsuite/20_util/to_address/lwg3545.cc: New test.
[Bug libstdc++/101608] ranges::fill/fill_n missing std::is_constant_evaluated() condition for __builtin_memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101608 --- Comment #3 from CVS Commits --- The releases/gcc-11 branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:7ae6e4e3831429d20eea1be285dbc6a4a005930f commit r11-9314-g7ae6e4e3831429d20eea1be285dbc6a4a005930f Author: Jonathan Wakely Date: Wed Nov 24 13:17:54 2021 + libstdc++: Do not use memset in constexpr calls to ranges::fill_n [PR101608] libstdc++-v3/ChangeLog: PR libstdc++/101608 * include/bits/ranges_algobase.h (__fill_n_fn): Check for constant evaluation before using memset. * testsuite/25_algorithms/fill_n/constrained.cc: Check byte-sized values as well. (cherry picked from commit 82c3657dd74896b39937bb0a2aaeba9b8ca105fd)
[Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393 --- Comment #14 from H.J. Lu --- (In reply to Richard Earnshaw from comment #13) > Also, note that the comment in gimple-fold.c prior to this change read: > > /* If we can perform the copy efficiently with first doing all loads > and then all stores inline it that way. Currently efficiently > means that we can load all the memory into a single integer > register which is what MOVE_MAX gives us. */ > > Which would imply that the AArch64 definition of MOVE_MAX is the correct one. The GCC manual has - Macro: MOVE_MAX The maximum number of bytes that a single instruction can move quickly between memory and registers or between two memory locations.
[PATCH] x86: Add -mmove-max=bits and -mstore-max=bits
Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move and store, independent of -mprefer-vector-width=bits: 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES, which are enabled for the Intel Sapphire Rapids processor. 2. Add -mmove-max=bits to set the maximum number of bits that can be moved from memory to memory efficiently. The default value is derived from X86_TUNE_AVX512_MOVE_BY_PIECES, X86_TUNE_AVX256_MOVE_BY_PIECES, and the preferred vector width. 3. Add -mstore-max=bits to set the maximum number of bits that can be stored to memory efficiently. The default value is derived from X86_TUNE_AVX512_STORE_BY_PIECES, X86_TUNE_AVX256_STORE_BY_PIECES and the preferred vector width. gcc/ PR target/103269 * config/i386/i386-expand.c (ix86_expand_builtin): Pass PVW_NONE and PVW_NONE to ix86_target_string. * config/i386/i386-options.c (ix86_target_string): Add arguments for move_max and store_max. (ix86_target_string::add_vector_width): New lambda. (ix86_debug_options): Pass ix86_move_max and ix86_store_max to ix86_target_string. (ix86_function_specific_print): Pass ptr->x_ix86_move_max and ptr->x_ix86_store_max to ix86_target_string. (ix86_valid_target_attribute_tree): Handle x_ix86_move_max and x_ix86_store_max. (ix86_option_override_internal): Set the default x_ix86_move_max and x_ix86_store_max. * config/i386/i386-options.h (ix86_target_string): Add prefer_vector_width and prefer_vector_width. * config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): Removed. (TARGET_AVX256_STORE_BY_PIECES): Likewise. (MOVE_MAX): Use 64 if ix86_move_max or ix86_store_max == PVW_AVX512. Use 32 if ix86_move_max or ix86_store_max >= PVW_AVX256. (STORE_MAX_PIECES): Use 64 if ix86_store_max == PVW_AVX512. Use 32 if ix86_store_max >= PVW_AVX256. * config/i386/i386.opt: Add -mmove-max=bits and -mstore-max=bits. * config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES): New. (X86_TUNE_AVX512_STORE_BY_PIECES): Likewise.
* doc/invoke.texi: Document -mmove-max=bits and -mstore-max=bits. gcc/testsuite/ PR target/103269 * gcc.target/i386/pieces-memcpy-17.c: New test. * gcc.target/i386/pieces-memcpy-18.c: Likewise. * gcc.target/i386/pieces-memcpy-19.c: Likewise. * gcc.target/i386/pieces-memcpy-20.c: Likewise. * gcc.target/i386/pieces-memcpy-21.c: Likewise. * gcc.target/i386/pieces-memset-45.c: Likewise. * gcc.target/i386/pieces-memset-46.c: Likewise. * gcc.target/i386/pieces-memset-47.c: Likewise. * gcc.target/i386/pieces-memset-48.c: Likewise. * gcc.target/i386/pieces-memset-49.c: Likewise. --- gcc/config/i386/i386-expand.c | 1 + gcc/config/i386/i386-options.c| 75 +-- gcc/config/i386/i386-options.h| 6 +- gcc/config/i386/i386.h| 18 ++--- gcc/config/i386/i386.opt | 8 ++ gcc/config/i386/x86-tune.def | 10 +++ gcc/doc/invoke.texi | 13 .../gcc.target/i386/pieces-memcpy-17.c| 16 .../gcc.target/i386/pieces-memcpy-18.c| 16 .../gcc.target/i386/pieces-memcpy-19.c| 16 .../gcc.target/i386/pieces-memcpy-20.c| 16 .../gcc.target/i386/pieces-memcpy-21.c| 16 .../gcc.target/i386/pieces-memset-45.c| 16 .../gcc.target/i386/pieces-memset-46.c| 17 + .../gcc.target/i386/pieces-memset-47.c| 17 + .../gcc.target/i386/pieces-memset-48.c| 17 + .../gcc.target/i386/pieces-memset-49.c| 16 17 files changed, 276 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-17.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-19.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-20.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-45.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-46.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-47.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-48.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-49.c diff 
--git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 0d5d1a0e205..7e77ff56ddc 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -12295,6 +12295,7 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget, char *opts = ix86_target_string (bisa, bisa2, 0, 0, NULL, NULL, (enum fpmath_unit) 0,
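The MOVE_MAX and STORE_MAX_PIECES rules in the ChangeLog above can be read as a small selection function. The sketch below only models that description: the enum mirrors the PVW_* names used in the patch (PVW_AVX128 is an assumption added for completeness), and the `fallback` parameter is a hypothetical stand-in for the smaller width the target would otherwise use. It is not the actual i386.h macro, which also depends on ISA flags such as AVX512F and AVX.

```c
/* Model of the preferred-vector-width values named in the patch.
   PVW_AVX128 is an assumption of this sketch. */
enum pvw { PVW_NONE, PVW_AVX128, PVW_AVX256, PVW_AVX512 };

/* MOVE_MAX as described in the ChangeLog: 64 bytes if either
   -mmove-max or -mstore-max selects 512 bits, 32 bytes if either
   selects at least 256 bits, otherwise a caller-supplied fallback
   (hypothetical; the real macro has further conditions). */
static int model_move_max (enum pvw move_max, enum pvw store_max, int fallback)
{
  if (move_max == PVW_AVX512 || store_max == PVW_AVX512)
    return 64;
  if (move_max >= PVW_AVX256 || store_max >= PVW_AVX256)
    return 32;
  return fallback;
}

/* STORE_MAX_PIECES only looks at the store width. */
static int model_store_max_pieces (enum pvw store_max, int fallback)
{
  if (store_max == PVW_AVX512)
    return 64;
  if (store_max >= PVW_AVX256)
    return 32;
  return fallback;
}
```

Note that MOVE_MAX is widened when only the store limit is raised, which is why both options feed into it while STORE_MAX_PIECES depends on -mstore-max alone.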
[Bug tree-optimization/98304] Failure to optimize bitwise arithmetic pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98304 --- Comment #2 from Andrew Pinski --- > @1 == (@2)-1 Should have been: @1 == -(@2-1) maybe check that @1 is a mask.
gcc-9-20211125 is now available
Snapshot gcc-9-20211125 is now available on https://gcc.gnu.org/pub/gcc/snapshots/9-20211125/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 9 git branch with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-9 revision 3d1f5e86fb4351a109d45fe441b1b00d6e56c277 You'll find: gcc-9-20211125.tar.xzComplete GCC SHA256=8e9f79a98e8fffa14dc98ea6731b2e18fb7016a36f4d21d28d5e81575ffdcee2 SHA1=6ff9b4788c37ab0ae4640116f7103b5304564f72 Diffs from 9-2028 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-9 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
[Bug tree-optimization/98304] Failure to optimize bitwise arithmetic pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98304 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2021-11-25 Severity|normal |enhancement Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #1 from Andrew Pinski ---

  _1 = MAX_EXPR ;
  _2 = _1 & -64;
  _4 = n_3(D) - _2;

Something like:

(simplify
 (minus @0 (bit_and (max @0 INTEGER_CST@1) INTEGER_CST@2))
 (if (@1 == (@2)-1)
  (if (TYPE_SIGN (type) == UNSIGNED)
   (bit_and @0 @1)
   (cond (le @0 @1) @0 (bit_and @0 @1))
  )
 )
)

Note LLVM handles the unsigned case already. Also note that even though GCC can handle the loop case for signed, it only handles it at the RTL level; for GIMPLE, GCC produces:

  _3 = n_2(D) + -64;
  _8 = (unsigned int) n_2(D);
  _9 = _8 + 4294967232; // _9 = _3 - 64
  _10 = _9 >> 6; // _10 = _9/64
  _11 = (int) _10;
  _12 = _11 * -64;
  n_1 = _3 + _12;
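A quick way to sanity-check the rewrite suggested in comment #1 before writing the match.pd pattern is to compare both forms numerically. The sketch below assumes 32-bit integers and the specific constants from the testcase (mask 63, and -64 as its complement); it is a brute-force spot check over a range plus extreme values, not a proof, and it says nothing about the match.pd syntax itself.

```c
#include <stdint.h>

/* n - (MAX (n, 63) & -64), the GIMPLE sequence from comment #1. */
static int32_t orig_s (int32_t n)
{
  int32_t m = n > 63 ? n : 63;
  return n - (m & -64);
}

/* Proposed signed replacement: (n <= 63) ? n : (n & 63). */
static int32_t simp_s (int32_t n)
{
  return n <= 63 ? n : (n & 63);
}

/* Unsigned original; ~63u is -64 viewed as a 32-bit unsigned mask. */
static uint32_t orig_u (uint32_t n)
{
  uint32_t m = n > 63u ? n : 63u;
  return n - (m & ~63u);
}

/* Proposed unsigned replacement: n & 63. */
static uint32_t simp_u (uint32_t n)
{
  return n & 63u;
}

/* Count disagreements for every n in [lo, hi] (hi must be below
   INT32_MAX) plus a few extreme values. */
static int count_mismatches (int32_t lo, int32_t hi)
{
  int bad = 0;
  for (int32_t n = lo; n <= hi; n++)
    {
      if (orig_s (n) != simp_s (n))
        bad++;
      if (orig_u ((uint32_t) n) != simp_u ((uint32_t) n))
        bad++;
    }
  const int32_t extremes[] = { INT32_MIN, INT32_MIN + 1, INT32_MAX - 1, INT32_MAX };
  for (unsigned k = 0; k < sizeof extremes / sizeof extremes[0]; k++)
    {
      if (orig_s (extremes[k]) != simp_s (extremes[k]))
        bad++;
      if (orig_u ((uint32_t) extremes[k]) != simp_u ((uint32_t) extremes[k]))
        bad++;
    }
  return bad;
}
```

The signed case needs the conditional because every n at or below 63, including negative values, clamps to 63 inside the MAX and therefore passes through unchanged, whereas n & 63 would alter negative inputs.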
[Bug tree-optimization/103409] [12 Regression] 18% SPEC2017 WRF compile-time regression with -O2 -flto since r12-3903-g0288527f47cec669
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409 --- Comment #6 from hubicka at kam dot mff.cuni.cz --- > Started with r12-3903-g0288527f47cec669. This is a September change (for which we have PR102943); however, the regression range was between g:1ae8edf5f73ca5c3 (or g:264f061997c0a534 on the second plot) and g:3e09331f6aeaf595, which is the latest regression visible on the graphs, appearing between Nov 12 and Nov 15. The September regression is there too, but it is tracked as PR102943.
[Bug rtl-optimization/79048] Unnecessary reload for flags setting insn when operands die
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79048 Roger Sayle changed: What|Removed |Added Target Milestone|--- |12.0 Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED CC||roger at nextmovesoftware dot com --- Comment #2 from Roger Sayle --- This issue appears to be fixed on mainline. The test case now generates: f1: orb %dil, %sil jne .L4 ret
[Bug fortran/103418] random_number() does not accept pointer, intent(in) array argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418 --- Comment #7 from Steve Kargl --- On Thu, Nov 25, 2021 at 10:10:32PM +, anlauf at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418 > > --- Comment #6 from anlauf at gcc dot gnu.org --- > Unfortunately the patch in comment#5 does not work for me. :-( > > Interestingly, the Intel compiler fails on the testcase, too. > Hmmm. I did have a number of other patches in my tree. I wonder if one of those helped. Unfortunately, I updated my git repository, where I cleared out all patches, and it takes a long time to rebuild gcc on my laptop.
[Bug tree-optimization/103423] [12 Regression] 19% cpu2006 wrf compile time regression with -flto since r12-3903-g0288527f47cec669
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103423 --- Comment #1 from hubicka at kam dot mff.cuni.cz --- Martin, my original report here was about a regression on July 17 2021 (range g:0b7a11874d4eb428 to g:704e8a825c78b9a8), which seems unrelated to g:r12-3903-g0288527f47cec669 from Sep 21 2021. I think we are mixing up the cpu2006 and cpu2017 WRF benchmarks, which seem to regress at different times. > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103423 > > Martin Liška changed: > > What|Removed |Added > > See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409
[Bug c++/98030] error message for enum definition without ';' could be improved to include a fixit note
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98030 Andrew Pinski changed: What|Removed |Added Summary|error message for enum |error message for enum |definition without ';' |definition without ';' |could be improved |could be improved to ||include a fixit note Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2021-11-25 Severity|normal |enhancement --- Comment #3 from Andrew Pinski --- Confirmed.
[Bug fortran/103418] random_number() does not accept pointer, intent(in) array argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418 --- Comment #6 from anlauf at gcc dot gnu.org --- Unfortunately the patch in comment#5 does not work for me. :-( Interestingly, the Intel compiler fails on the testcase, too.
[PATCH, v2] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377
Hi Mikael, Am 25.11.21 um 22:02 schrieb Mikael Morin: Le 25/11/2021 à 21:03, Harald Anlauf a écrit : Hi Mikael, Am 25.11.21 um 17:46 schrieb Mikael Morin: Hello, Le 24/11/2021 à 22:32, Harald Anlauf via Fortran a écrit : diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c index 5a5aca10ebe..837eb0912c0 100644 --- a/gcc/fortran/check.c +++ b/gcc/fortran/check.c @@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape, { gfc_constructor *c; bool test; + gfc_constructor_base b; + if (shape->expr_type == EXPR_ARRAY) + b = shape->value.constructor; + else if (shape->expr_type == EXPR_VARIABLE) + b = shape->symtree->n.sym->value->value.constructor; This misses a check that shape->symtree->n.sym->value is an array, so that it makes sense to access its constructor. there are checks further above for the cases shape->expr_type == EXPR_ARRAY and for shape->expr_type == EXPR_VARIABLE which look at the elements of array shape to see if they are non-negative. Only in those cases where the full "if ()'s" pass we set shape_is_const = true; and proceed. The purpose of the auxiliary bool shape_is_const is to avoid repeating the lengthy if's again. Only then the above cited code segment should get executed. For shape->expr_type == EXPR_ARRAY there is really no change in logic. For shape->expr_type == EXPR_VARIABLE the above snipped is now executed, but then we already had else if (shape->expr_type == EXPR_VARIABLE && shape->ref && shape->ref->u.ar.type == AR_FULL && shape->ref->u.ar.dimen == 1 && shape->ref->u.ar.as && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER && shape->symtree->n.sym->attr.flavor == FL_PARAMETER && shape->symtree->n.sym->value) In which situations do I miss anything new? Yes, I agree with all of this. 
My comment wasn’t about a check on shape->expr_type, but on shape->value->expr_type if shape->expr_type is a (parameter) variable. Actually, this only supports the case where the parameter value is defined by an array; but it could be an intrinsic call, a sum of parameters, a reference to another parameter, etc. E.g. the following (still) does get rejected:

print *, reshape([1,2,3,4,5], a+1)
print *, reshape([1,2,3,4,5], a+a)
print *, reshape([1,2,3,4,5], 2*a)
print *, reshape([1,2,3,4,5], [3,3])
print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2))

and has been rejected before. The usual way to handle this is to call gfc_reduce_init_expr which (pray for it) will make an array out of whatever the shape expression is. Can you give an example where it fails? I think the current code would almost certainly fail, too. Probably, I was just trying to avoid followup bugs. ;-) I have checked the following:

integer, parameter :: a(2) = [1,1]
integer, parameter :: b(2) = a + 1
print *, reshape([1,2,3,4], b)
end

and it doesn’t fail as I thought it would. Well, that one had better be valid, since b=[2,2]. So yes, I was wrong; b has been expanded to an array before. Motivated by your reasoning I tried gfc_reduce_init_expr. That attempt failed miserably (many regressions), and I think it is not right. Then I found that array sections posed a problem that wasn't detected before. gfc_simplify_expr seemed to be a better choice that makes more sense for the present situations and seems to work here. And it even detects many more invalid cases now than e.g. Intel ;-) I've updated the patch and testcase accordingly. Can you add an assert or a comment saying that the parameter value has been expanded to a constant array? Ok with that change. Given the above discussion, I'll give you another day or two to have a further look. Otherwise Gerhard will...
;-) Cheers, Harald From 56fd0d23ac0a5bda802e5cce3024b947e497555a Mon Sep 17 00:00:00 2001 From: Harald Anlauf Date: Thu, 25 Nov 2021 22:39:44 +0100 Subject: [PATCH] Fortran: improve check of arguments to the RESHAPE intrinsic gcc/fortran/ChangeLog: PR fortran/103411 * check.c (gfc_check_reshape): Improve check of size of source array for the RESHAPE intrinsic against the given shape when pad is not given, and shape is a parameter. Try other simplifications of shape. gcc/testsuite/ChangeLog: PR fortran/103411 * gfortran.dg/pr68153.f90: Adjust test to improved check. * gfortran.dg/reshape_7.f90: Likewise. * gfortran.dg/reshape_9.f90: New test. --- gcc/fortran/check.c | 22 +- gcc/testsuite/gfortran.dg/pr68153.f90 | 2 +- gcc/testsuite/gfortran.dg/reshape_7.f90 | 2 +- gcc/testsuite/gfortran.dg/reshape_9.f90 | 24 4 files changed, 43 insertions(+), 7 deletions(-) create mode 100644
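The rule being enforced here comes from the Fortran standard: the elements of SHAPE must be non-negative, and when PAD is absent the result may not be larger than SOURCE, i.e. SIZE (SOURCE) >= PRODUCT (SHAPE). The sketch below restates that rule as a hypothetical C helper once the shape has already been simplified to constant extents; it is not gfortran's gfc_check_reshape, and it does not guard against overflow of the product for huge extents.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not gfortran code): decide whether a constant
   SHAPE is acceptable for RESHAPE (SOURCE, SHAPE) when PAD is absent.
   Every extent must be non-negative, and SOURCE must supply at least
   PRODUCT (SHAPE) elements. */
static bool reshape_shape_ok (size_t source_size, const long *shape, size_t rank)
{
  unsigned long long product = 1;
  for (size_t k = 0; k < rank; k++)
    {
      if (shape[k] < 0)
        return false;                 /* negative extent is invalid */
      product *= (unsigned long long) shape[k];
    }
  return product <= source_size;      /* no PAD: SOURCE must be big enough */
}
```

With the examples from the thread, a shape of [3,3] against a five-element source is rejected (9 > 5), while b = [2,2] against a four-element source is accepted.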
Re: [EXTERNAL] Re: Question about match.pd
> (A << B) eq/ne 0 Yes that is correct. But to detect such a pattern, you have to detect B and make sure B is boolean. GIMPLE converts that Boolean to an integer before shifting. After many hours of debugging, I think I managed to find out what is going on. +/* cmp : ==, != */ +/* ((B0 << x) cmp 0) -> B0 cmp 0 */ +(for cmp (eq ne) + (simplify + (cmp (lshift (convert@3 boolean_valued_p@0) @1) integer_zerop@2) + (if (TREE_CODE (TREE_TYPE (@3)) == INTEGER_TYPE + && (GIMPLE || !TREE_SIDE_EFFECTS (@1))) +(cmp @0 @2 So when I transform something like the above pattern to (cmp @0 @2), there is a type mismatch between @0 and @2: @0 is boolean and @2 is integer. That type mismatch causes a lot of headaches during resimplification. Best wishes, Navid. From: Jeff Law Sent: Wednesday, November 24, 2021 15:11 To: Navid Rahimi; gcc@gcc.gnu.org Subject: [EXTERNAL] Re: Question about match.pd On 11/24/2021 2:19 PM, Navid Rahimi via Gcc wrote: > Hi GCC community, > > I have a question about pattern matching in match.pd. > > So I have a pattern like this [1]: > #define CMP != > bool f(bool c, int i) { return (c << i) CMP 0; } > bool g(bool c, int i) { return c CMP 0;} > > It is verifiably correct to transfer f to g [2]. Although this pattern looks > simple, but the problem rises because GIMPLE converts booleans to int before > "<<" operation. > So at the end you have boolean->integer->boolean conversion and the shift > will happen on the integer in the middle.
> > For example, for something like: > > bool g(bool c){return (c << 22);} > > The GIMPLE is: > _Bool g (_Bool c) > { >int _1; >int _2; >_Bool _4; > > [local count: 1073741824]: >_1 = (int) c_3(D); >_2 = _1 << 22; >_4 = _2 != 0; >return _4; > } > > I wrote a patch to fix this problem in match.pd: > > +(match boolean_valued_p > + @0 > + (if (TREE_CODE (type) == BOOLEAN_TYPE > + && TYPE_PRECISION (type) == 1))) > +(for op (tcc_comparison truth_and truth_andif truth_or truth_orif truth_xor) > + (match boolean_valued_p > + (op @0 @1))) > +(match boolean_valued_p > + (truth_not @0)) > > +/* cmp : ==, != */ > +/* ((B0 << x) cmp 0) -> B0 cmp 0 */ > +(for cmp (eq ne) > + (simplify > + (cmp (lshift (convert@3 boolean_valued_p@0) @1) integer_zerop@2) > + (if (TREE_CODE (TREE_TYPE (@3)) == INTEGER_TYPE > + && (GIMPLE || !TREE_SIDE_EFFECTS (@1))) > +(cmp @0 @2 > > > But the problem is I am not able to restrict to the cases I am interested in. > There are many hits in other libraries I have tried compiling with > trunk+patch. > > Any feedback? > > 1) > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98956 > 2) > https://alive2.llvm.org/ce/z/UUTJ_v It would help to also see the cases you're triggering that you do not want to trigger.
Could we think of the optimization opportunity in a different way? (A << B) eq/ne 0 -> A eq/ne (0U >> B) And I would expect the 0U >> B to get simplified to 0. Would looking at things that way help? jeff
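Both the original claim (f can be rewritten as g) and Jeff's reformulation can be spot-checked exhaustively in C for every shift count that is defined. The sketch below reuses f and g from the question with CMP defined as `!=`; the bound on the shift count is an assumption made to stay clear of undefined behavior, since the bool operand is promoted to int and shifting 1 into the sign bit of an int is undefined. It complements, rather than replaces, the Alive2 proof linked in the thread.

```c
#include <limits.h>
#include <stdbool.h>

/* f and g from the original question, with CMP defined as !=. */
static bool f (bool c, int i) { return (c << i) != 0; }
static bool g (bool c, int i) { (void) i; return c != 0; }

/* c is promoted to int (0 or 1) before the shift, so only shift
   counts below (width of int - 1) are checked: 1 << (width - 1)
   overflows a signed int and is undefined behavior in C. */
static int count_mismatches (void)
{
  const int max_shift = (int) (sizeof (int) * CHAR_BIT) - 1;
  int bad = 0;
  for (int c = 0; c <= 1; c++)
    for (int i = 0; i < max_shift; i++)
      {
        if (f (c, i) != g (c, i))
          bad++;
        /* Jeff's reformulation compares A against 0U >> B,
           which is 0 for every valid shift count B. */
        if ((0U >> i) != 0U)
          bad++;
      }
  return bad;
}
```

This confirms the arithmetic only; the type mismatch between the boolean @0 and the integer zero @2 in the match.pd result is a separate problem.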
[Bug middle-end/103431] [12 Regression] wrong code with -O -fno-tree-bit-ccp -fno-tree-dominator-opts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103431 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Component|rtl-optimization|middle-end Last reconfirmed||2021-11-25 --- Comment #1 from Andrew Pinski --- Confirmed. Reduced testcase (removing the globals):

typedef unsigned __int128 B;
__attribute__((noipa)) void f(unsigned short a)
{
  B b = 5;
  int size = (sizeof(b)*8)-1;
  a /= 0xfffffffd;
  B b1 = (b << (a & size) | b >> (-(a & size) & size));
  if (b1 != 5)
    __builtin_abort ();
}
int main (void)
{
  f(0);
}
--- CUT ---

The gimple level does not change. In GCC 11 and the trunk, we have:

  _1 = (unsigned intD.9) a_8(D);
  _2 = _1 / 4294967293;
  a_9 = (short unsigned intD.18) _2;
  _13 = a_9 & 127;
  _3 = (intD.6) _13;
  b1_10 = 5 r<< _3;
  if (b1_10 != 5)

It looks like the expansion from gimple to RTL of the rotate is different between the two versions.
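The expression assigned to b1 in the reduced testcase is the usual UB-free rotate idiom, which GCC recognizes and shows as `r<<` in the GIMPLE dump above. The sketch below is a 32-bit version of the same idiom (uint32_t instead of unsigned __int128, chosen only for portability): both shift counts are masked with width - 1, so a rotate by 0 never produces an out-of-range shift such as x >> 32.

```c
#include <stdint.h>

/* Rotate x left by n bits using the masking idiom from the testcase.
   (n & mask) and (-n & mask) are both in [0, 31], so neither shift is
   ever by the full width, which would be undefined behavior. */
static uint32_t rotl32 (uint32_t x, unsigned n)
{
  const unsigned mask = 31;   /* bit width - 1 */
  return (x << (n & mask)) | (x >> (-n & mask));
}
```

With n == 0 both halves are just x, so the result is x; this is the case the testcase exercises, since 0 / 0xfffffffd is 0 and b1 must therefore equal 5.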
Re: libstdc++: Make atomic::wait() const [PR102994]
On Wed, 24 Nov 2021 at 01:27, Thomas Rodgers wrote: > > const qualification was also missing in the free functions for > wait/wait_explicit/notify_one/notify_all. Revised patch attached. Please tweak the whitespace in the new test: > +test1(const std::atomic , char*p) The '&' should be on the type not the variable, and there should be a space before 'p': > +test1(const std::atomic& a, char* p) OK for trunk and gcc-11 with that tweak, thanks!
Re: [PATCH v7] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]
Hi! On Wed, Nov 24, 2021 at 08:48:47PM -0300, Raoni Fassina Firmino wrote: > gcc/ChangeLog: > * builtins.c (expand_builtin_fegetround): New function. > (expand_builtin_feclear_feraise_except): New function. > (expand_builtin): Add cases for BUILT_IN_FEGETROUND, > BUILT_IN_FECLEAREXCEPT and BUILT_IN_FERAISEEXCEPT Something is missing here (maybe just a full stop?) > * config/rs6000/rs6000.md (fegetroundsi): New pattern. > (feclearexceptsi): New Pattern. > (feraiseexceptsi): New Pattern. > * doc/extend.texi: Add a new introductory paragraph about the > new builtins. Pet peeve: please don't break lines early, we have only 72 columns per line and we have many long symbol names. Trying to make many lines very short only results in everything looking very irregular, which is harder to read. > * doc/md.texi: (fegetround@var{m}): Document new optab. > (feclearexcept@var{m}): Document new optab. > (feraiseexcept@var{m}): Document new optab. > * optabs.def (fegetround_optab): New optab. > (feclearexcept_optab): New optab. > (feraiseexcept_optab): New optab. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c: New > test. > * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c: New > test. > * gcc.target/powerpc/builtin-fegetround.c: New test. > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -6860,6 +6860,117 @@ >[(set_attr "type" "fpload") > (set_attr "length" "8") > (set_attr "isa" "*,p8v,p8v")]) > + > +;; int fegetround(void) > +;; > +;; This expansion for the C99 function only expands for compatible > +;; target libcs. Because it needs to return one of FE_DOWNWARD, > +;; FE_TONEAREST, FE_TOWARDZERO or FE_UPWARD with the values as defined > +;; by the target libc, and since they are free to > +;; choose the values and the expand needs to know then beforehand, > +;; this expand only expands for target libcs that it can handle the > +;; values is knows. 
> +;; Because of these restriction, this only expands on the desired > +;; case and fallback to a call to libc on any otherwise. > +(define_expand "fegetroundsi" (This needs some wordsmithing.) > +;; int feclearexcept(int excepts) > +;; > +;; This expansion for the C99 function only works when EXCEPTS is a > +;; constant known at compile time and specifies any one of > +;; FE_INEXACT, FE_DIVBYZERO, FE_UNDERFLOW and FE_OVERFLOW flags. > +;; It doesn't handle values out of range, and always returns 0. It FAILs the expansion if a parameter is bad? Is this comment out of date? > +;; Note that FE_INVALID is unsupported because it maps to more than > +;; one bit of the FPSCR register. It could be implemented, now that you check for the libc used. It is a fixed part of the ABI :-) > +;; The FE_* are defined in the targed libc, and since they are free to > +;; choose the values and the expand needs to know then beforehand, s/then/them/ > +;; this expand only expands for target libcs that it can handle the (this expander) > +;; values is knows. s/is/it/ > +/* This testcase ensures that the builtins expand with the matching arguments > + * or otherwise fallback gracefully to a function call, and don't ICE during > + * compilation. > + * "-fno-builtin" option is used to enable calls to libc implementation of > the > + * gcc builtins tested when not using __builtin_ prefix. */ Don't use leading * in comments, btw. This is a testcase so anything goes, but FYI :-) > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/builtin-fegetround.c > + int i, rounding, expected; > + const int rm[] = {FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD}; > + for (i = 0; i < sizeof(rm); i++) That should be sizeof rm / sizeof rm[0] ? It accesses out of bounds as it is. Maybe test more values? At least 0, but also combinations of these FE_ bits, and maybe even FE_INVALID? With such changes the rs6000 parts are okay for trunk. Thanks! 
I looked at the generic changes as well, and they all look fine to me. Segher
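Segher's point about the loop bound in builtin-fegetround.c can be shown with a small sketch. `sizeof rm` is the size of the whole array in bytes, so the element count is that size divided by the size of one element. The `rm` array below holds placeholder integers rather than the `FE_*` macros, so the sketch does not depend on any libc's values. `NELEMS` and `sum_modes` are hypothetical names.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer).  */
#define NELEMS(a) (sizeof (a) / sizeof ((a)[0]))

/* Placeholder values standing in for FE_TONEAREST, FE_TOWARDZERO,
   FE_UPWARD and FE_DOWNWARD; the real values come from <fenv.h>.  */
static const int rm[] = { 0, 1, 2, 3 };

/* Visits each entry exactly once.  Writing "i < sizeof rm" instead
   would loop sizeof rm times (16 with a 4-byte int) and read past
   the end of the array.  */
int sum_modes (void)
{
  int sum = 0;
  for (size_t i = 0; i < NELEMS (rm); i++)
    sum += rm[i];
  return sum;
}
```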
[Bug rtl-optimization/103431] New: [12 Regression] wrong code with -O -fno-tree-bit-ccp -fno-tree-dominator-opts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103431 Bug ID: 103431 Summary: [12 Regression] wrong code with -O -fno-tree-bit-ccp -fno-tree-dominator-opts Product: gcc Version: 12.0 Status: UNCONFIRMED Keywords: wrong-code Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: zsojka at seznam dot cz Target Milestone: --- Host: x86_64-pc-linux-gnu Created attachment 51874 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51874=edit reduced testcase Output: $ x86_64-pc-linux-gnu-gcc -O -fno-tree-bit-ccp -fno-tree-dominator-opts testcase.c $ ./a.out Aborted $ x86_64-pc-linux-gnu-gcc -v Using built-in specs. COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64//bin/x86_64-pc-linux-gnu-gcc COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r12-5528-20211125184355-g9488d242066-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --disable-bootstrap --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r12-5528-20211125184355-g9488d242066-checking-yes-rtl-df-extra-nobootstrap-amd64 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 12.0.0 20211125 (experimental) (GCC)
[Bug fortran/103418] random_number() does not accept pointer, intent(in) array argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418 --- Comment #5 from Steve Kargl --- On Thu, Nov 25, 2021 at 09:02:34PM +, anlauf at gcc dot gnu.org wrote: > (In reply to kargl from comment #3) > > (In reply to anlauf from comment #2) > > > The nearly obvious fix: > > > > > > diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c > > > index 837eb0912c0..3859e18c6c3 100644 > > > --- a/gcc/fortran/check.c > > > +++ b/gcc/fortran/check.c > > > @@ -1031,7 +1031,7 @@ variable_check (gfc_expr *e, int n, bool allow_proc) > > > break; > > > } > > > > > > - if (!ref) > > > + if (!ref && !pointer) > > > { > > > gfc_error ("%qs argument of %qs intrinsic at %L cannot be " > > > "INTENT(IN)", gfc_current_intrinsic_arg[n]->name, > > > > > > regresses for gfortran.dg/move_alloc_8.f90, thus needs additional > > > investigation. > > > > Did you try the patch posted in Fortran Discourse? > > No. > > I'm afraid I also missed it on the usual channels where patches for gcc > are posted. > As explained on FD, I don't report problems found by other people who post them in FD, stackoverflow, or c.l.f. I encourage those people to report the problems themselves. That said, you found the right location to patch. The code looks convoluted to deal with CLASS, which messes up an array with the pointer attribute.
diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c index 6ea6e136d4f..e96bcdb1b44 100644 --- a/gcc/fortran/check.c +++ b/gcc/fortran/check.c @@ -1031,7 +1031,7 @@ variable_check (gfc_expr *e, int n, bool allow_proc) break; } - if (!ref) + if (!ref && !(pointer && e->ref && e->ref->type == REF_ARRAY)) { gfc_error ("%qs argument of %qs intrinsic at %L cannot be " "INTENT(IN)", gfc_current_intrinsic_arg[n]->name, @@ -1062,7 +1062,8 @@ variable_check (gfc_expr *e, int n, bool allow_proc) return true; gfc_error ("%qs argument of %qs intrinsic at %L must be a variable", -gfc_current_intrinsic_arg[n]->name, gfc_current_intrinsic, >where); +gfc_current_intrinsic_arg[n]->name, gfc_current_intrinsic, +>where); return false; }
[Bug fortran/103418] random_number() does not accept pointer, intent(in) array argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103418 --- Comment #4 from anlauf at gcc dot gnu.org --- (In reply to kargl from comment #3) > (In reply to anlauf from comment #2) > > The nearly obvious fix: > > > > diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c > > index 837eb0912c0..3859e18c6c3 100644 > > --- a/gcc/fortran/check.c > > +++ b/gcc/fortran/check.c > > @@ -1031,7 +1031,7 @@ variable_check (gfc_expr *e, int n, bool allow_proc) > > break; > > } > > > > - if (!ref) > > + if (!ref && !pointer) > > { > > gfc_error ("%qs argument of %qs intrinsic at %L cannot be " > > "INTENT(IN)", gfc_current_intrinsic_arg[n]->name, > > > > regresses for gfortran.dg/move_alloc_8.f90, thus needs additional > > investigation. > > Did you try the patch posted in Fortran Discourse? No. I'm afraid I also missed it on the usual channels where patches for gcc are posted.
Re: [PATCH] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377
On 25/11/2021 at 21:03, Harald Anlauf wrote: Hi Mikael, On 25.11.21 at 17:46, Mikael Morin wrote: Hello, On 24/11/2021 at 22:32, Harald Anlauf via Fortran wrote: diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c index 5a5aca10ebe..837eb0912c0 100644 --- a/gcc/fortran/check.c +++ b/gcc/fortran/check.c @@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape, { gfc_constructor *c; bool test; + gfc_constructor_base b; + if (shape->expr_type == EXPR_ARRAY) + b = shape->value.constructor; + else if (shape->expr_type == EXPR_VARIABLE) + b = shape->symtree->n.sym->value->value.constructor; This misses a check that shape->symtree->n.sym->value is an array, so that it makes sense to access its constructor. There are checks further above for the cases shape->expr_type == EXPR_ARRAY and for shape->expr_type == EXPR_VARIABLE which look at the elements of array shape to see if they are non-negative. Only in those cases where the full "if ()'s" pass do we set shape_is_const = true; and proceed. The purpose of the auxiliary bool shape_is_const is to avoid repeating the lengthy if's again. Only then the above cited code segment should get executed. For shape->expr_type == EXPR_ARRAY there is really no change in logic. For shape->expr_type == EXPR_VARIABLE the above snippet is now executed, but then we already had else if (shape->expr_type == EXPR_VARIABLE && shape->ref && shape->ref->u.ar.type == AR_FULL && shape->ref->u.ar.dimen == 1 && shape->ref->u.ar.as && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER && shape->symtree->n.sym->attr.flavor == FL_PARAMETER && shape->symtree->n.sym->value) In which situations do I miss anything new? Yes, I agree with all of this.
My comment wasn’t about a check on shape->expr_type, but on shape->value->expr_type if shape->expr_type is a (parameter) variable. Actually, this only supports the case where the parameter value is defined by an array; but it could be an intrinsic call, a sum of parameters, a reference to another parameter, etc. E.g. the following (still) does get rejected: print *, reshape([1,2,3,4,5], a+1) print *, reshape([1,2,3,4,5], a+a) print *, reshape([1,2,3,4,5], 2*a) print *, reshape([1,2,3,4,5], [3,3]) print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2)) and has been rejected before. The usual way to handle this is to call gfc_reduce_init_expr which (pray for it) will make an array out of whatever the shape expression is. Can you give an example where it fails? I think the current code would almost certainly fail, too. Probably, I was just trying to avoid followup bugs. ;-) I have checked the following:

integer, parameter :: a(2) = [1,1]
integer, parameter :: b(2) = a + 1
print *, reshape([1,2,3,4], b)
end

and it doesn’t fail as I thought it would. So yes, I was wrong; b has been expanded to an array before. Can you add an assert or a comment saying that the parameter value has been expanded to a constant array? Ok with that change.
[Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393 --- Comment #13 from Richard Earnshaw --- Also, note that the comment in gimple-fold.c prior to this change read: /* If we can perform the copy efficiently with first doing all loads and then all stores inline it that way. Currently efficiently means that we can load all the memory into a single integer register which is what MOVE_MAX gives us. */ Which would imply that the AArch64 definition of MOVE_MAX is the correct one.
[Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393 --- Comment #12 from Richard Earnshaw --- (In reply to Jakub Jelinek from comment #10) > Alternatively, couldn't we check next to that new > && have_insn_for (SET, mode) > also that > && known_le (GET_MODE_SIZE (mode), MOVE_MAX) > ? No, that would limit us to MOVE_MAX again, so what would be the point in having a more relaxed test earlier. I do wonder if MOVE_MAX * MOVE_RATIO should be replaced with the MOVE_BY_PIECES infrastructure, I just haven't had time to cook up a patch to try that, though.
Re: [PATCH 4/4] libgcc: Use _dl_find_eh_frame in _Unwind_Find_FDE
* Jakub Jelinek: >> +/* Fallback declaration for old glibc headers. DL_FIND_EH_FRAME_DBASE is >> used >> + as a proxy to determine if declares _dl_find_eh_frame. */ >> +#if defined __GLIBC__ && !defined DL_FIND_EH_FRAME_DBASE >> +#if NEED_DBASE_MEMBER >> +void *_dl_find_eh_frame (void *__pc, void **__dbase) __attribute__ ((weak)); >> +#else >> +void *_dl_find_eh_frame (void *__pc) __attribute__ ((weak)); >> +#endif >> +#define USE_DL_FIND_EH_FRAME 1 >> +#define DL_FIND_EH_FRAME_CONDITION (_dl_find_eh_frame != NULL) >> +#endif > > I'd prefer not to do this. If we find glibc with the support in the > headers, let's use it, otherwise let's keep using what we were doing before. I've included a simplified version below, based on the _dl_find_object patch for glibc. This is a bit difficult to test, but I ran a full toolchain bootstrap with GCC + glibc on all glibc-supported architectures (except Hurd and one m68k variant; they do not presently build, see Joseph's testers). I also tested this by copying the respective GCC-built libgcc_s into a glibc build tree for run-time testing on i686-linux-gnu and x86_64-linux-gnu. There weren't any issues. There are a bunch of unwinder tests in glibc, giving at least some coverage. Thanks, Florian Subject: libgcc: Use _dl_find_object in _Unwind_Find_FDE libgcc/ChangeLog: * unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Call _dl_find_object if available. --- libgcc/unwind-dw2-fde-dip.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c index fbb0fbdebb9..b837d8e4904 100644 --- a/libgcc/unwind-dw2-fde-dip.c +++ b/libgcc/unwind-dw2-fde-dip.c @@ -504,6 +504,24 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases) if (ret != NULL) return ret; + /* Use DLFO_STRUCT_HAS_EH_DBASE as a proxy for the existence of a glibc-style + _dl_find_object function.
*/ +#ifdef DLFO_STRUCT_HAS_EH_DBASE + { +struct dl_find_object dlfo; +if (_dl_find_object (pc, ) == 0) + return find_fde_tail ((_Unwind_Ptr) pc, dlfo.dlfo_eh_frame, +# if DLFO_STRUCT_HAS_EH_DBASE + (_Unwind_Ptr) dlfo.dlfo_eh_dbase, +# else + NULL, +# endif + bases); +else + return NULL; +} +#endif /* DLFO_STRUCT_HAS_EH_DBASE */ + data.pc = (_Unwind_Ptr) pc; #if NEED_DBASE_MEMBER data.dbase = NULL;
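The libgcc change relies on glibc's `_dl_find_object`. This function maps a code address to the loaded object that contains it and reports that object's exception-handling data (the `PT_GNU_EH_FRAME` segment on most targets) in `dlfo_eh_frame`. Below is a minimal sketch of a caller. It assumes glibc 2.35 or later, where `<dlfcn.h>` declares the function under `_GNU_SOURCE`, and it uses the same `DLFO_STRUCT_HAS_EH_DBASE` feature test as the patch. `lookup_eh_frame` is a hypothetical helper. On older headers it simply reports "not found"; the patch, by contrast, keeps using its `dl_iterate_phdr`-based lookup in that case.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <dlfcn.h>
#include <stddef.h>

/* Return the exception-handling data of the object containing PC,
   or NULL if no loaded object covers PC or the API is unavailable.  */
void *lookup_eh_frame (void *pc)
{
#ifdef DLFO_STRUCT_HAS_EH_DBASE
  struct dl_find_object dlfo;
  /* _dl_find_object returns 0 on success and -1 if PC is not inside
     any loaded object.  */
  if (_dl_find_object (pc, &dlfo) == 0)
    return dlfo.dlfo_eh_frame;
  return NULL;
#else
  /* Headers without _dl_find_object: report failure.  */
  (void) pc;
  return NULL;
#endif
}
```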
[PATCH v2] elf: Add _dl_find_object function
I have reworded the previous patch to make the interface more generally useful. Since there are now four words in the core arrays, I did away with the separate base address array. (We can bring it back in the future if necessary.) I fixed a bug in the handling of proxy maps (by not copying proxy maps during the dlopen update). The placement of the function is also different, as explained in the commit message. The performance seems unchanged. I haven't included the obvious future performance enhancements in this patch, and also did not update Arm's __gnu_Unwind_Find_exidx to use the new interface. I think this work can be done in follow-up patches. Thanks, Florian Subject: elf: Add _dl_find_object function It can be used to speed up the libgcc unwinder, and the internal _dl_find_dso_for_object function (which is used for caller identification in dlopen and related functions, and in dladdr). _dl_find_object is in the internal namespace due to bug 28503. If libgcc switches to _dl_find_object, this namespace issue will be fixed. It is located in libc for two reasons: it is necessary to forward the call to the static libc after static dlopen, and there is a link ordering issue with -static-libgcc and libgcc_eh.a because libc.so is not a linker script that includes ld.so in the glibc build tree (so that GCC's internal -lc after libgcc_eh.a does not pick up ld.so). It is necessary to do the i386 customization in the sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because otherwise, multilib installations are broken. The implementation uses software transactional memory, as suggested by Torvald Riegel. Two copies of the supporting data structures are used, also achieving full async-signal-safety.
--- NEWS | 4 + bits/dl_find_object.h | 32 + dlfcn/Makefile | 2 +- dlfcn/dlfcn.h | 22 + elf/Makefile | 47 +- elf/Versions | 3 + elf/dl-close.c | 4 + elf/dl-find_object.c | 841 + elf/dl-find_object.h | 115 +++ elf/dl-libc_freeres.c | 2 + elf/dl-open.c | 5 + elf/dl-support.c | 3 + elf/libc-dl_find_object.c | 26 + elf/rtld.c | 11 + elf/rtld_static_init.c | 1 + elf/tst-dl_find_object-mod1.c | 10 + elf/tst-dl_find_object-mod2.c | 15 + elf/tst-dl_find_object-mod3.c | 10 + elf/tst-dl_find_object-mod4.c | 10 + elf/tst-dl_find_object-mod5.c | 11 + elf/tst-dl_find_object-mod6.c | 11 + elf/tst-dl_find_object-mod7.c | 10 + elf/tst-dl_find_object-mod8.c | 10 + elf/tst-dl_find_object-mod9.c | 10 + elf/tst-dl_find_object-static.c| 22 + elf/tst-dl_find_object-threads.c | 275 +++ elf/tst-dl_find_object.c | 240 ++ include/atomic_wide_counter.h | 14 + include/bits/dl_find_object.h | 1 + include/dlfcn.h| 2 + include/link.h | 3 + manual/Makefile| 2 +- manual/dynlink.texi| 137 manual/libdl.texi | 10 - manual/probes.texi | 2 +- manual/threads.texi| 2 +- sysdeps/arm/bits/dl_find_object.h | 25 + sysdeps/generic/ldsodefs.h | 5 + sysdeps/mach/hurd/i386/libc.abilist| 1 + sysdeps/nios2/bits/dl_find_object.h| 25 + sysdeps/unix/sysv/linux/aarch64/libc.abilist | 1 + sysdeps/unix/sysv/linux/alpha/libc.abilist | 1 + sysdeps/unix/sysv/linux/arc/libc.abilist | 1 + sysdeps/unix/sysv/linux/arm/be/libc.abilist| 1 + sysdeps/unix/sysv/linux/arm/le/libc.abilist| 1 + sysdeps/unix/sysv/linux/csky/libc.abilist | 1 + sysdeps/unix/sysv/linux/hppa/libc.abilist | 1 + sysdeps/unix/sysv/linux/i386/libc.abilist | 1 + sysdeps/unix/sysv/linux/ia64/libc.abilist | 1 + sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist | 1 + sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist | 1 + sysdeps/unix/sysv/linux/microblaze/be/libc.abilist | 1 + sysdeps/unix/sysv/linux/microblaze/le/libc.abilist | 1 +
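The commit message describes the concurrency design in one sentence: two copies of the data, so that readers never take a lock. The sketch below is a simplified, single-writer C11 illustration of that general idea, using a seqlock-style version counter over two copies. It is not glibc's `elf/dl-find_object.c` and may differ from its actual protocol. Every name in it (`struct range_copy`, `publish_ranges`, `find_range`, `MAX_RANGES`) is hypothetical. The writer fills the copy that readers are not using and then bumps the version. A reader retries if the version changed while it was reading.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 8

struct range { _Atomic uintptr_t start, end; };
struct range_copy { _Atomic size_t count; struct range r[MAX_RANGES]; };

/* Two copies; (version & 1) selects the one readers should use.  */
static struct range_copy copies[2];
static _Atomic uint64_t version;

/* Writer, assumed to be serialized by an external lock.  */
void publish_ranges (const uintptr_t *starts, const uintptr_t *ends, size_t n)
{
  if (n > MAX_RANGES)
    n = MAX_RANGES;
  uint64_t v = atomic_load_explicit (&version, memory_order_relaxed);
  struct range_copy *c = &copies[(v + 1) & 1];   /* the inactive copy */
  /* A stale reader that sees any store below must also see a version
     other than the one it started with (fence-to-fence ordering).  */
  atomic_thread_fence (memory_order_release);
  for (size_t i = 0; i < n; i++)
    {
      atomic_store_explicit (&c->r[i].start, starts[i], memory_order_relaxed);
      atomic_store_explicit (&c->r[i].end, ends[i], memory_order_relaxed);
    }
  atomic_store_explicit (&c->count, n, memory_order_relaxed);
  /* Publish: readers that load the new version see the stores above.  */
  atomic_store_explicit (&version, v + 1, memory_order_release);
}

/* Lock-free reader: true if ADDR lies in [start, end) of a range.  */
bool find_range (uintptr_t addr, uintptr_t *start, uintptr_t *end)
{
  for (;;)
    {
      uint64_t v = atomic_load_explicit (&version, memory_order_acquire);
      struct range_copy *c = &copies[v & 1];
      size_t n = atomic_load_explicit (&c->count, memory_order_relaxed);
      bool found = false;
      uintptr_t s = 0, e = 0;
      for (size_t i = 0; i < n && i < MAX_RANGES; i++)
        {
          s = atomic_load_explicit (&c->r[i].start, memory_order_relaxed);
          e = atomic_load_explicit (&c->r[i].end, memory_order_relaxed);
          if (s <= addr && addr < e)
            {
              found = true;
              break;
            }
        }
      /* Keep the data loads above ahead of the version re-check.  */
      atomic_thread_fence (memory_order_acquire);
      if (atomic_load_explicit (&version, memory_order_relaxed) == v)
        {
          if (found)
            {
              *start = s;
              *end = e;
            }
          return found;
        }
      /* A writer reused this copy while we were reading: retry.  */
    }
}
```

Because readers only load and retry, nothing in the read path can block. That property is what makes a lookup of this shape usable from a signal handler.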
[Bug middle-end/103406] gcc -O0 behaves differently on "DBL_MAX related operations" than gcc -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103406 --- Comment #14 from joseph at codesourcery dot com --- There is no reasonable definition of how operands of binary + map to particular operands of a particular instruction and so no -f or -m option could sensibly be defined for that. When the result is a NaN, there is no requirement at all on what (quiet) NaN it is (beyond a preference for preservation of the payload of a NaN operand if there is at least one NaN operand).
[Bug tree-optimization/99520] Failure to detect bswap pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99520 Roger Sayle changed: What|Removed |Added Target Milestone|--- |12.0 CC||roger at nextmovesoftware dot com Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #9 from Roger Sayle --- This PR is now fixed on mainline. Thanks to Jakub (my apologies: had I seen comment #2, I wouldn't have accidentally broken things, see PR tree-optimization/103376; fortunately Jakub was able to correct my oversight quickly).
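For readers unfamiliar with the bswap pass, the classic input it targets is a manual byte reversal like the one below. GCC can recognize this pattern and emit a single byte-swap instruction. This is a generic illustration, not the testcase from PR 99520. `bswap32_manual` is a hypothetical name, and the GCC builtin `__builtin_bswap32` appears only in the test, as a reference.

```c
#include <stdint.h>

/* Reverse the four bytes of X using shifts and masks only.  */
uint32_t bswap32_manual (uint32_t x)
{
  return ((x & 0x000000ffu) << 24)
         | ((x & 0x0000ff00u) << 8)
         | ((x & 0x00ff0000u) >> 8)
         | ((x & 0xff000000u) >> 24);
}
```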
[Bug tree-optimization/98953] Failure to optimize two reads from adjacent addresses into one
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98953 Roger Sayle changed: What|Removed |Added Status|ASSIGNED|NEW Assignee|roger at nextmovesoftware dot com |unassigned at gcc dot gnu.org --- Comment #4 from Roger Sayle --- The MULT_EXPR and PLUS_EXPR aspects of this PR are now resolved (i.e. the case in comment #1), but unfortunately the abs-based indexing used in the original report still causes problems. The bswap pass doesn't yet handle memory accesses of the form read[abs]/read[abs+1] (but does handle read[0]/read[1]).
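Roger's comment distinguishes two shapes of the same two-byte load. Both are sketched below as little-endian 16-bit reads; the function names are hypothetical. According to the comment, the bswap pass handles the constant-index form but does not yet handle the `abs`-indexed form, even though both read adjacent bytes. Whether a particular GCC version merges either form into a single load depends on the version and the target.

```c
#include <stdint.h>
#include <stdlib.h>

/* read[0]/read[1]: adjacent constant offsets, the handled form.  */
uint16_t load16_const (const uint8_t *read)
{
  return (uint16_t) (read[0] | (read[1] << 8));
}

/* read[abs]/read[abs+1]: still adjacent bytes, but the offset comes
   from abs (i), the form described as not yet handled.  */
uint16_t load16_abs (const uint8_t *read, int i)
{
  int k = abs (i);
  return (uint16_t) (read[k] | (read[k + 1] << 8));
}
```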
[committed] libstdc++: Do not use memset in constexpr calls to ranges::fill_n [PR101608]
Tested x86_64-linux, pushed to trunk. libstdc++-v3/ChangeLog: PR libstdc++/101608 * include/bits/ranges_algobase.h (__fill_n_fn): Check for constant evaluation before using memset. * testsuite/25_algorithms/fill_n/constrained.cc: Check byte-sized values as well. --- libstdc++-v3/include/bits/ranges_algobase.h | 28 --- .../25_algorithms/fill_n/constrained.cc | 6 ++-- 2 files changed, 22 insertions(+), 12 deletions(-) diff --git a/libstdc++-v3/include/bits/ranges_algobase.h b/libstdc++-v3/include/bits/ranges_algobase.h index c8c4d032983..9929e5e828b 100644 --- a/libstdc++-v3/include/bits/ranges_algobase.h +++ b/libstdc++-v3/include/bits/ranges_algobase.h @@ -527,17 +527,25 @@ namespace ranges if (__n <= 0) return __first; - // TODO: Generalize this optimization to contiguous iterators. - if constexpr (is_pointer_v<_Out> - // Note that __is_byte already implies !is_volatile. - && __is_byte>::__value - && integral<_Tp>) - { - __builtin_memset(__first, static_cast(__value), __n); - return __first + __n; - } - else if constexpr (is_scalar_v<_Tp>) + if constexpr (is_scalar_v<_Tp>) { + // TODO: Generalize this optimization to contiguous iterators. + if constexpr (is_pointer_v<_Out> + // Note that __is_byte already implies !is_volatile. 
+ && __is_byte>::__value + && integral<_Tp>) + { +#ifdef __cpp_lib_is_constant_evaluated + if (!std::is_constant_evaluated()) +#endif + { + __builtin_memset(__first, +static_cast(__value), +__n); + return __first + __n; + } + } + const auto __tmp = __value; for (; __n > 0; --__n, (void)++__first) *__first = __tmp; diff --git a/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc b/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc index 6a015d34a89..1d1e1c104d4 100644 --- a/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc +++ b/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc @@ -73,11 +73,12 @@ test01() } } +template constexpr bool test02() { bool ok = true; - int x[6] = { 1, 2, 3, 4, 5, 6 }; + T x[6] = { 1, 2, 3, 4, 5, 6 }; const int y[6] = { 1, 2, 3, 4, 5, 6 }; const int z[6] = { 17, 17, 17, 4, 5, 6 }; @@ -94,5 +95,6 @@ int main() { test01(); - static_assert(test02()); + static_assert(test02()); + static_assert(test02()); // PR libstdc++/101608 } -- 2.31.1
[Bug libstdc++/101608] ranges::fill/fill_n missing std::is_constant_evaluated() condition for __builtin_memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101608 --- Comment #2 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:82c3657dd74896b39937bb0a2aaeba9b8ca105fd commit r12-5530-g82c3657dd74896b39937bb0a2aaeba9b8ca105fd Author: Jonathan Wakely Date: Wed Nov 24 13:17:54 2021 + libstdc++: Do not use memset in constexpr calls to ranges::fill_n [PR101608] libstdc++-v3/ChangeLog: PR libstdc++/101608 * include/bits/ranges_algobase.h (__fill_n_fn): Check for constant evaluation before using memset. * testsuite/25_algorithms/fill_n/constrained.cc: Check byte-sized values as well.
Re: [PATCH] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377
Hi Mikael, On 25.11.21 at 17:46, Mikael Morin wrote: Hello, On 24/11/2021 at 22:32, Harald Anlauf via Fortran wrote: diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c index 5a5aca10ebe..837eb0912c0 100644 --- a/gcc/fortran/check.c +++ b/gcc/fortran/check.c @@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape, { gfc_constructor *c; bool test; + gfc_constructor_base b; + if (shape->expr_type == EXPR_ARRAY) + b = shape->value.constructor; + else if (shape->expr_type == EXPR_VARIABLE) + b = shape->symtree->n.sym->value->value.constructor; This misses a check that shape->symtree->n.sym->value is an array, so that it makes sense to access its constructor. There are checks further above for the cases shape->expr_type == EXPR_ARRAY and for shape->expr_type == EXPR_VARIABLE which look at the elements of array shape to see if they are non-negative. Only in those cases where the full "if ()'s" pass do we set shape_is_const = true; and proceed. The purpose of the auxiliary bool shape_is_const is to avoid repeating the lengthy if's again. Only then the above cited code segment should get executed. For shape->expr_type == EXPR_ARRAY there is really no change in logic. For shape->expr_type == EXPR_VARIABLE the above snippet is now executed, but then we already had else if (shape->expr_type == EXPR_VARIABLE && shape->ref && shape->ref->u.ar.type == AR_FULL && shape->ref->u.ar.dimen == 1 && shape->ref->u.ar.as && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER && shape->symtree->n.sym->attr.flavor == FL_PARAMETER && shape->symtree->n.sym->value) In which situations do I miss anything new? Actually, this only supports the case where the parameter value is defined by an array; but it could be an intrinsic call, a sum of parameters, a reference to another parameter, etc. E.g.
the following (still) does get rejected: print *, reshape([1,2,3,4,5], a+1) print *, reshape([1,2,3,4,5], a+a) print *, reshape([1,2,3,4,5], 2*a) print *, reshape([1,2,3,4,5], [3,3]) print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2)) and has been rejected before. The usual way to handle this is to call gfc_reduce_init_expr which (pray for it) will make an array out of whatever the shape expression is. Can you give an example where it fails? I think the current code would almost certainly fail, too. The rest looks good. In the test, can you add a comment telling what it is testing? Something like: "This tests that constant shape expressions passed to the reshape intrinsic are properly simplified before being used to diagnose invalid values" Can do. We also used to put a comment mentioning the person who submitted the test, but not everybody seems to do it these days. Can do. Mikael Harald
[Bug tree-optimization/103345] missed optimization: add/xor individual bytes to form a word
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103345 Roger Sayle changed: What|Removed |Added Target Milestone|--- |12.0 Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #5 from Roger Sayle --- This PR should now be fixed (missed optimization implemented) on mainline.
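PR 103345 is about building a word from individual bytes with `+` or `^` instead of `|`. Each byte is shifted into its own non-overlapping bit range, so the addition produces no carries and the three operators give the same value. That equivalence is what lets the compiler treat the three forms alike. The sketch below shows all three; the function names are hypothetical.

```c
#include <stdint.h>

/* Little-endian word assembled with bitwise OR.  */
uint32_t word_or (const uint8_t *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Same word with addition: the shifted bytes never overlap, so no
   carries occur.  */
uint32_t word_add (const uint8_t *p)
{
  return (uint32_t) p[0] + ((uint32_t) p[1] << 8)
         + ((uint32_t) p[2] << 16) + ((uint32_t) p[3] << 24);
}

/* Same word with XOR, equal to OR for disjoint bit ranges.  */
uint32_t word_xor (const uint8_t *p)
{
  return (uint32_t) p[0] ^ ((uint32_t) p[1] << 8)
         ^ ((uint32_t) p[2] << 16) ^ ((uint32_t) p[3] << 24);
}
```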
[Bug middle-end/103406] gcc -O0 behaves differently on "DBL_MAX related operations" than gcc -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103406 Roger Sayle changed: What|Removed |Added Status|ASSIGNED|NEW Assignee|roger at nextmovesoftware dot com |unassigned at gcc dot gnu.org Summary|[12 Regression] gcc -O0 behaves differently on "DBL_MAX related operations" than gcc -O1 and above |gcc -O0 behaves differently on "DBL_MAX related operations" than gcc -O1 and above Target||x86_64 --- Comment #13 from Roger Sayle --- The Inf - Inf => 0.0 regression should now be fixed on mainline. Hmm. As hinted by Richard Biener's investigation, the underlying problem is even more pervasive. It turns out that on x86/IA64 chips, floating point addition is not commutative, i.e. x+y is not the same as y+x, as demonstrated by the test program below:

#include <stdio.h>
const double pn = __builtin_nan("");
const double mn = -__builtin_nan("");
__attribute__ ((noinline, noclone)) double plus(double x, double y) { return x + y; }
int main() { printf("%lf\n",plus(pn,mn)); printf("%lf\n",plus(mn,pn)); return 0; }

Output:
nan
-nan

Unfortunately, GCC assumes almost everywhere that FP addition is commutative and (as per comments #8 and #9) associative with negation/minus. This appears to be a target property, c.f. libgcc's _FP_CHOOSENAN, but could in theory be resolved by a -fstrict-math mode (that implies -ftrapping-math) that disables commutativity (swapping of operands) throughout the compiler, including reload/fold-const etc., on affected Intel-like targets. Perhaps this PR is a duplicate now that the regression has been fixed?
[Bug c++/56119] Allows static member definition of template class in namespace not enclosing this class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56119 Andrew Pinski changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=103426 CC||fchelnokov at gmail dot com --- Comment #3 from Andrew Pinski --- *** Bug 103426 has been marked as a duplicate of this bug. ***
[Bug c++/103426] Acceptance of invalid template specialization in a namespace not enclosing the specialized template
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103426 Andrew Pinski changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=56119 Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #1 from Andrew Pinski --- This is a dup of bug 56119. *** This bug has been marked as a duplicate of bug 56119 ***
[Bug tree-optimization/103427] Alignment of C++ references and 'this' pointer not used by optimizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103427 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2021-11-25 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Severity|normal |enhancement --- Comment #11 from Andrew Pinski --- Confirmed. I had thought there was another bug about this but I can't find it.
[Bug tree-optimization/103332] Spurious -Wstringop-overflow warnings in libstdc++ tests
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103332 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2021-11-25 Status|UNCONFIRMED |NEW --- Comment #4 from Andrew Pinski --- .
[Bug target/102117] s390: Inefficient code for 64x64=128 signed multiply for <= z13
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102117 Roger Sayle changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED Target Milestone|--- |12.0 --- Comment #4 from Roger Sayle --- This should now be fixed on mainline.
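Some context on PR 102117: a signed 64x64->128 product can be derived from an unsigned one by correcting only the high half. Reinterpreting a negative 64-bit operand as unsigned adds 2^64 to it, so hi(a*b) = hi(ua*ub) - (a < 0 ? ub : 0) - (b < 0 ? ua : 0) modulo 2^64, while the low half is unchanged. The comment does not say whether the committed fix uses this identity. The sketch below only shows the arithmetic; it assumes GCC's `__int128` extension, and `smul_64x64_128` is a hypothetical name.

```c
#include <stdint.h>

/* Signed 64x64->128 multiply built from an unsigned 64x64->128 one.  */
__int128 smul_64x64_128 (int64_t a, int64_t b)
{
  uint64_t ua = (uint64_t) a, ub = (uint64_t) b;
  unsigned __int128 p = (unsigned __int128) ua * ub;
  uint64_t lo = (uint64_t) p;
  uint64_t hi = (uint64_t) (p >> 64);
  if (a < 0)
    hi -= ub;   /* ua = a + 2^64, so remove 2^64 * ub */
  if (b < 0)
    hi -= ua;   /* ub = b + 2^64, so remove 2^64 * ua */
  /* The 2^128 term from two negative operands vanishes modulo 2^128.  */
  return (__int128) (((unsigned __int128) hi << 64) | lo);
}
```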
[Bug tree-optimization/102958] std::u8string suboptimal compared to std::string, triggers warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102958 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2021-11-25 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #3 from Andrew Pinski --- Confirmed. Interestingly, we don't detect this as strlen: [local count: 8687547547]: # __i_155 = PHI <__i_46(3), 0(2)> __i_46 = __i_155 + 1; _48 = MEM[(const char_type &)"123456789" + __i_46 * 1]; if (_48 != 0) goto ; [89.00%] else goto ; [11.00%] I thought there was code to do that detection now?
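The GIMPLE in comment #3 is a byte-scanning loop: it increments an index until the loaded character is zero. In C, that shape is the hand-rolled length loop below (`count_until_nul` is a hypothetical name). A pass that recognizes this idiom could replace the loop with a call to `strlen`, which is the detection Andrew expected to fire.

```c
#include <stddef.h>

/* Count characters before the terminating NUL, one byte at a time.  */
size_t count_until_nul (const char *s)
{
  size_t i = 0;
  while (s[i] != 0)
    i++;
  return i;
}
```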
[Bug middle-end/103406] [12 Regression] gcc -O0 behaves differently on "DBL_MAX related operations" than gcc -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103406 --- Comment #12 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:6ea5fb3cc7f3cc9b731d72183c66c23543876f5a commit r12-5529-g6ea5fb3cc7f3cc9b731d72183c66c23543876f5a Author: Roger Sayle Date: Thu Nov 25 19:02:06 2021 + PR middle-end/103406: Check for Inf before simplifying x-x. This is a simple one line fix to the regression PR middle-end/103406, where x - x is being folded to 0.0 even when x is +Inf or -Inf. In GCC 11 and previously, we'd check whether the type honored NaNs (which implicitly covered the case where the type honors infinities), but my patch to test whether the operand could potentially be NaN failed to also check whether the operand could potentially be Inf. 2021-11-25 Roger Sayle gcc/ChangeLog PR middle-end/103406 * match.pd (minus @0 @0): Check tree_expr_maybe_infinite_p. gcc/testsuite/ChangeLog PR middle-end/103406 * gcc.dg/pr103406.c: New test case.
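The commit restricts the `(minus @0 @0)` fold in match.pd. Folding `x - x` to `0.0` is only valid when `x` can be neither NaN nor infinite, because IEEE 754 defines both `Inf - Inf` and `NaN - NaN` as NaN. The sketch below checks that behavior at run time. `self_sub` is a hypothetical name, and the `volatile` copy only keeps the compiler from constant-folding the subtraction.

```c
#include <math.h>

/* Computes x - x at run time; volatile prevents folding to 0.0.  */
double self_sub (double x)
{
  volatile double y = x;
  return y - y;
}
```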
[Bug tree-optimization/103429] Optimization of Auto-generated condition chain is not giving good lookup tables.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103429 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug c++/102454] coroutines: ICE in gimplify_var_or_parm_decl, at gimplify.c:2958
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102454 --- Comment #7 from Iain Sandoe --- I was leaving it to check if we needed to back port to 10.x as well.
[Bug c++/102213] Incorrect executable produced from valid input code with virtual consteval
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102213 --- Comment #2 from Andrew Pinski --- Note GCC 10 did a sorry message: sorry, unimplemented: 'virtual' 'consteval'