[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037 Jeffrey A. Law changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #15 from Jeffrey A. Law --- Fixed on trunk.
[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037 --- Comment #14 from Jeffrey A. Law --- Author: law Date: Thu Apr 25 14:32:16 2019 New Revision: 270574 URL: https://gcc.gnu.org/viewcvs?rev=270574&root=gcc&view=rev Log: PR tree-optimization/90037 * Makefile.in (OBJS): Remove tree-ssa-phionlycprop.c * passes.def: Replace all instance of phi-only cprop with the lattice propagator. Move propagation pass from after erroneous path isolation to before erroneous path isolation. * tree-ssa-phionlycprop.c: Remove. * gcc.dg/tree-ssa/20030710-1.c: Update dump file to scan. * gcc.dg/isolate-2.c: Likewise. * gcc.dg/isolate-4.c: Likewise. * gcc.dg/pr19431.c: Accept either ordering of PHI args. * gcc.dg/pr90037.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr90037.c Removed: trunk/gcc/tree-ssa-phionlycprop.c Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/passes.def trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c trunk/gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c trunk/gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr19431.c
[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037 --- Comment #13 from Jeffrey A. Law --- So results from playing with do_rpo_vn. Running do_rpo_vn on the full function is slower than lattice cprop. It's on the order of a percent or two. I mostly did this because it was trivial to code up, verify it fixed the regression and see if performance was even in the right ballpark. Then I did a version which only called do_rpo_vn on blocks with single predecessors that had PHI nodes. ie, the vast majority of degenerate PHI cases. That still fixed the regression (as expected). Performance was ever so slightly worse than doing a full function lattice cprop. Overall it's on the order of .03%. THe resulting code is effectively the same as doing lattice cprop. So my current thought is to drop all the phi-only cprop code and use the lattice cprop instead, adding a single call to the pass between DOM and path isolation. The compile-time hit is going to be small, on the order of .4%. I could be convinced to use rpo_vn as well -- it's ultimately a wash in terms of compile-time and resulting code.
[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037 --- Comment #12 from Jeffrey A. Law --- So another update and a rather surprising one at that. One of the things that is clear is that we need to do some kind of cleanup between DOM and erroneous path isolation. Furthermore, the existing cleanup done by phi-only-cprop is insufficient. So I finished cobbling together the necessary extensions to phi-only cprop. Essentially if we have one or more degenerate PHIs in a block, we also scan the block for const/copy propagation opportunities. I then compared that to just using the lattice cprop. It turns out that the lattice based cprop is consistently better. Using valgrind/cachegrind across a collection of .i files the lattice cprop version executes about .5% fewer instructions. It was better on each and every input source file. Of course to fix the BZ we're going to have to add an instance of the pass. So I compared a pristine compiler to one with an extra pass of the lattice copy propagator inserted between DOM and erroneous path isolation. The total slowdown is in the .4% range. >From a codegen standpoint, it looks like a wash, which was largely expected. We tend to clean things up earlier in the pipeline, but the net result is almost always the same. I still need to look at the VN on the SEME approach. That's today's task.
[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037 --- Comment #11 from Jeffrey A. Law --- That may be an interesting approach. I think we'd want the new blocks created by threading as well as the original blocks we threaded through since their in-degree gets reduced which in turn can expose new cprop opportunities. We'd have to somehow be able to mark or otherwise remember the blocks in question across a call to cfgcleanup so we'd know what SEME regions to optimize. But I guess missing one or more regions isn't fatal, it's "just" a missed optimization.
[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037 --- Comment #10 from Jonathan Wakely --- Jeff posted this to PR 89819 instead of here: Somewhat trimmed down testcase... Certainly easier to analyze... typedef __SIZE_TYPE__ size_t; typedef unsigned long int uintmax_t; struct group { char *gr_name; char *gr_passwd; unsigned gr_gid; char **gr_mem; }; struct passwd { char *pw_name; char *pw_passwd; unsigned pw_uid; unsigned pw_gid; char *pw_gecos; char *pw_dir; char *pw_shell; }; extern struct group *getgrnam (const char *); extern struct group *getgrgid (unsigned); extern void endgrent (void); extern struct passwd *getpwnam (const char *); extern void endpwent (void); extern unsigned long int strtoul (const char *__restrict, char **__restrict, int); char const * parse_with_separator (char const *spec, char const *separator, unsigned *uid, unsigned *gid, char **username, char **groupname) { static const char *E_bad_spec = "invalid spec"; const char *error_msg; char *u; char const *g; struct group *grp; unsigned unum = *uid; error_msg = 0; u = 0; if (separator == 0) u = __builtin_strdup (spec); size_t ulen = separator - spec; u = __builtin_malloc (ulen + 1); g = (separator == 0 || *(separator + 1) == '\0' ? 0 : separator + 1); if (u != 0) { _Bool use_login_group = (separator != 0 && g == 0); if (use_login_group) error_msg = E_bad_spec; endpwent (); } if (g != 0 && error_msg == 0) grp = (*g == '+' ? 0 : getgrnam (g)); if (error_msg == 0) *uid = unum; return 0; }
[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037 --- Comment #9 from Richard Biener --- (In reply to Jeffrey A. Law from comment #8) > So, if we change phionlycprop to look for other const/copy initializations > that can be eliminated and run a pass between DOM and the erroneous-path > isolation pass, then the false positive is eliminated (as expected). > > There's two things I don't like about that. First, it turns phionlycprop > into a full-fledged constant propagation pass. phionlycprop is supposed to > be so fast that we never really notice it. It accomplishes this by only > looking at PHI nodes that are degenerates and any constants exposed by > propagating way the degenerate PHI. Essentially it's just cleaning up > painfully obvious cruft left by jump threading. > > To pick up this case we'd have to scan statements in blocks. We could > restrict that to blocks where we eliminated degenerate PHI. But still. > Ugh. > > Second, once phionlycprop is doing more work, I'm less inclined to want to > add another instance of the pass. > > Finally, once phionlycprop is doing more work one could legitimately ask if > we should just drop the code and use the lattice copy propagator. > > Just for fun I replaced all the phi-only cprop calls with calls into the > lattice propagator (including the one I added between DOM and erroneous-path > optimization). As expected that fixes the testcase too. It also happens to > clean up things slightly better at an earlier point in the optimizer > pipeline. I don't know if it's a good trade-off though. As a middle-ground you can now run non-iterating value-numbering on a SEME region. We are already doing that for unrolled loop bodies, if-converted loop bodies and loop header copies, exactly to (mostly) get local constant propagation & simplifications done. IMHO the copied paths jump threading creates are a perfect candidate for this treatment as well. See for example tree-ssa-loop-ch.c where it calls do_rpo_vn (obviously VN needs up-to-date SSA so the VN is delayed until after all loop-header copying is done and we remember the SEME regions to VN).
[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037 Jeffrey A. Law changed: What|Removed |Added Priority|P3 |P2 Component|middle-end |tree-optimization --- Comment #8 from Jeffrey A. Law --- So, if we change phionlycprop to look for other const/copy initializations that can be eliminated and run a pass between DOM and the erroneous-path isolation pass, then the false positive is eliminated (as expected). There's two things I don't like about that. First, it turns phionlycprop into a full-fledged constant propagation pass. phionlycprop is supposed to be so fast that we never really notice it. It accomplishes this by only looking at PHI nodes that are degenerates and any constants exposed by propagating way the degenerate PHI. Essentially it's just cleaning up painfully obvious cruft left by jump threading. To pick up this case we'd have to scan statements in blocks. We could restrict that to blocks where we eliminated degenerate PHI. But still. Ugh. Second, once phionlycprop is doing more work, I'm less inclined to want to add another instance of the pass. Finally, once phionlycprop is doing more work one could legitimately ask if we should just drop the code and use the lattice copy propagator. Just for fun I replaced all the phi-only cprop calls with calls into the lattice propagator (including the one I added between DOM and erroneous-path optimization). As expected that fixes the testcase too. It also happens to clean up things slightly better at an earlier point in the optimizer pipeline. I don't know if it's a good trade-off though.