[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302

2019-04-25 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #15 from Jeffrey A. Law  ---
Fixed on trunk.

[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302

2019-04-25 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037

--- Comment #14 from Jeffrey A. Law  ---
Author: law
Date: Thu Apr 25 14:32:16 2019
New Revision: 270574

URL: https://gcc.gnu.org/viewcvs?rev=270574&root=gcc&view=rev
Log:
PR tree-optimization/90037
* Makefile.in (OBJS): Remove tree-ssa-phionlycprop.c
* passes.def: Replace all instance of phi-only cprop with the
lattice propagator.  Move propagation pass from after erroneous
path isolation to before erroneous path isolation.
* tree-ssa-phionlycprop.c: Remove.

* gcc.dg/tree-ssa/20030710-1.c: Update dump file to scan.
* gcc.dg/isolate-2.c: Likewise.
* gcc.dg/isolate-4.c: Likewise.
* gcc.dg/pr19431.c: Accept either ordering of PHI args.
* gcc.dg/pr90037.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/pr90037.c
Removed:
trunk/gcc/tree-ssa-phionlycprop.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/passes.def
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr19431.c

[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302

2019-04-18 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037

--- Comment #13 from Jeffrey A. Law  ---
So results from playing with do_rpo_vn.

Running do_rpo_vn on the full function is slower than lattice cprop.  It's on
the order of a percent or two.  I mostly did this because it was trivial to
code up, verify it fixed the regression and see if performance was even in the
right ballpark.

Then I did a version which only called do_rpo_vn on blocks with single
predecessors that had PHI nodes.  ie, the vast majority of degenerate PHI
cases.  That still fixed the regression (as expected).  Performance was ever so
slightly worse than doing a full function lattice cprop.  Overall it's on the
order of .03%.  THe resulting code is effectively the same as doing lattice
cprop.

So my current thought is to drop all the phi-only cprop code and use the
lattice cprop instead, adding a single call to the pass between DOM and path
isolation.  The compile-time hit is going to be small, on the order of .4%.  I
could be convinced to use rpo_vn as well -- it's ultimately a wash in terms of
compile-time and resulting code.

[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302

2019-04-18 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037

--- Comment #12 from Jeffrey A. Law  ---
So another update and a rather surprising one at that.

One of the things that is clear is that we need to do some kind of cleanup
between DOM and erroneous path isolation.  Furthermore, the existing cleanup
done by phi-only-cprop is insufficient.

So I finished cobbling together the necessary extensions to phi-only cprop. 
Essentially if we have one or more degenerate PHIs in a block, we also scan the
block for const/copy propagation opportunities.

I then compared that to just using the lattice cprop.

It turns out that the lattice based cprop is consistently better.  Using
valgrind/cachegrind across a collection of .i files the lattice cprop version
executes about .5% fewer instructions.  It was better on each and every input
source file.

Of course to fix the BZ we're going to have to add an instance of the pass.  So
I compared a pristine compiler to one with an extra pass of the lattice copy
propagator inserted between DOM and erroneous path isolation.  The total
slowdown is in the .4% range.

>From a codegen standpoint, it looks like a wash, which was largely expected. 
We tend to clean things up earlier in the pipeline, but the net result is
almost always the same.

I still need to look at the VN on the SEME approach.  That's today's task.

[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302

2019-04-17 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037

--- Comment #11 from Jeffrey A. Law  ---
That may be an interesting approach.  I think we'd want the new blocks created
by threading as well as the original blocks we threaded through since their
in-degree  gets reduced which in turn can expose new cprop opportunities.

We'd have to somehow be able to mark or otherwise remember the blocks in
question across a call to cfgcleanup so we'd know what SEME regions to
optimize.  But I guess missing one or more regions isn't fatal, it's "just" a
missed optimization.

[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302

2019-04-17 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037

--- Comment #10 from Jonathan Wakely  ---
Jeff posted this to PR 89819 instead of here:

Somewhat trimmed down testcase...  Certainly easier to analyze...

typedef __SIZE_TYPE__ size_t;
typedef unsigned long int uintmax_t;

struct group
{
  char *gr_name;
  char *gr_passwd;
  unsigned gr_gid;
  char **gr_mem;
};

struct passwd
{
  char *pw_name;
  char *pw_passwd;

  unsigned pw_uid;
  unsigned pw_gid;
  char *pw_gecos;
  char *pw_dir;
  char *pw_shell;
};

extern struct group *getgrnam (const char *);
extern struct group *getgrgid (unsigned);
extern void endgrent (void);
extern struct passwd *getpwnam (const char *);
extern void endpwent (void);
extern unsigned long int strtoul (const char *__restrict,
  char **__restrict, int);

char const *
parse_with_separator (char const *spec, char const *separator,
  unsigned *uid, unsigned *gid,
  char **username, char **groupname)
{
  static const char *E_bad_spec = "invalid spec";
  const char *error_msg;
  char *u;
  char const *g;
  struct group *grp;
  unsigned unum = *uid;

  error_msg = 0;

  u = 0;
  if (separator == 0)
u = __builtin_strdup (spec);
  size_t ulen = separator - spec;
  u = __builtin_malloc (ulen + 1);

  g = (separator == 0 || *(separator + 1) == '\0' ? 0 : separator + 1);

  if (u != 0)
{
  _Bool use_login_group = (separator != 0 && g == 0);
  if (use_login_group)
error_msg = E_bad_spec;

  endpwent ();
}

  if (g != 0 && error_msg == 0)
grp = (*g == '+' ? 0 : getgrnam (g));

  if (error_msg == 0)
*uid = unum;

  return 0;
}

[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302

2019-04-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037

--- Comment #9 from Richard Biener  ---
(In reply to Jeffrey A. Law from comment #8)
> So, if we change phionlycprop to look for other const/copy initializations
> that can be eliminated and run a pass between DOM and the erroneous-path
> isolation pass, then the false positive is eliminated (as expected).
> 
> There's two things I don't like about that.  First, it turns phionlycprop
> into a full-fledged constant propagation pass.  phionlycprop is supposed to
> be so fast that we never really notice it.  It accomplishes this by only
> looking at PHI nodes that are degenerates and any constants exposed by
> propagating way the degenerate PHI.  Essentially it's just cleaning up
> painfully obvious cruft left by jump threading.
> 
> To pick up this case we'd have to scan statements in blocks.  We could
> restrict that to blocks where we eliminated  degenerate PHI.  But still. 
> Ugh.
> 
> Second, once phionlycprop is doing more work, I'm less inclined to want to
> add another instance of the pass.
> 
> Finally, once phionlycprop is doing more work one could legitimately ask if
> we should just drop the code and use the lattice copy propagator.
> 
> Just for fun I replaced all the phi-only cprop calls with calls into the
> lattice propagator (including the one I added between DOM and erroneous-path
> optimization).  As expected that fixes the testcase too.  It also happens to
> clean up things slightly better at an earlier point in the optimizer
> pipeline.  I don't know if it's a good trade-off though.

As a middle-ground you can now run non-iterating value-numbering on a
SEME region.  We are already doing that for unrolled loop bodies,
if-converted loop bodies and loop header copies, exactly to (mostly) get
local constant propagation & simplifications done.  IMHO the copied
paths jump threading creates are a perfect candidate for this treatment
as well.  See for example tree-ssa-loop-ch.c where it calls do_rpo_vn
(obviously VN needs up-to-date SSA so the VN is delayed until after all
loop-header copying is done and we remember the SEME regions to VN).

[Bug tree-optimization/90037] [9 Regression] -Wnull-dereference false positive after r269302

2019-04-16 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P2
  Component|middle-end  |tree-optimization

--- Comment #8 from Jeffrey A. Law  ---
So, if we change phionlycprop to look for other const/copy initializations that
can be eliminated and run a pass between DOM and the erroneous-path isolation
pass, then the false positive is eliminated (as expected).

There's two things I don't like about that.  First, it turns phionlycprop into
a full-fledged constant propagation pass.  phionlycprop is supposed to be so
fast that we never really notice it.  It accomplishes this by only looking at
PHI nodes that are degenerates and any constants exposed by propagating way the
degenerate PHI.  Essentially it's just cleaning up painfully obvious cruft left
by jump threading.

To pick up this case we'd have to scan statements in blocks.  We could restrict
that to blocks where we eliminated  degenerate PHI.  But still.  Ugh.

Second, once phionlycprop is doing more work, I'm less inclined to want to add
another instance of the pass.

Finally, once phionlycprop is doing more work one could legitimately ask if we
should just drop the code and use the lattice copy propagator.

Just for fun I replaced all the phi-only cprop calls with calls into the
lattice propagator (including the one I added between DOM and erroneous-path
optimization).  As expected that fixes the testcase too.  It also happens to
clean up things slightly better at an earlier point in the optimizer pipeline. 
I don't know if it's a good trade-off though.