On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
> [One more time, but without rejected HTML mail, because apparently this
> is my first post to gcc-patches *ever* ;-)].
> 
> Howdy!
> 
> While poking around in the backwards threader I noticed that we bail if
> we have already seen a starting BB.
> 
>       /* Do not jump-thread twice from the same block.  */
>       if (bitmap_bit_p (threaded_blocks, entry->src->index)
> 
> This limitation discards paths that are sub-paths of paths that have
> already been threaded.
> 
> The following patch scans the remaining to-be-threaded paths to identify
> if any of them start from the same point, and are thus sub-paths of the
> just-threaded path.  By removing the common prefix of blocks in upcoming
> threadable paths, and then rewiring first non-common block
> appropriately, we expose new threading opportunities, since we are no
> longer starting from the same BB.  We also simplify the would-be
> threaded paths, because we don't duplicate already duplicated paths.
> 
> This sounds confusing, but the documentation for the entry point to my
> patch (adjust_paths_after_duplication) shows an actual example:
> 
> +/* After an FSM path has been jump threaded, adjust the remaining FSM
> +   paths that are subsets of this path, so these paths can be safely
> +   threaded within the context of the new threaded path.
> +
> +   For example, suppose we have just threaded:
> +
> +   5 -> 6 -> 7 -> 8 -> 12      =>      5 -> 6' -> 7' -> 8' -> 12'
> +
> +   And we have an upcoming threading candidate:
> +   5 -> 6 -> 7 -> 8 -> 15 -> 20
> +
> +   This function adjusts the upcoming path into:
> +   8' -> 15 -> 20
So a nit on the comment.  The destination of the target edge in a jump
threading path is not duplicated.  So for the comment to be correct, I
think you need to change 12' to just 12.

Presumably BB8 ends in a conditional which we can't determine anything
about statically (otherwise we wouldn't have 8->12 and 8->15).  It's
something that may confuse someone not familiar with this code.  Though
full CFGs may be too painful to show here as well.




> 
> Notice that we will no longer have two paths that start from the same
> BB.  One will start with bb5, while the adjusted path will start with
> bb8'.  With this we kill two birds-- we are able to thread more paths,
> and these paths will avoid duplicating a whole mess of things that have
> already been threaded.
Which is a good thing IMHO.  Essentially with the code right now we have
to wait until the next backwards threading pass to pick up the second
jump thread.   It's not really a secondary effect -- we found the
thread, but didn't have the support we needed in the copier to DTRT.


> 
> The attached patch is a subset of some upcoming work that can live on
> its own.  It bootstraps and regtests.  Also, by running it on a handful
> of .ii files, I can see that we are able to thread sub-paths that we
> previously dropped on the floor.  More is better, right? :)
Generally, yes.  Similarly, the more thoroughly we can thread earlier,
the better.  While my immediate goal is to remove tree-vrp's jump
threading, dropping one or two (of the four!) backwards passes is also
on the hitlist.


> 
> To test this, I stole Jeff's method of using cachegrind to benchmark
> instruction counts and conditional branches
> (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02434.html).
:-)

> 
> Basically, I bootstrapped two compilers, with and without improvements,
> and used each to build a stage1 trunk.  Each of these stage1-trunks were
> used on 246 .ii GCC files I have lying around from a bootstrap some
> random point last year.  I used no special flags on builds apart from
> --enable-languages=c,c++.
> 
> Although I would've wished a larger improvement, this works comes for
> free, as it's just a subset of other work I'm hacking on.
Yup.

> 
> Without further ado, here are my monumental, earth shattering improvements:
> 
> Conditional branches
>    Without patch: 411846839709
>    With    patch: 411831330316
>         %changed: -0.0037660%
> 
> Number of instructions
>    Without patch: 2271844650634
>    With    patch: 2271761153578
>         %changed: -0.0036754%
So that's pretty small.  A really good improvement would be on the order
of a half-percent reduction in runtime conditional branches.  I usually
test with .i files that have enable-checking turned on -- which tends to
give lots of opportunities due to the redundancies in our checking code.

On a positive note, you're eliminating roughly 4.5 other dynamic
instructions for every runtime conditional branch you remove, which is
actually a good ratio.  3.5 is what I typically see for a fairly
extensive amount of work.   Patrick P's work last year was on the order
of 7.5.  So while it didn't fire often, when it did it was highly effective.

We have installed changes of this nature when they were part of a larger
set of changes, particularly if they helped us elsewhere (like
eliminating key path to silence a bogus warning or addressing other
regressions).

I'm going to defer judgment right now.  I will do some further testing
as I walk through the various uninitialized and similar warnings as well
as with any work to see if we can reasonably eliminate one or two of the
backwards passes.  If it helps in either of those contexts then we have
a much stronger case to include now rather than defer until the larger
work lands.



> p.s. There is a lot of debugging/dumping code in my patch, which I can
> gladly remove if/when approved.  It helps keep my head straight while
> looking at this spaghetti :).
Yea :-)  Actually keeping some of it is probably useful.  My brain turns
to mush as I move between the old style and new style jump threading
implementations.  Better dumps would probably help.


SO the changes to test that path->length > 1 in mark_threaded_blocks
seems like it ought to go forward independently.

Conceptually it looks sane.  A testcase or two would help.


Jeff



Reply via email to