On Mon, Jul 9, 2018 at 9:19 AM Aldy Hernandez <al...@redhat.com> wrote:
>
>
>
> On 07/05/2018 03:27 PM, Jeff Law wrote:
> > On 07/04/2018 02:12 AM, Aldy Hernandez wrote:
> >>
> >>
> >> On 07/03/2018 08:16 PM, Jeff Law wrote:
> >>> On 07/03/2018 03:31 AM, Aldy Hernandez wrote:
> >>>> On 07/02/2018 07:08 AM, Christophe Lyon wrote:
> >>>>
> >>>>>>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
> >>>>>>>> While poking around in the backwards threader I noticed that we bail 
> >>>>>>>> if
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> we have already seen a starting BB.
> >>>>>>>>
> >>>>>>>>           /* Do not jump-thread twice from the same block.  */
> >>>>>>>>           if (bitmap_bit_p (threaded_blocks, entry->src->index)
> >>>>>>>>
> >>>>>>>> This limitation discards paths that are sub-paths of paths that have
> >>>>>>>> already been threaded.
> >>>>>>>>
> >>>>>>>> The following patch scans the remaining to-be-threaded paths to 
> >>>>>>>> identify
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> if any of them start from the same point, and are thus sub-paths of 
> >>>>>>>> the
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> just-threaded path.  By removing the common prefix of blocks in 
> >>>>>>>> upcoming
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> threadable paths, and then rewiring first non-common block
> >>>>>>>> appropriately, we expose new threading opportunities, since we are no
> >>>>>>>>
> >>>>>>>> longer starting from the same BB.  We also simplify the would-be
> >>>>>>>> threaded paths, because we don't duplicate already duplicated paths.
> >>>> [snip]
> >>>>> Hi,
> >>>>>
> >>>>> I've noticed a regression on aarch64:
> >>>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
> >>>>> threaded: 3"
> >>>>> very likely caused by this patch (appeared between 262282 and 262294)
> >>>>>
> >>>>> Christophe
> >>>>
> >>>> The test needs to be adjusted here.
> >>>>
> >>>> The long story is that the aarch64 IL is different at thread3 time in
> >>>> that it has 2 profitable sub-paths that can now be threaded with my
> >>>> patch.  This is causing the threaded count to be 5 for aarch64, versus 3
> >>>> for x86 64.  Previously we couldn't thread these in aarch64, so the
> >>>> backwards threader would bail.
> >>>>
> >>>> One can see the different threading opportunities by sticking
> >>>> debug_all_paths() at the top of thread_through_all_blocks().  You will
> >>>> notice that aarch64 has far more candidates to begin with.  The IL on
> >>>> the x86 backend, has no paths that start on the same BB.  The aarch64,
> >>>> on the other hand, has many to choose from:
> >>>>
> >>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
> >>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
> >>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
> >>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> >>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> >>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> >>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
> >>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
> >>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,
> >>>>
> >>>> Some of these prove unprofitable, but 2 more than before are profitable 
> >>>> now.
> >>>>
> >>>>
> >>>>
> >>>> BTW, I see another threading related failure on aarch64 which is
> >>>> unrelated to my patch, and was previously there:
> >>>>
> >>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
> >>>> threaded"
> >>>>
> >>>> This is probably another IL incompatibility between architectures.
> >>>>
> >>>> Anyways... the attached path fixes the regression.  I have added a note
> >>>> to the test explaining the IL differences.  We really should rewrite all
> >>>> the threading tests (I am NOT volunteering ;-)).
> >>>>
> >>>> OK for trunk?
> >>>> Aldy
> >>>>
> >>>> curr.patch
> >>>>
> >>>>
> >>>> gcc/testsuite/
> >>>>
> >>>>      * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
> >>>>      has a slightly different IL that provides more threading
> >>>>      opportunities.
> >>> OK.
> >>>
> >>> WRT rewriting the tests.  I'd certainly agree that we don't have the
> >>> right set of knobs to allow us to characterize the target nor do we have
> >>> the right dumping/scanning facilities to describe and query the CFG
> >>> changes.
> >>>
> >>> The fact that the IL changes so much across targets is a sign that
> >>> target dependency (probably BRANCH_COST) is twiddling the gimple we
> >>> generate.  I strongly suspect we'd be a lot better off if we tackled the
> >>> BRANCH_COST problem first.
> >>
> >> Huh.  I've always accepted differing IL between architectures as a
> >> necessary evil for things like auto-vectorization and the like.
> > Yes.  We've made a conscious decision that introducing target
> > dependencies for the autovectorizer makes sense.  BRANCH_COST on the
> > other hand is a different beast :-)
> >
> >>
> >> What's the ideal plan here? A knob to set default values for target
> >> dependent variables that can affect IL layout?  Then we could pass
> >> -fthis-is-an-IL-test and things be normalized?
> > Well, lots of things.
> >
> > I'd like decisions about how to expand branches deferred until rtl
> > expansion.  Kai was poking at this in the past but never really got any
> > traction.
>
> For the record, the problem in this testcase is that switch lowering is
> riddled with back end specific knowledge (GET_MODE_SIZE uses as well as
> some rtx cost hacks).
>
> >
> > Many tests should turn into gimple IL tests.
>
> Yeah, though for tests like the threading ones, they're already
> sufficiently convoluted that turning them into gimple IL tests will make
> them even harder to read.  Oh well, I guess?
>
> To make it easier to transition to gimple IL tests, I suppose we could
> add -fdump-tree-*-something-more-suitable-for-gimplefe ;-).

There is -fdump-tree-*-gimple ;)

> Is it by
> design that the gimple fe is sufficiently different from what we dump
> for -fdump-tree (ahem, things like phi arguments, probabilities, etc)?

Well.  The only reason is that I didnt' want to adjust all testcases so
I introduced the -gimple dump modifier...

Note there's still some "magic" fiddling needed to make the GIMPLE FE
grok -gimple dump files as well as working around limitations with the
current way of having SSA/CFG testcases.

> >
> > And I'd like a better framework for testing what we're doing to the IL.
>
> Sigh.  Me too.

Well, the GIMPLE FE is supposed to be that - it's just not perfect
(or rather incomplete).

Richard.

> Aldy

Reply via email to