On Mon, Jul 9, 2018 at 9:19 AM Aldy Hernandez <al...@redhat.com> wrote: > > > > On 07/05/2018 03:27 PM, Jeff Law wrote: > > On 07/04/2018 02:12 AM, Aldy Hernandez wrote: > >> > >> > >> On 07/03/2018 08:16 PM, Jeff Law wrote: > >>> On 07/03/2018 03:31 AM, Aldy Hernandez wrote: > >>>> On 07/02/2018 07:08 AM, Christophe Lyon wrote: > >>>> > >>>>>>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote: > >>>>>>>> While poking around in the backwards threader I noticed that we bail > >>>>>>>> if > >>>>>>>> > >>>>>>>> > >>>>>>>> we have already seen a starting BB. > >>>>>>>> > >>>>>>>> /* Do not jump-thread twice from the same block. */ > >>>>>>>> if (bitmap_bit_p (threaded_blocks, entry->src->index) > >>>>>>>> > >>>>>>>> This limitation discards paths that are sub-paths of paths that have > >>>>>>>> already been threaded. > >>>>>>>> > >>>>>>>> The following patch scans the remaining to-be-threaded paths to > >>>>>>>> identify > >>>>>>>> > >>>>>>>> > >>>>>>>> if any of them start from the same point, and are thus sub-paths of > >>>>>>>> the > >>>>>>>> > >>>>>>>> > >>>>>>>> just-threaded path. By removing the common prefix of blocks in > >>>>>>>> upcoming > >>>>>>>> > >>>>>>>> > >>>>>>>> threadable paths, and then rewiring first non-common block > >>>>>>>> appropriately, we expose new threading opportunities, since we are no > >>>>>>>> > >>>>>>>> longer starting from the same BB. We also simplify the would-be > >>>>>>>> threaded paths, because we don't duplicate already duplicated paths. > >>>> [snip] > >>>>> Hi, > >>>>> > >>>>> I've noticed a regression on aarch64: > >>>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps > >>>>> threaded: 3" > >>>>> very likely caused by this patch (appeared between 262282 and 262294) > >>>>> > >>>>> Christophe > >>>> > >>>> The test needs to be adjusted here. > >>>> > >>>> The long story is that the aarch64 IL is different at thread3 time in > >>>> that it has 2 profitable sub-paths that can now be threaded with my > >>>> patch. This is causing the threaded count to be 5 for aarch64, versus 3 > >>>> for x86 64. Previously we couldn't thread these in aarch64, so the > >>>> backwards threader would bail. > >>>> > >>>> One can see the different threading opportunities by sticking > >>>> debug_all_paths() at the top of thread_through_all_blocks(). You will > >>>> notice that aarch64 has far more candidates to begin with. The IL on > >>>> the x86 backend, has no paths that start on the same BB. The aarch64, > >>>> on the other hand, has many to choose from: > >>>> > >>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, > >>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, > >>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, > >>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35, > >>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35, > >>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35, > >>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17, > >>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17, > >>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19, > >>>> > >>>> Some of these prove unprofitable, but 2 more than before are profitable > >>>> now. > >>>> > >>>> > >>>> > >>>> BTW, I see another threading related failure on aarch64 which is > >>>> unrelated to my patch, and was previously there: > >>>> > >>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps > >>>> threaded" > >>>> > >>>> This is probably another IL incompatibility between architectures. > >>>> > >>>> Anyways... the attached path fixes the regression. I have added a note > >>>> to the test explaining the IL differences. We really should rewrite all > >>>> the threading tests (I am NOT volunteering ;-)). > >>>> > >>>> OK for trunk? > >>>> Aldy > >>>> > >>>> curr.patch > >>>> > >>>> > >>>> gcc/testsuite/ > >>>> > >>>> * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64 > >>>> has a slightly different IL that provides more threading > >>>> opportunities. > >>> OK. > >>> > >>> WRT rewriting the tests. I'd certainly agree that we don't have the > >>> right set of knobs to allow us to characterize the target nor do we have > >>> the right dumping/scanning facilities to describe and query the CFG > >>> changes. > >>> > >>> The fact that the IL changes so much across targets is a sign that > >>> target dependency (probably BRANCH_COST) is twiddling the gimple we > >>> generate. I strongly suspect we'd be a lot better off if we tackled the > >>> BRANCH_COST problem first. > >> > >> Huh. I've always accepted differing IL between architectures as a > >> necessary evil for things like auto-vectorization and the like. > > Yes. We've made a conscious decision that introducing target > > dependencies for the autovectorizer makes sense. BRANCH_COST on the > > other hand is a different beast :-) > > > >> > >> What's the ideal plan here? A knob to set default values for target > >> dependent variables that can affect IL layout? Then we could pass > >> -fthis-is-an-IL-test and things be normalized? > > Well, lots of things. > > > > I'd like decisions about how to expand branches deferred until rtl > > expansion. Kai was poking at this in the past but never really got any > > traction. > > For the record, the problem in this testcase is that switch lowering is > riddled with back end specific knowledge (GET_MODE_SIZE uses as well as > some rtx cost hacks). > > > > > Many tests should turn into gimple IL tests. > > Yeah, though for tests like the threading ones, they're already > sufficiently convoluted that turning them into gimple IL tests will make > them even harder to read. Oh well, I guess? > > To make it easier to transition to gimple IL tests, I suppose we could > add -fdump-tree-*-something-more-suitable-for-gimplefe ;-).
There is -fdump-tree-*-gimple ;) > Is it by > design that the gimple fe is sufficiently different from what we dump > for -fdump-tree (ahem, things like phi arguments, probabilities, etc)? Well. The only reason is that I didnt' want to adjust all testcases so I introduced the -gimple dump modifier... Note there's still some "magic" fiddling needed to make the GIMPLE FE grok -gimple dump files as well as working around limitations with the current way of having SSA/CFG testcases. > > > > And I'd like a better framework for testing what we're doing to the IL. > > Sigh. Me too. Well, the GIMPLE FE is supposed to be that - it's just not perfect (or rather incomplete). Richard. > Aldy