On 07/02/2018 07:08 AM, Christophe Lyon wrote:

On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
While poking around in the backwards threader I noticed that we bail if
we have already seen a starting BB.

        /* Do not jump-thread twice from the same block.  */
        if (bitmap_bit_p (threaded_blocks, entry->src->index)

This limitation discards paths that are sub-paths of paths that have
already been threaded.

The following patch scans the remaining to-be-threaded paths to identify
if any of them start from the same point, and are thus sub-paths of the
just-threaded path.  By removing the common prefix of blocks in upcoming
threadable paths, and then rewiring first non-common block
appropriately, we expose new threading opportunities, since we are no
longer starting from the same BB.  We also simplify the would-be
threaded paths, because we don't duplicate already duplicated paths.
[snip]
Hi,

I've noticed a regression on aarch64:
FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
threaded: 3"
very likely caused by this patch (appeared between 262282 and 262294)

Christophe

The test needs to be adjusted here.

The long story is that the aarch64 IL is different at thread3 time in that it has 2 profitable sub-paths that can now be threaded with my patch. This is causing the threaded count to be 5 for aarch64, versus 3 for x86 64. Previously we couldn't thread these in aarch64, so the backwards threader would bail.

One can see the different threading opportunities by sticking debug_all_paths() at the top of thread_through_all_blocks(). You will notice that aarch64 has far more candidates to begin with. The IL on the x86 backend, has no paths that start on the same BB. The aarch64, on the other hand, has many to choose from:

path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,

Some of these prove unprofitable, but 2 more than before are profitable now.

BTW, I see another threading related failure on aarch64 which is unrelated to my patch, and was previously there:

FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps threaded"

This is probably another IL incompatibility between architectures.

Anyways... the attached path fixes the regression. I have added a note to the test explaining the IL differences. We really should rewrite all the threading tests (I am NOT volunteering ;-)).

OK for trunk?
Aldy
gcc/testsuite/

	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
	has a slightly different IL that provides more threading
	opportunities.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 9ee8d12010b..e395de26ec0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -2,11 +2,16 @@
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
 /* { dg-final { scan-tree-dump "Jumps threaded: 16"  "thread1" } } */
 /* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread2" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 3" "thread3" } } */
 /* { dg-final { scan-tree-dump "Jumps threaded: 1"  "dom2" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp2" } } */
 
+/* Most architectures get 3 threadable paths here, whereas aarch64 and
+   possibly others get 5.  We really should rewrite threading tests to
+   test a specific IL sequence, not gobs of code whose IL can vary
+   from architecture to architecture.  */
+/* { dg-final { scan-tree-dump "Jumps threaded: \[35\]" "thread3" } } */
+
 enum STATE {
   S0=0,
   SI,

Reply via email to