On 06/02/2015 12:35 AM, Ajit Kumar Agarwal wrote:
I don't offhand know if any of the benchmarks you cite above are free-enough to 
derive a testcase from.  But one trick many of us use is to instrument the 
>>pass and compile some known free software (often gcc
itself) to find triggering code and use that to generate tests for the new 
transformation.

I will add tests in the suite. I could see many existing tests in the suite 
also get triggered with this optimization.
Thanks. For cases in the existing testsuite where you need to change the expected output, it's useful to note why the expected output was changed. Sometimes a test is compromised by a new optimization, sometimes the expected output is changed and is papering over a problem, etc so it's something we look at reasonably closely.



diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
index 9faa339..559ca96 100644
--- a/gcc/cfghooks.c
+++ b/gcc/cfghooks.c
@@ -581,7 +581,7 @@ delete_basic_block (basic_block bb)

         /* If we remove the header or the latch of a loop, mark the loop for
         removal.  */
-      if (loop->latch == bb
+      if (loop && loop->latch == bb
          || loop->header == bb)
        mark_loop_for_removal (loop);
So what caused you to add this additional test?  In general loop
structures are supposed to always be available.  The change here implies
that the loop structures were gone at some point.  That seems at first
glance a mistake.

I was using the gimple_duplicate_bb which will not add the duplicate basic 
block inside the current_loops. That's why the above
Condition is required. I am using duplicate_block instead of 
gimple_duplicate_bb. With this change the above check with loop
Is not required as it adds the duplicate basic block inside the loops.
OK.  Good to hear it's not required anymore.




diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index aed5254..b25e409 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -1838,6 +1838,64 @@ replace_uses_by (tree name, tree val)
       }
   }

+void
+gimple_threaded_merge_blocks (basic_block a, basic_block b)
If we keep this function it will need a block comment.

I say "if" for a couple reasons.  First we already have support routines
that know how to merge blocks.  If you really need to merge blocks you
should try to use them.

Second, I'm not sure that you really need to worry about block merging
in this pass.  Just create the duplicates, wire them into the CFG and
let the existing block merging support handle this problem.

The above routine is not merging but duplicates the join nodes into its 
predecessors. If I change the name of the above
Function to the gimple_threaded_duplicating_join_node it should be fine.
But you don't need to duplicate into the predecessors. If you create the duplicates and wire them into the CFG properly the existing code in cfgcleanup should take care of this for you.


diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 4303a18..2c7d36d 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -1359,6 +1359,322 @@ thread_through_normal_block (edge e,
     return 0;
   }

+static void
+replace_threaded_uses (basic_block a,basic_block b)
If you keep this function, then it'll need a function comment.

It looks like this is just doing const/copy propagation.  I think a
better structure is to implement your optimization as a distinct pass,
then rely on existing passes such as update_ssa, DOM, CCP to handle
updating the SSA graph and propagation opportunities exposed by your
transformation.


Similarly for the other replace_ functions.

I think these replace_ functions are required as existing Dom, CCP and 
propagation opportunities doesn't transform these
Propagation given below.

<bb 29>:
xk_124 = MIN_EXPR <xy_123, xc_121>;
xc_126 = xc_121 - xk_6;
xm_127 = xm_122 - xk_6;
xy_128 = xy_123 - xk_6;
*EritePtr_14 = xc_126;
MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
EritePtr_135 = &MEM[(void *)EritePtr_14 + 4B];
MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
i_137 = i_4 + 1;
goto <bb 31>;

<bb 30>:
xk_125 = MIN_EXPR <xy_123, xm_122>;
xc_165 = xc_121 - xk_6;
xm_166 = xm_122 - xk_6;
xy_167 = xy_123 - xk_6;
*EritePtr_14 = xc_126;
MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
i_173 = i_4 + 1;

<bb 29> and <bb 30 > are the predecessors of the join node. After the 
duplication of join node and the duplicating the
Join node into the predecessors < bb 29> and <bb 30>, the above is the gimple 
representations.
}

But the < bb 30 > should be the following after the transformation.

<bb 30>:
xk_125 = MIN_EXPR <xy_123, xm_122>;
xc_165 = xc_121 - xk_125;
xm_166 = xm_122 - xk_125;
xy_167 = xy_123 - xk_125;
*EritePtr_14 = xc_165;
MEM[(Byte *)EritePtr_14 + 1B] = xm_166;
MEM[(Byte *)EritePtr_14 + 2B] = xy_167;
EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
MEM[(Byte *)EritePtr_14 + 3B] = xk_125;
i_173 = i_4 + 1;

The phi-only cprop will  replace the x_6 with xk_125 but the other replacements 
like xk_126, xk_127, xk_128 with
Xk_165, xk_166, xk_167 will not be done by the existing routine like 
phi-only-cprop as this not only copy propagation because the duplicate block 
defines
New results but the copy is still old with the join node.

The replace function does the above transformation and I think it is required.
Can you send me the testcase this came from so that I can compile and examine the dumps myself and possibly poke at things with the debugger? I don't have enough context to know what's missing.

If indeed there's a form of propagation missing, we should enhance the propagation routines to handle the missing cases rather than write new ones inside a pass.

jeff

Reply via email to