Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-10-18 Thread Richard Biener
On Fri, Jun 21, 2013 at 11:21 PM, Steve Ellcey sell...@mips.com wrote:
 On Fri, 2013-06-21 at 17:43 +0100, James Greenhalgh wrote:

 So to my mind, this is all far too tied up in pass ordering details to
 resolve. Given that all the threading opportunities for my patch are found
 in dom1 and how fragile the positioning of dom1 is, there is not a great
 deal I can do to modify the ordering.

 The biggest improvement I could find comes from rerunning pass_ch
 immediately after dom1, though I'm not sure what the cost of that
 would be.

 I wonder if you or others have any thoughts on what the right thing to
 do would be?

  I am not sure if path threading in general is turned off for -Os but it
  probably should be.

 I agree, jump threading is on at -Os, path threading should not be.

 Thanks,
 James

 I think that since it seems to be more a space issue then a time issue,
 the way you currently have it is reasonable.  As long as the
 optimization does not happen at -Os then I think we can probably live
 with the space increase.

I suppose with Jeffs recent work on jump-threading through paths
this case in handled and the patch in this thread is obsolete or can
be reworked?

Thanks,
Richard.

 Steve Ellcey
 sell...@mips.com




Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-10-18 Thread James Greenhalgh
On Fri, Oct 18, 2013 at 11:55:08AM +0100, Richard Biener wrote:
 I suppose with Jeffs recent work on jump-threading through paths
 this case in handled and the patch in this thread is obsolete or can
 be reworked?

Yes, this patch is now obsolete, Jeff's solution is much more
elegant :-)

Thanks,
James



Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-10-18 Thread Jeff Law

On 10/18/13 04:55, Richard Biener wrote:

On Fri, Jun 21, 2013 at 11:21 PM, Steve Ellcey sell...@mips.com wrote:

On Fri, 2013-06-21 at 17:43 +0100, James Greenhalgh wrote:


So to my mind, this is all far too tied up in pass ordering details to
resolve. Given that all the threading opportunities for my patch are found
in dom1 and how fragile the positioning of dom1 is, there is not a great
deal I can do to modify the ordering.

The biggest improvement I could find comes from rerunning pass_ch
immediately after dom1, though I'm not sure what the cost of that
would be.

I wonder if you or others have any thoughts on what the right thing to
do would be?


I am not sure if path threading in general is turned off for -Os but it
probably should be.


I agree, jump threading is on at -Os, path threading should not be.

Thanks,
James


I think that since it seems to be more a space issue then a time issue,
the way you currently have it is reasonable.  As long as the
optimization does not happen at -Os then I think we can probably live
with the space increase.


I suppose with Jeffs recent work on jump-threading through paths
this case in handled and the patch in this thread is obsolete or can
be reworked?
It trivially falls out of the work I'm doing.  The biggest question will 
be what to do about the loop structures.  For the FSA case from coremark 
the loop structures get mucked up way beyond repair.


My thoughts were to generally continue to not allow major damage to the 
loop structures when threading jumps --  with the exception being if we 
can eliminate an indirect or tablejump at the top of a loop.  Possibly 
further restricting it to the jump threading done after our loop 
optimizers have completed.


I haven't looked at that heuristic yet as I've primarily focused on 
having the right infrastructure to correctly update the SSA and CFG in 
the general case.




jeff


Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-10-18 Thread Jeff Law

On 10/18/13 05:05, James Greenhalgh wrote:

On Fri, Oct 18, 2013 at 11:55:08AM +0100, Richard Biener wrote:

I suppose with Jeffs recent work on jump-threading through paths
this case in handled and the patch in this thread is obsolete or can
be reworked?


Yes, this patch is now obsolete, Jeff's solution is much more
elegant :-)
Not sure if I'd use the term elegant :-)  My solution is certainly more 
general though.  There's still a couple rounds of staging bits as I 
clean up the rough edges.


The need to store a full thread path to get SSA graph update correct in 
cases where we have a joiner block, followed by multiple threadable 
blocks without side effects, followed by a threadable block with side 
effects was unexpected.


Jeff


Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-06-21 Thread James Greenhalgh
  While testing it I noticed that the final executable
 is larger with your patch then with mine.  Here are the sizes of the
 bare-metal executables I created using the same flags I sent you
 earlier, the first has no switch optimization, the second one uses my
 plugin optimization, and the third uses your latest patch.  I haven't
 looked into why the size difference for your patch and mine exists, do
 you see a size difference on your platforms? 

Yes I do, but after playing around with it, this seems very dependant
on pass ordering.

I've built various arm-none-eabi compilers to test with:

  clean: is a compiler without path threading.
  steve.pass: is your original pass patch.
  james: is my patch (which will be called within vrp and dom passes)

  steve.after-vrp1: moves your pass to immediately after the first call
to vrp

  steve.before, steve.after, steve.after-vrp-before-dom,
  steve.before-vrp-after-dom: run your pass immediately before or after
both vrp and both dom passes.

  james.ch is my patch, rerunning pass_ch after dom1.

Then, building with flags:

  -finline-limit=1000 -funroll-all-loops
  -finline-functions [[-ftree-switch-shortcut]] -O3 -mthumb

And passing the resulting binary through:

$ arm-none-eabi-strip blob.*

I see:

$ size blob.arm.* | sort -n

   textdata bss dec hex filename
  539842548 296   56828ddfc ../blobs/blob.arm.clean
  544642548 296   57308dfdc ../blobs/blob.arm.steve.pass
  544962548 296   57340dffc ../blobs/blob.arm.steve.after
  544962548 296   57340dffc 
../blobs/blob.arm.steve.after-vrp-before-dom
  545042548 296   57348e004 ../blobs/blob.arm.james.ch
  545042548 296   57348e004 ../blobs/blob.arm.steve.only-after-vrp1
  546562548 296   57500e09c ../blobs/blob.arm.james
  547042548 296   57548e0cc 
../blobs/blob.arm.steve.before-vrp-after-dom
  547362548 296   57580e0ec ../blobs/blob.arm.steve.before

So to my mind, this is all far too tied up in pass ordering details to
resolve. Given that all the threading opportunities for my patch are found
in dom1 and how fragile the positioning of dom1 is, there is not a great
deal I can do to modify the ordering.

The biggest improvement I could find comes from rerunning pass_ch
immediately after dom1, though I'm not sure what the cost of that
would be.

I wonder if you or others have any thoughts on what the right thing to
do would be?

 I am not sure if path threading in general is turned off for -Os but it
 probably should be.

I agree, jump threading is on at -Os, path threading should not be.

Thanks,
James



Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-06-21 Thread Steve Ellcey
On Fri, 2013-06-21 at 17:43 +0100, James Greenhalgh wrote:

 So to my mind, this is all far too tied up in pass ordering details to
 resolve. Given that all the threading opportunities for my patch are found
 in dom1 and how fragile the positioning of dom1 is, there is not a great
 deal I can do to modify the ordering.
 
 The biggest improvement I could find comes from rerunning pass_ch
 immediately after dom1, though I'm not sure what the cost of that
 would be.
 
 I wonder if you or others have any thoughts on what the right thing to
 do would be?
 
  I am not sure if path threading in general is turned off for -Os but it
  probably should be.
 
 I agree, jump threading is on at -Os, path threading should not be.
 
 Thanks,
 James

I think that since it seems to be more a space issue then a time issue,
the way you currently have it is reasonable.  As long as the
optimization does not happen at -Os then I think we can probably live
with the space increase.

Steve Ellcey
sell...@mips.com




Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-06-19 Thread James Greenhalgh

 -Original Message-
 From: Steve Ellcey [mailto:sell...@mips.com]
 Sent: 14 June 2013 19:07

 With my version the compiler calls gimple_duplicate_sese_region from
 duplicate_blocks 60 times.  With your patch it calls
 gimple_duplicate_sese_region from duplicate_thread_path 16 times.


Hi Steve,

You are quite right. With -finline-limit=1000 I see
a big difference in performance. The cause of this is
the code added to
tree-ssa-threadedge.c (simplify_control_stmt_condition).

If we have failed to simplify to a gimple_min_invariant,
we want to look for thread paths to the value given by
gimple_goto_dest, rather than the SSA_NAME_VALUE of that
value.

This improves the performance on my x86_64 toolchain to the
same level as your patch. I see Registered 20 jump paths
printed 3 times in dom1, for a total of 60 thread paths.

I've also fixed another couple of bugs I spotted, improved
logging of results and added the parameters that were in
your patch.

I did investigate changing the search strategy back to yours,
but I saw no impact on the thread paths found.

Please let me know if this fixes the performance issues you
were seeing and if you have any other comments.

FWIW I've bootstrapped and regression tested this version of
the patch on x86_64 and ARM with no regressions.

Thanks,
James Greenhalgh

---

Changes from v1:

---
gcc/

2013-06-19  James Greenhalgh  james.greenha...@arm.com

* params.def (PARAM_MAX_THREAD_PATH_INSNS): New.
(PARAM_THREAD_PATHS): Likewise.
* tree-ssa-threadedge.c
(simplify_control_stmt_condition): If we can't simplify
cond, return it unmodified.
(max_insn_count): Do not initialize.
(max_path_count): Likewise.
(find_control_statement_thread_paths): Catch case where path
has already been computed (thus no further path exists),
add sanity checking.
(thread_across_edge): Initialize max_{insn, path}_count;
* tree-ssa-threadupdate.c
(duplicate_thread_path): Add sanity check, logging.
(thread_through_all_blocks): Thread paths, even if no
threaded_edges were found.
(register_thread_paths): Improve logging.

---

Changelog:

gcc/

2013-06-19  James Greenhalgh  james.greenha...@arm.com

PR tree-optimization/54742
* params.def (PARAM_MAX_THREAD_PATH_INSNS): New.
(PARAM_THREAD_PATHS): Likewise.
* tree.cfg (gimple_duplicate_sese_region): Memoize loop latch
and loop header blocks if copying across a latch/header.
* tree-flow.h (thread_paths): New struct.
(register_thread_paths): Declare.
* tree-ssa-threadedge.c
(simplify_control_stmt_condition): Permit returning something
not in gimple_min_invariant form.
(max_insn_count): Declare.
(max_path_count): Likewise.
(find_thread_path_1): New function.
(find_thread_path): Likewise.
(save_new_thread_path): Likewise.
(find_control_statement_thread_paths): Likewise.
(thread_across_edge): Handle non gimple_min_invariant cases.
* tree-ssa-threadupdate.c (thread_paths_vec): New.
(remove_edge_from_thread_candidates): New function.
(duplicate_thread_path): Likewise.
(copy_control_statement_thread_paths): Likewise.
(thread_through_all_blocks): Handle thread_paths.
(register_thread_paths): New function.
diff --git a/gcc/params.def b/gcc/params.def
index 3c52651..25d36a6 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -123,6 +123,19 @@ DEFPARAM (PARAM_PARTIAL_INLINING_ENTRY_PROBABILITY,
 	  Maximum probability of the entry BB of split region (in percent relative to entry BB of the function) to make partial inlining happen,
 	  70, 0, 0)
 
+/* Maximum number of instructions to copy when duplicating blocks
+   on a jump thread path.  */
+DEFPARAM (PARAM_MAX_THREAD_PATH_INSNS,
+		max-thread-path-insns,
+		Maximum number of instructions to copy when duplicating blocks on a jump thread path,
+		100, 1, 99)
+
+/* Maximum number of jump thread paths to duplicate.  */
+DEFPARAM (PARAM_MAX_THREAD_PATHS,
+		max-thread-paths,
+		Maximum number of new jump thread paths to create,
+		50, 1, 99)
+
 /* Limit the number of expansions created by the variable expansion
optimization to avoid register pressure.  */
 DEFPARAM (PARAM_MAX_VARIABLE_EXPANSIONS,
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 4b91a35..6dcd2e4 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -5717,10 +5717,12 @@ gimple_duplicate_sese_region (edge entry, edge exit,
 {
   unsigned i;
   bool free_region_copy = false, copying_header = false;
+  bool save_loop_details = false;
   struct loop *loop = entry-dest-loop_father;
   edge exit_copy;
   vecbasic_block doms;
   edge redirected;
+  int memo_loop_header_no = 0, memo_loop_latch_no = 0;
   int total_freq = 0, entry_freq = 0;
   gcov_type total_count = 0, entry_count = 0;
 
@@ -5738,9 +5740,15 @@ gimple_duplicate_sese_region 

Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-06-19 Thread Steve Ellcey
On Wed, 2013-06-19 at 14:19 +0100, James Greenhalgh wrote:

 Please let me know if this fixes the performance issues you
 were seeing and if you have any other comments.
 
 FWIW I've bootstrapped and regression tested this version of
 the patch on x86_64 and ARM with no regressions.
 
 Thanks,
 James Greenhalgh


James,

This patch does give me the same performance as my original patch, so
that is excellent.  While testing it I noticed that the final executable
is larger with your patch then with mine.  Here are the sizes of the
bare-metal executables I created using the same flags I sent you
earlier, the first has no switch optimization, the second one uses my
plugin optimization, and the third uses your latest patch.  I haven't
looked into why the size difference for your patch and mine exists, do
you see a size difference on your platforms?  I am not sure if path
threading in general is turned off for -Os but it probably should be.

% ll -art coremark.fsf*elf
-rwxr-xr-x 1 sellcey src 413812 Jun 19 11:11 coremark.fsf.1.elf
-rwxr-xr-x 1 sellcey src 414676 Jun 19 11:11 coremark.fsf.2.elf
-rwxr-xr-x 1 sellcey src 416402 Jun 19 11:11 coremark.fsf.3.elf


Steve Ellcey
sell...@mips.com




Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-06-14 Thread Steve Ellcey
On Fri, 2013-06-14 at 00:08 +0100, James Greenhalgh wrote:

 I see only minor cosmetic differences (Placement of labels,
 numbering of labels) between the two generated assembly files.
 
 Perhaps you could share the flags you were using and I can try to
 figure out which paths I seem to be missing. If I can't find them,
 I'll happily revert to using your search strategy.
 
 Regards,
 James Greenhalgh

I am building a cross compiler running on x86_64 and targeting
mips-mti-elf.  I ran the compiler with:

-mips32r2 -EL -msoft-float -O3 -funroll-all-loops -finline-functions
-finline-limit=1000 -DRTEYAMON

With my version the compiler calls gimple_duplicate_sese_region from
duplicate_blocks 60 times.  With your patch it calls
gimple_duplicate_sese_region from duplicate_thread_path 16 times.

Steve Ellcey
sell...@mips.com




Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-06-13 Thread Steve Ellcey
On Fri, 2013-06-07 at 16:14 +0100, James Greenhalgh wrote:


 Beyond that, the path search code is modified from Steve's patch
 to only perform one depth first search, and the path copy code
 is modified to purge the region entry and exit edges for those
 which traditional jump-threading may consider.
 
 Opinions?
 
 Thanks,
 James Greenhalgh

Hi James,

I tried out your patch and I am not getting the speed up in
coremark that I got with my patch.  I wonder if restricting it
to one depth first search is the reason.  With my patch I was
able to completely get rid of the switch statement, but with
this patch it still exists.  Maybe we need an option to do
multiple searches?

Steve Ellcey
sell...@mips.com




Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-06-13 Thread James Greenhalgh
On Thu, Jun 13, 2013 at 08:29:08PM +0100, Steve Ellcey wrote:
 On Fri, 2013-06-07 at 16:14 +0100, James Greenhalgh wrote:
 
  Beyond that, the path search code is modified from Steve's patch
  to only perform one depth first search, and the path copy code
  is modified to purge the region entry and exit edges for those
  which traditional jump-threading may consider.
  
  Opinions?
  
  Thanks,
  James Greenhalgh
 
 Hi James,
 
 I tried out your patch and I am not getting the speed up in
 coremark that I got with my patch.  I wonder if restricting it
 to one depth first search is the reason.  With my patch I was
 able to completely get rid of the switch statement, but with
 this patch it still exists.  Maybe we need an option to do
 multiple searches?
 
 Steve Ellcey
 sell...@mips.com
 

Hi Steve,

Thanks for having a look at the patch, though you appear to get
very different results to my local build.

Comparing a bootstrapped native x86_64 compiler with my patch and
these flags:

  /work/gcc-build-jg-threading/build-x86/install/bin/gcc -S -O3 -Ilinux64 
core_state.c

And a bootstrapped native x86_64 compiler with your patch and
these flags:

  /work/gcc-build-sje-threading/build-x86/install/bin/gcc -S -O3 -Ilinux64 
-ftree-switch-shortcut core_state.c

I see only minor cosmetic differences (Placement of labels,
numbering of labels) between the two generated assembly files.

Perhaps you could share the flags you were using and I can try to
figure out which paths I seem to be missing. If I can't find them,
I'll happily revert to using your search strategy.

Regards,
James Greenhalgh