Re: [PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-04-23 Thread Jeff Law

On 03/18/2015 12:19 AM, Andrew Pinski wrote:

On Tue, Mar 17, 2015 at 11:27 AM, Jeff Law l...@redhat.com wrote:

On 03/17/2015 04:35 AM, Richard Biener wrote:


I'll test both.  In the common case, the cost is going to be the basic
bookkeeping so that we can compute the transparent property.  The
actual
computation of transparency and everything else is guarded on having
something in the hash tables -- and the overwhelming majority of the
time there's nothing in the hash tables.

Regardless, I'll pin down boxes and do some testing.




I'm slightly leaning towards trying it even in stage4, but if e.g.


richi


disagrees, we could defer it to stage1 too.


I'd be OK either way.  I just want us to make a decision one way or the



If it fixes a regression then OK for trunk, otherwise please wait for
stage 1 to open.


It fixes 64317 which is a P2 regression.



I had a pass which I inherited from my predecessor which was basically
doing the same as your patch:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/postreload-load.c;h=2652e51f8eca310d51db3a30e5f6c8847be436ce;hb=refs/heads/apinski/thunderx-cost

But I like your patch much better than this one.  I saw many loads
being removed for AARCH64 also at -Ofast -funroll-loops on SPEC 2006
with this pass.  But it seemed to be due to subregs, which I had mentioned
at https://gcc.gnu.org/ml/gcc/2015-01/msg00125.html .  When I get a
chance I can test your patch out on AARCH64 and disable my pass to
see if my pass catches more than your patch.
ISTM this pass from your predecessor is tackling a different issue. 
Namely it arranges to copy the result of the load (or source of a store) 
from a register which gets clobbered into a non-clobbered hard register 
so that it can be re-used later.


My patch detects cases where the load/store register is not 
clobbered across intermediate blocks and arranges to re-use it.


Unless I'm missing something, they ought to be able to work together.

I'd recommend officially submitting that code.  Bonus points for 
integrating it into the existing pass rather than running it as a 
separate pass.


jeff



Re: [PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-03-22 Thread Jeff Law

On 03/16/2015 01:27 PM, Jakub Jelinek wrote:

On Wed, Mar 11, 2015 at 03:30:36PM -0600, Jeff Law wrote:

+#ifndef GCC_GCSE__COMMONH
+#define GCC_GCSE__COMMONH


GCC_GCSE_COMMON_H instead?


@@ -1308,8 +1396,19 @@ gcse_after_reload_main (rtx f ATTRIBUTE_UNUSED)

if (expr_table->elements () > 0)
  {
+  /* Knowing which MEMs are transparent through a block can signifiantly
+increase the number of reundant loads found.  So compute transparency
+information for for each memory expression in the hash table.  */


s/for for/for/ ?


+  df_analyze ();
+  /* This can not be part of the normal allocation routine because
+we have to know the number of elements in the hash table.  */
+  transp = sbitmap_vector_alloc (last_basic_block_for_fn (cfun),
+expr_table->elements ());
+  bitmap_vector_ones (transp, last_basic_block_for_fn (cfun));
+  expr_table->traverse <FILE *, compute_expr_transp> (dump_file);
eliminate_partially_redundant_loads ();
delete_redundant_insns ();
+  sbitmap_vector_free (transp);

if (dump_file)
{


What effect does the patch have on compile time on say x86_64 or ppc64?

I'm slightly leaning towards trying it even in stage4, but if e.g. richi
disagrees, we could defer it to stage1 too.
I made the requested fixes as well as fixed another comment typo and 
fixed one real bug that showed up during some additional testing.


Basically the bitmap indices need to start counting from 0, not 1. 
Starting at 1 can result in an out-of-range write for the last element. 
 That's usually not a problem as there's space in the word, but I 
happened to have a case where we had 128 entries in the table and thus 
the out-of-bounds write hit a new word in memory clobbering heap 
metadata.  I'm glad this was caught prior to installation.


I've re-bootstrapped and regression tested on x86_64-linux-gnu and 
powerpc64-linux-gnu.  Installed on the trunk.  Actual patch committed is 
attached for archival purposes.



Jeff


commit f82a93981ea0f2ffbda93511a51c71910e10e4dd
Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Mar 23 05:21:04 2015 +

   PR rtl-optimization/64317
* Makefile.in (OBJS): Add gcse-common.c
* gcse.c: Include gcse-common.h
(struct modify_pair_s): Move structure definition to gcse-common.h
(compute_transp): Move function to gcse-common.c.
(canon_list_insert): Similarly.
(record_last_mem_set_info): Break out some code and put it into
gcse-common.c.  Call into the new common code.
(compute_local_properties): Pass additional arguments to 
compute_transp.
* postreload-gcse.c: Include gcse-common.h and df.h
(modify_mem_list_set, blocks_with_calls): New variables.
(modify_mem_list, canon_modify_mem_list, transp): Likewise.
(get_bb_avail_insn): Pass in the expression index too.
(alloc_mem): Allocate memory for the new bitmaps and lists.
(free_mem): Free memory for the new bitmaps and lists.
(insert_expr_in_table): Record a bitmap index for each entry we
add to the table.
(record_last_mem_set_info): Call into common code in gcse-common.c.
(get_bb_avail_insn): If no available insn was found in the requested
BB.  If BB has a single predecessor, see if the expression is
transparent in BB and available in that single predecessor.
(compute_expr_transp): New wrapper for compute_transp.
(eliminate_partially_redundant_load): Pass expression's bitmap_index
to get_bb_avail_insn.  Compute next_pred_bb_end a bit later.
(gcse_after_reload_main): If there are elements in the hash table,
then compute transparency for all the elements in the hash table.
* gcse-common.h: New file.
* gcse-common.c: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@221585 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a42d22a..b615c1f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,34 @@
+2015-03-22  Jeff Law  l...@redhat.com
+
+   PR rtl-optimization/64317
+   * Makefile.in (OBJS): Add gcse-common.c
+   * gcse.c: Include gcse-common.h
+   (struct modify_pair_s): Move structure definition to gcse-common.h
+   (compute_transp): Move function to gcse-common.c.
+   (canon_list_insert): Similarly.
+   (record_last_mem_set_info): Break out some code and put it into
+   gcse-common.c.  Call into the new common code.
+   (compute_local_properties): Pass additional arguments to compute_transp.
+   * postreload-gcse.c: Include gcse-common.h and df.h
+   (modify_mem_list_set, blocks_with_calls): New variables.
+   (modify_mem_list, canon_modify_mem_list, transp): Likewise.
+ 

Re: [PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-03-22 Thread Jeff Law

On 03/16/2015 01:27 PM, Jakub Jelinek wrote:

On Wed, Mar 11, 2015 at 03:30:36PM -0600, Jeff Law wrote:

+#ifndef GCC_GCSE__COMMONH
+#define GCC_GCSE__COMMONH


GCC_GCSE_COMMON_H instead?


@@ -1308,8 +1396,19 @@ gcse_after_reload_main (rtx f ATTRIBUTE_UNUSED)

if (expr_table->elements () > 0)
  {
+  /* Knowing which MEMs are transparent through a block can signifiantly
+increase the number of reundant loads found.  So compute transparency
+information for for each memory expression in the hash table.  */


s/for for/for/ ?


+  df_analyze ();
+  /* This can not be part of the normal allocation routine because
+we have to know the number of elements in the hash table.  */
+  transp = sbitmap_vector_alloc (last_basic_block_for_fn (cfun),
+expr_table->elements ());
+  bitmap_vector_ones (transp, last_basic_block_for_fn (cfun));
+  expr_table->traverse <FILE *, compute_expr_transp> (dump_file);
eliminate_partially_redundant_loads ();
delete_redundant_insns ();
+  sbitmap_vector_free (transp);

if (dump_file)
{


What effect does the patch have on compile time on say x86_64 or ppc64?

I'm slightly leaning towards trying it even in stage4, but if e.g. richi
disagrees, we could defer it to stage1 too.
Oh yea, forgot to mention, for PPC the compile-time testing results were 
essentially the same -- there's significantly more variation in the 
timings, but the before/after comparisons were in the noise.


Jeff


Re: [PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-03-22 Thread Jeff Law

On 03/18/2015 12:19 AM, Andrew Pinski wrote:

On Tue, Mar 17, 2015 at 11:27 AM, Jeff Law l...@redhat.com wrote:

On 03/17/2015 04:35 AM, Richard Biener wrote:


I'll test both.  In the common case, the cost is going to be the basic
bookkeeping so that we can compute the transparent property.  The
actual
computation of transparency and everything else is guarded on having
something in the hash tables -- and the overwhelming majority of the
time there's nothing in the hash tables.

Regardless, I'll pin down boxes and do some testing.




I'm slightly leaning towards trying it even in stage4, but if e.g.


richi


disagrees, we could defer it to stage1 too.


I'd be OK either way.  I just want us to make a decision one way or the



If it fixes a regression then OK for trunk, otherwise please wait for
stage 1 to open.


It fixes 64317 which is a P2 regression.



I had a pass which I inherited from my predecessor which was basically
doing the same as your patch:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/postreload-load.c;h=2652e51f8eca310d51db3a30e5f6c8847be436ce;hb=refs/heads/apinski/thunderx-cost

But I like your patch much better than this one.  I saw many loads
being removed for AARCH64 also at -Ofast -funroll-loops on SPEC 2006
with this pass.  But it seemed to be due to subregs, which I had mentioned
at https://gcc.gnu.org/ml/gcc/2015-01/msg00125.html .  When I get a
chance I can test your patch out on AARCH64 and disable my pass to
see if my pass catches more than your patch.
Well, they're doing different things, so they ought to be complementary 
to some degree.

Jeff


Re: [PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-03-21 Thread Jeff Law

On 03/16/2015 01:27 PM, Jakub Jelinek wrote:


What effect does the patch have on compile time on say x86_64 or ppc64?
I bootstrapped x86_64 trunk then timed compiling 640 .i/.ii files with 
-O2.  That was repeated 10 times (to get a sense of variability).  Each 
run took a little over 40 minutes with variability of less than a second 
(slowest run compared to fastest run).


I then also gathered data for a smaller set of runs for -O2 
-funroll-loops (given the consistency between runs, 10 iterations seemed 
like major overkill).  Again variability was less than a second.


I then applied the patch and repeated the -O2 and -O2 -funroll-loops 
test.  The patched compiler was within a second of the unpatched 
compiler for both the -O2 and -O2 -funroll-loops tests.  Given the 
difference was within the noise level of those tests, my conclusion is 
the new code makes no measurable difference in compile time performance 
for x86_64.


Similar tests are in progress on powerpc64-linux-gnu.

Jeff


Re: [PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-03-21 Thread Jakub Jelinek
On Sat, Mar 21, 2015 at 01:47:10AM -0600, Jeff Law wrote:
 On 03/16/2015 01:27 PM, Jakub Jelinek wrote:
 
 What effect does the patch have on compile time on say x86_64 or ppc64?
 I bootstrapped x86_64 trunk then timed compiling 640 .i/.ii files with -O2.
 That was repeated 10 times (to get a sense of variability).  Each run took a
 little over 40 minutes with variability of less than a second (slowest run
 compared to fastest run).
 
 I then also gathered data for a smaller set of runs for -O2 -funroll-loops
 (given the consistency between runs, 10 iterations seemed like major
 overkill).  Again variability was less than a second.
 
 I then applied the patch and repeated the -O2 and -O2 -funroll-loops test.
 The patched compiler was within a second of the unpatched compiler for both
 the -O2 and -O2 -funroll-loops tests.  Given the difference was within the
 noise level of those tests, my conclusion is the new code makes no
 measurable difference in compile time performance for x86_64.
 
 Similar tests are in progress on powerpc64-linux-gnu.

Thanks for the testing.  I think we want the patch in now.

Jakub


Re: [PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-03-18 Thread Andrew Pinski
On Tue, Mar 17, 2015 at 11:27 AM, Jeff Law l...@redhat.com wrote:
 On 03/17/2015 04:35 AM, Richard Biener wrote:

 I'll test both.  In the common case, the cost is going to be the basic
 bookkeeping so that we can compute the transparent property.  The
 actual
 computation of transparency and everything else is guarded on having
 something in the hash tables -- and the overwhelming majority of the
 time there's nothing in the hash tables.

 Regardless, I'll pin down boxes and do some testing.



 I'm slightly leaning towards trying it even in stage4, but if e.g.

 richi

 disagrees, we could defer it to stage1 too.

 I'd be OK either way.  I just want us to make a decision one way or the


 If it fixes a regression then OK for trunk, otherwise please wait for
 stage 1 to open.

 It fixes 64317 which is a P2 regression.


I had a pass which I inherited from my predecessor which was basically
doing the same as your patch:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/postreload-load.c;h=2652e51f8eca310d51db3a30e5f6c8847be436ce;hb=refs/heads/apinski/thunderx-cost

But I like your patch much better than this one.  I saw many loads
being removed for AARCH64 also at -Ofast -funroll-loops on SPEC 2006
with this pass.  But it seemed to be due to subregs, which I had mentioned
at https://gcc.gnu.org/ml/gcc/2015-01/msg00125.html .  When I get a
chance I can test your patch out on AARCH64 and disable my pass to
see if my pass catches more than your patch.

Thanks,
Andrew


 jeff



Re: [PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-03-17 Thread Richard Biener
On March 16, 2015 8:45:18 PM GMT+01:00, Jeff Law l...@redhat.com wrote:
On 03/16/15 13:27, Jakub Jelinek wrote:
 On Wed, Mar 11, 2015 at 03:30:36PM -0600, Jeff Law wrote:
 +#ifndef GCC_GCSE__COMMONH
 +#define GCC_GCSE__COMMONH

 GCC_GCSE_COMMON_H instead?
:-) Will fix.


 @@ -1308,8 +1396,19 @@ gcse_after_reload_main (rtx f
ATTRIBUTE_UNUSED)

 if (expr_table->elements () > 0)
   {
 +  /* Knowing which MEMs are transparent through a block can
signifiantly
 +increase the number of reundant loads found.  So compute
transparency
 +information for for each memory expression in the hash table.  */

 s/for for/for/ ?
Similarly.



 +  df_analyze ();
 +  /* This can not be part of the normal allocation routine
because
 +we have to know the number of elements in the hash table.  */
 +  transp = sbitmap_vector_alloc (last_basic_block_for_fn
(cfun),
 +expr_table->elements ());
 +  bitmap_vector_ones (transp, last_basic_block_for_fn (cfun));
 +  expr_table->traverse <FILE *, compute_expr_transp>
(dump_file);
 eliminate_partially_redundant_loads ();
 delete_redundant_insns ();
 +  sbitmap_vector_free (transp);

 if (dump_file)
 {

 What effect does the patch have on compile time on say x86_64 or
ppc64?
I'll test both.  In the common case, the cost is going to be the basic 
bookkeeping so that we can compute the transparent property.  The
actual 
computation of transparency and everything else is guarded on having 
something in the hash tables -- and the overwhelming majority of the 
time there's nothing in the hash tables.

Regardless, I'll pin down boxes and do some testing.



 I'm slightly leaning towards trying it even in stage4, but if e.g.
richi
 disagrees, we could defer it to stage1 too.
I'd be OK either way.  I just want us to make a decision one way or the

If it fixes a regression then OK for trunk, otherwise please wait for stage 1 
to open.

Richard.

other :-)

Jeff




Re: [PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-03-17 Thread Jeff Law

On 03/17/2015 04:35 AM, Richard Biener wrote:


I'll test both.  In the common case, the cost is going to be the basic
bookkeeping so that we can compute the transparent property.  The
actual
computation of transparency and everything else is guarded on having
something in the hash tables -- and the overwhelming majority of the
time there's nothing in the hash tables.

Regardless, I'll pin down boxes and do some testing.




I'm slightly leaning towards trying it even in stage4, but if e.g.

richi

disagrees, we could defer it to stage1 too.

I'd be OK either way.  I just want us to make a decision one way or the


If it fixes a regression then OK for trunk, otherwise please wait for stage 1 
to open.

It fixes 64317 which is a P2 regression.

jeff



Re: [PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-03-16 Thread Jakub Jelinek
On Wed, Mar 11, 2015 at 03:30:36PM -0600, Jeff Law wrote:
 +#ifndef GCC_GCSE__COMMONH
 +#define GCC_GCSE__COMMONH

GCC_GCSE_COMMON_H instead?

 @@ -1308,8 +1396,19 @@ gcse_after_reload_main (rtx f ATTRIBUTE_UNUSED)
  
 if (expr_table->elements () > 0)
  {
 +  /* Knowing which MEMs are transparent through a block can signifiantly
 +  increase the number of reundant loads found.  So compute transparency
 +  information for for each memory expression in the hash table.  */

s/for for/for/ ?

 +  df_analyze ();
 +  /* This can not be part of the normal allocation routine because
 +  we have to know the number of elements in the hash table.  */
 +  transp = sbitmap_vector_alloc (last_basic_block_for_fn (cfun),
 +  expr_table->elements ());
 +  bitmap_vector_ones (transp, last_basic_block_for_fn (cfun));
 +  expr_table->traverse <FILE *, compute_expr_transp> (dump_file);
eliminate_partially_redundant_loads ();
delete_redundant_insns ();
 +  sbitmap_vector_free (transp);
  
if (dump_file)
   {

What effect does the patch have on compile time on say x86_64 or ppc64?

I'm slightly leaning towards trying it even in stage4, but if e.g. richi
disagrees, we could defer it to stage1 too.

Jakub


Re: [PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-03-16 Thread Jeff Law

On 03/16/15 13:27, Jakub Jelinek wrote:

On Wed, Mar 11, 2015 at 03:30:36PM -0600, Jeff Law wrote:

+#ifndef GCC_GCSE__COMMONH
+#define GCC_GCSE__COMMONH


GCC_GCSE_COMMON_H instead?

:-) Will fix.




@@ -1308,8 +1396,19 @@ gcse_after_reload_main (rtx f ATTRIBUTE_UNUSED)

if (expr_table->elements () > 0)
  {
+  /* Knowing which MEMs are transparent through a block can signifiantly
+increase the number of reundant loads found.  So compute transparency
+information for for each memory expression in the hash table.  */


s/for for/for/ ?

Similarly.





+  df_analyze ();
+  /* This can not be part of the normal allocation routine because
+we have to know the number of elements in the hash table.  */
+  transp = sbitmap_vector_alloc (last_basic_block_for_fn (cfun),
+expr_table->elements ());
+  bitmap_vector_ones (transp, last_basic_block_for_fn (cfun));
+  expr_table->traverse <FILE *, compute_expr_transp> (dump_file);
eliminate_partially_redundant_loads ();
delete_redundant_insns ();
+  sbitmap_vector_free (transp);

if (dump_file)
{


What effect does the patch have on compile time on say x86_64 or ppc64?
I'll test both.  In the common case, the cost is going to be the basic 
bookkeeping so that we can compute the transparent property.  The actual 
computation of transparency and everything else is guarded on having 
something in the hash tables -- and the overwhelming majority of the 
time there's nothing in the hash tables.


Regardless, I'll pin down boxes and do some testing.




I'm slightly leaning towards trying it even in stage4, but if e.g. richi
disagrees, we could defer it to stage1 too.
I'd be OK either way.  I just want us to make a decision one way or the 
other :-)


Jeff



[PATCH][RFA] [PR rtl-optimization/64317] Enhance postreload-gcse.c to eliminate more redundant loads

2015-03-11 Thread Jeff Law


64317 is a P2 regression related to making the PIC register a pseudo 
register on the x86 port.


Basically post-LRA we still have some redundant loads of the PIC 
register from memory (it's spilled to memory as other objects are more 
profitable to keep in registers).  But of the set of ~10 actual loads, 
we really only need ~5.


The basic problem is that LRA and the basic postreload optimizers 
effectively forget everything they know when they encounter a block that 
starts with a label -- even if that block is only reachable from one 
predecessor.  I.e., these routines work on a subset of EBBs, but not 
maximal EBBs.


The first thought was to have LRA clean this up since this is largely an 
issue with cross-block inheritance of values.   Unfortunately Vlad has 
indicated that handling maximal EBBs is nontrivial in LRA.


I also investigated extending the simple EBB oriented optimizer in 
postreload.c to handle this case.  But it turns out that cselib really 
isn't suited to a scoped hash table, which would be the natural way to 
handle this.  So I killed that idea.


That leaves postreload-gcse and if you look at the incoming RTL you'd 
think that postreload-gcse would handle this just fine.  Unfortunately, 
postreload-gcse is badly misnamed.  Specifically, it does no dataflow 
analysis of available expressions!!


There may be some benefit in full dataflow analysis in 
postreload-gcse.c.  But we can get most of the benefits with a 
relatively simple change.  Specifically if we're unable to find the 
expression we want in the given block, see if the block has a single 
predecessor.  If the expression is available in that predecessor and 
transparent through the current block, then the expression is still 
available at the end of the current block.


That was enough to fix the PR.  But after instrumentation I was 
discouraged by the fact that postreload-gcse does virtually nothing on 
x86.  It's unable to find a single available load to put in its hash 
tables on an x86, x86_64 or ppc64 bootstrap.  Though many are found 
during testsuite runs.


Digging further into the history of postreload-gcse indicated the 
postreload-gcse.c work was primarily tested on ppc using spec.  My first 
attempts compiling spec with the instrumented compiler didn't produce 
anything useful either.  On a whim, I tried again with -O3 
-funroll-loops, and suddenly postreload-gcse.c got reasonably active.


Across a subset of spec2k, compiling with -O3 -funroll-loops, this patch 
enables about a 2X improvement in loads eliminated (ppc64).  On a ppc64 
-O3 -funroll-loops bootstrap of GCC I see 4142 loads eliminated with 
this patch and 2739 loads eliminated without.  A notable improvement.


Even with -O2 -funroll-loops, there's nothing interesting happening on 
x86_64.  Probably because it's not a load-store architecture.


Anyway, the question in my mind is whether or not we want to put this in 
during stage4.  The patch itself is medium sized, but mostly mechanical 
in nature.


It's been bootstrapped on x86.  It's been bootstrapped and regression 
tested on x86_64.  It's been bootstrapped and regression tested with -O3 
-funroll-loops on ppc64 (configured with --disable-werror).


OK for the trunk or wait for stage1?  I'm happy to go with the release 
manager's opinions on this.


Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 28979d5..91863be 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,34 @@
+2015-03-11  Jeff Law  l...@redhat.com
+
+   PR rtl-optimization/64317
+   * Makefile.in (OBJS): Add gcse-common.c
+   * gcse.c: Include gcse-common.h
+   (struct modify_pair_s): Move structure definition to gcse-common.h
+   (compute_transp): Move function to gcse-common.c.
+   (canon_list_insert): Similarly.
+   (record_last_mem_set_info): Break out some code and put it into
+   gcse-common.c.  Call into the new common code.
+   (compute_local_properties): Pass additional arguments to compute_transp.
+   * postreload-gcse.c: Include gcse-common.h and df.h
+   (modify_mem_list_set, blocks_with_calls): New variables.
+   (modify_mem_list, canon_modify_mem_list, transp): Likewise.
+   (get_bb_avail_insn): Pass in the expression index too.
+   (alloc_mem): Allocate memory for the new bitmaps and lists.
+   (free_mem): Free memory for the new bitmaps and lists.
+   (insert_expr_in_table): Record a bitmap index for each entry we
+   add to the table.
+   (record_last_mem_set_info): Call into common code in gcse-common.c.
+   (get_bb_avail_insn): If no available insn was found in the requested
+   BB.  If BB has a single predecessor, see if the expression is
+   transparent in BB and available in that single predecessor.
+   (compute_expr_transp): New wrapper for compute_transp.
+   (eliminate_partially_redundant_load): Pass expression's bitmap_index
+   to get_bb_avail_insn.  Compute next_pred_bb_end a bit later.
+