Re: Keep static VTA locs in cselib tables only

2012-01-25 Thread Richard Sandiford

Re: Keep static VTA locs in cselib tables only

2012-01-21 Thread Richard Sandiford

Re: Keep static VTA locs in cselib tables only

2012-01-06 Thread Alexandre Oliva
On Jan  2, 2012, Hans-Peter Nilsson hans-peter.nils...@axis.com wrote:

 (canonical_cselib_val): New.
 * cselib.c (new_elt_loc_list): Rework to support value
 equivalences.  Adjust all callers.

 This (r182760) caused regressions in the libstdc++ testsuite for
 cris-elf, PR51728.

On Jan  2, 2012, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote:

 this seems to have caused a bootstrap failure on s390x: PR51735

 /home/andreas/git/gcc-head/gcc/tree-ssa-pre.c: In function ‘bool
 insert_aux(basic_block)’:
 /home/andreas/git/gcc-head/gcc/tree-ssa-pre.c:3791:1: internal compiler error:
 Segmentation fault

Sorry about the breakage.  Can you please confirm that these problems
are both fixed with Jakub's patches?  If not, could you please try the
patch I've just posted in a follow up to the PR bootstrap/51725 thread,
and let me know in case any problem remains?

Thanks to Jakub for covering for me while I was network-challenged.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


Re: Keep static VTA locs in cselib tables only

2012-01-06 Thread Hans-Peter Nilsson
 From: Alexandre Oliva aol...@redhat.com
 Date: Fri, 6 Jan 2012 20:35:39 +0100

 On Jan  2, 2012, Hans-Peter Nilsson hans-peter.nils...@axis.com wrote:
  This (r182760) caused regressions in the libstdc++ testsuite for
  cris-elf, PR51728.

 On Jan  2, 2012, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote:
  this seems to have caused a bootstrap failure on s390x: PR51735

 Sorry about the breakage.  Can you please confirm that these problems
 are both fixed with Jakub's patches?

I can't add anything about Andreas' observation besides his own
confirmation in the PR of it being fixed, but the problem I
reported is fine now.  No worries, if it wasn't fixed, I'd have
mentioned it. ;)

No testsuite result regressions (since T0=2007-01-05-16:47:21!)
for cris-elf at r182959.

brgds, H-P


Re: Keep static VTA locs in cselib tables only

2012-01-02 Thread Andreas Krebbel
Hi,

this seems to have caused a bootstrap failure on s390x: PR51735

/home/andreas/git/gcc-head/gcc/tree-ssa-pre.c: In function ‘bool
insert_aux(basic_block)’:
/home/andreas/git/gcc-head/gcc/tree-ssa-pre.c:3791:1: internal compiler error:
Segmentation fault

Bye,

-Andreas-




Re: Keep static VTA locs in cselib tables only

2012-01-01 Thread Hans-Peter Nilsson
 From: Alexandre Oliva aol...@redhat.com
 Date: Sat, 31 Dec 2011 20:57:24 +0100

 * cselib.h (cselib_add_permanent_equiv): Declare.
 (canonical_cselib_val): New.
 * cselib.c (new_elt_loc_list): Rework to support value
 equivalences.  Adjust all callers.
 (preserve_only_constants): Retain value equivalences.
 (references_value_p): Retain preserved values.
 (rtx_equal_for_cselib_1): Handle value equivalences.
 (cselib_invalidate_regno): Use canonical value.
 (cselib_add_permanent_equiv): New.
 * alias.c (find_base_term): Reset locs lists while recursing.
 * var-tracking.c (val_bind): New.  Don't add equivalences
 present in cselib table, compared with code moved from...
 (val_store): ... here.
 (val_resolve): Use val_bind.
 (VAL_EXPR_HAS_REVERSE): Drop.
 (add_uses): Do not create MOps for addresses.  Do not mark
 non-REG non-MEM expressions as requiring resolution.
 (reverse_op): Record reverse as a cselib equivalence.
 (add_stores): Use it.  Do not create MOps for addresses.
 Do not require resolution for non-REG non-MEM expressions.
 Simplify support for reverse operations.
 (compute_bb_dataflow): Drop reverse support.
 (emit_notes_in_bb): Likewise.
 (create_entry_value): Rename to...
 (record_entry_value): ... this.  Use cselib equivalences.
 (vt_add_function_parameter): Adjust.

This (r182760) caused regressions in the libstdc++ testsuite for
cris-elf, PR51728.

brgds, H-P


Re: Keep static VTA locs in cselib tables only

2011-12-31 Thread Alexandre Oliva
On Nov 25, 2011, Jakub Jelinek ja...@redhat.com wrote:

 The numbers I got with your patch (RTL checking) are below, seems
 the cumulative numbers other than 100% are all bigger with patched stage2,
 which means unfortunately debug info quality degradation.

Not really.  I found some actual degradation after finally getting back
to it.  In some cases, I failed to reset NO_LOC_P, and this caused
expressions that depended on it to resort to alternate values or end up
unset.  In other cases, we created different cselib values for debug
temps and implicit ptrs, and merging them at dataflow confluences no
longer found a common value because the common value was in cselib's
static equivalence table.  I've fixed (and added an assertion to catch)
left-over NO_LOC_Ps, and arranged for values created for debug exprs,
implicit ptr, entry values and parameter refs to be preserved across
basic blocks as constants within cselib.

With that, the debug info we get is a strict improvement in terms of
coverage, even though a bunch of .o files still display a decrease in
100% coverage.  In the handful of files I examined, the patched compiler
was emitting a loc list without full coverage, while the original
compiler was emitting a single loc expr, that implicitly got full
coverage even though AFAICT it should really cover a narrower range.
Full coverage was a false positive, and less-than-100% coverage in these
cases is not a degradation, but rather an improvement.
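
To make the coverage argument concrete, here is a minimal sketch of how a
coverage percentage could be computed for one variable.  It is not the
script that produced the numbers in this thread; toy_range and
toy_coverage_pct are hypothetical names, and the rule that a single
location expression counts as 100% is modelled by the single_expr flag.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a hypothetical location list: the half-open PC range
   [lo, hi) in which the variable has a known location.  */
struct toy_range
{
  unsigned long lo, hi;
};

/* Percentage of the scope [scope_lo, scope_hi) in which the variable
   has a location.  If SINGLE_EXPR is nonzero, the variable is described
   by one location expression instead of a list, and that expression is
   taken to apply to the whole scope, so the result is 100 regardless of
   how long the location is really valid.  RANGES are assumed not to
   overlap.  */
static unsigned
toy_coverage_pct (unsigned long scope_lo, unsigned long scope_hi,
                  int single_expr,
                  const struct toy_range *ranges, size_t n)
{
  unsigned long size, covered = 0;
  size_t i;

  assert (scope_hi >= scope_lo);
  size = scope_hi - scope_lo;

  if (single_expr)
    return 100;
  if (size == 0)
    return 0;

  for (i = 0; i < n; i++)
    {
      /* Clamp each range to the scope before counting it.  */
      unsigned long lo = ranges[i].lo < scope_lo ? scope_lo : ranges[i].lo;
      unsigned long hi = ranges[i].hi > scope_hi ? scope_hi : ranges[i].hi;

      if (hi > lo)
        covered += hi - lo;
    }

  return (unsigned) (covered * 100 / size);
}
```

With a scope of 0x100..0x200 and list entries 0x120..0x180 and
0x1a0..0x200, this reports 75, whereas the same variable described by a
single expression reports 100 even if the expression is only valid in
those two ranges.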

Now, the reason why we emit additional expressions now is that the new
algorithm is more prone to emitting different (and better) expressions
when entering a basic block, because we don't try as hard as before to
keep on with the same location expression.  Instead we recompute all the
potentially-changed expressions, which will tend to select better
expressions if available.

 Otherwise the patch looks good to me.

Thanks.  After the updated comparison data below, you can find the patch
I'm checking in, followed by the small interdiff from the previous
patch.


Happy GNU Year! :-)


The results below can be reproduced with r182723.

stage1 sources are patched, stage2 and stage3 aren't, so
stage2 is built with a patched compiler, stage3 isn't.

$ wc -l obj-{x86_64,i686}-linux-gnu/stage[23]-gcc/cc1plus.ev
  100784 obj-x86_64-linux-gnu/stage2-gcc/cc1plus.ev
  102406 obj-x86_64-linux-gnu/stage3-gcc/cc1plus.ev
   33275 obj-i686-linux-gnu/stage2-gcc/cc1plus.ev
   33944 obj-i686-linux-gnu/stage3-gcc/cc1plus.ev

$ wc -l obj-{x86_64,i686}-linux-gnu/stage[23]-gcc/cc1plus.csv
   523647 obj-x86_64-linux-gnu/stage2-gcc/cc1plus.csv
   523536 obj-x86_64-linux-gnu/stage3-gcc/cc1plus.csv
   521276 obj-i686-linux-gnu/stage2-gcc/cc1plus.csv
   521907 obj-i686-linux-gnu/stage3-gcc/cc1plus.csv

$ diff -yW80 obj-x86_64-linux-gnu/stage[23]-gcc/cc1plus.ls
cov%    samples     cumul          cov%    samples     cumul
0.0     150949/30%  150949/30%   | 0.0     150980/30%  150980/30%
0..5    6234/1%     157183/31%   | 0..5    6254/1%     157234/31%
6..10   5630/1%     162813/32%   | 6..10   5641/1%     162875/32%
11..15  4675/0%     167488/33%   | 11..15  4703/0%     167578/33%
16..20  5041/1%     172529/34%   | 16..20  5044/1%     172622/34%
21..25  5435/1%     177964/35%   | 21..25  5466/1%     178088/35%
26..30  4249/0%     182213/36%   | 26..30  4269/0%     182357/36%
31..35  4666/0%     186879/37%   | 31..35  4674/0%     187031/37%
36..40  6939/1%     193818/38%   | 36..40  6982/1%     194013/38%
41..45  7824/1%     201642/40%   | 41..45  7859/1%     201872/40%
46..50  8538/1%     210180/42%   | 46..50  8536/1%     210408/42%
51..55  7585/1%     217765/43%   | 51..55  7611/1%     218019/43%
56..60  6088/1%     223853/44%   | 56..60  6108/1%     224127/44%
61..65  5545/1%     229398/45%   | 61..65  5574/1%     229701/46%
66..70  7151/1%     236549/47%   | 66..70  7195/1%     236896/47%
71..75  8068/1%     244617/49%   | 71..75  8104/1%     245000/49%
76..80  18852/3%    263469/52%   | 76..80  18879/3%    263879/52%
81..85  11958/2%    275427/55%   | 81..85  11954/2%    275833/55%
86..90  15201/3%    290628/58%   | 86..90  15145/3%    290978/58%
91..95  16814/3%    307442/61%   | 91..95  16727/3%    307705/61%
96..99  17121/3%    324563/65%   | 96..99  16991/3%    324696/65%
100     174515/34%  499078/100%  | 100     173994/34%  498690/100%

$ diff -yW80 obj-i686-linux-gnu/stage[23]-gcc/cc1plus.ls
cov%    samples     cumul          cov%    samples     cumul
0.0     145453/27%  145453/27%   | 0.0     145480/27%  145480/27%
0..5    6594/1%     152047/29%   | 0..5    6603/1%     152083/29%
6..10   5664/1%     157711/30%   | 6..10   5671/1%     157754/30%
11..15  4982/0%     162693/31%   | 11..15  4997/0%     162751/31%
16..20  6155/1%     168848/32%   | 16..20  6169/1%     168920/32%
21..25  5038/0%     173886/33%   | 21..25  5057/0%     173977/33%
26..30  4925/0%     178811/34%   | 26..30

Re: Keep static VTA locs in cselib tables only

2011-11-25 Thread Jakub Jelinek
On Wed, Nov 23, 2011 at 08:10:00AM -0200, Alexandre Oliva wrote:
 - compiling stage2 target libs and stage3 host patched sources (with
 both unpatched and patched stage2 compiler) produced cc1plus with 10%
 fewer entry value expressions (a welcome surprise!), 1% fewer call site
 value expressions, an increase of 0.1% in the total number of variables
 with location lists and less than 0.5% decrease in variables with full
 coverage.

The numbers I got with your patch (RTL checking) are below, seems
the cumulative numbers other than 100% are all bigger with patched stage2,
which means unfortunately debug info quality degradation.  Have you
analysed, at least on some shorter testcases, why that happens?

Otherwise the patch looks good to me.

x86_64 patched stage3 compiled by vanilla stage2
cov%    samples     cumul
0.0     230172/32%  230172/32%
0..10   12267/1%    242439/34%
11..20  10548/1%    252987/35%
21..30  17018/2%    270005/37%
31..40  16374/2%    286379/40%
41..50  17533/2%    303912/42%
51..60  13051/1%    316963/44%
61..70  13946/1%    330909/46%
71..80  19627/2%    350536/49%
81..90  28877/4%    379413/53%
91..99  85086/11%   464499/65%
100     246568/34%  711067/100%

x86_64 patched stage3 compiled by patched stage2
cov%    samples     cumul
0.0     230182/32%  230182/32%
0..10   12319/1%    242501/34%
11..20  10765/1%    253266/35%
21..30  17390/2%    270656/38%
31..40  16745/2%    287401/40%
41..50  17821/2%    305222/42%
51..60  13306/1%    318528/44%
61..70  14104/1%    332632/46%
71..80  19795/2%    352427/49%
81..90  29030/4%    381457/53%
91..99  85171/11%   466628/65%
100     244439/34%  711067/100%

i686 patched stage3 compiled by vanilla stage2
cov%    samples     cumul
0.0     225909/32%  225909/32%
0..10   12420/1%    238329/34%
11..20  10693/1%    249022/35%
21..30  17102/2%    266124/38%
31..40  13529/1%    279653/40%
41..50  17232/2%    296885/42%
51..60  12568/1%    309453/44%
61..70  14769/2%    324222/46%
71..80  14937/2%    339159/48%
81..90  23868/3%    363027/52%
91..99  86306/12%   449333/64%
100     245327/35%  694660/100%

i686 patched stage3 compiled by patched stage2
cov%    samples     cumul
0.0     225917/32%  225917/32%
0..10   12471/1%    238388/34%
11..20  10848/1%    249236/35%
21..30  17292/2%    266528/38%
31..40  13716/1%    280244/40%
41..50  17324/2%    297568/42%
51..60  12673/1%    310241/44%
61..70  14950/2%    325191/46%
71..80  15085/2%    340276/48%
81..90  24019/3%    364295/52%
91..99  86228/12%   450523/64%
100     244137/35%  694660/100%
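
For readers checking the tables, the cumul column is the running total of
the samples column, so a larger cumulative count in a bucket below 100%
means more variables ended up at or below that coverage.  The sketch
below only illustrates that arithmetic; toy_cumulate and toy_pct are
hypothetical helpers, not part of whatever script produced the tables,
and rounding the percentage down is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Store in CUMUL[i] the sum of SAMPLES[0] through SAMPLES[i].  */
static void
toy_cumulate (const unsigned long *samples, unsigned long *cumul, size_t n)
{
  unsigned long total = 0;
  size_t i;

  assert (n == 0 || (samples && cumul));

  for (i = 0; i < n; i++)
    {
      total += samples[i];
      cumul[i] = total;
    }
}

/* Integer percentage of PART in WHOLE, rounded down (assumed rounding
   mode); returns 0 when WHOLE is 0.  */
static unsigned
toy_pct (unsigned long part, unsigned long whole)
{
  return whole ? (unsigned) (part * 100 / whole) : 0;
}
```

Feeding it the first three sample counts of the x86_64 vanilla-stage2
table (230172, 12267, 10548) gives running totals of 230172, 242439 and
252987, matching the cumul column.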

Jakub


Keep static VTA locs in cselib tables only

2011-11-23 Thread Alexandre Oliva
This patch reduces VTA memory consumption and even speeds it up
somewhat, by avoiding recording permanent equivalences in dataflow sets
(and propagating them all the way down the control flow graph), keeping
them in cselib tables only.  This saves some micro-operations, some
duplicate attempts to expand the same complex operations, and most of
all time and memory for locations in dataflow sets.

I've also moved reverse operations and entry values, that are also
permanent equivalences, to cselib tables, introducing a mechanism to add
equivalences to cselib tables that doesn't depend on expressions hashing
equal: instead, locs for values in an equivalence set are grouped in the
loc list for the earliest (canonical) value in the set, in the cselib
tables, with a single entry in the loc list for all other set members
pointing to the canonical value.
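
As a rough illustration of that layout, here is a self-contained toy
model.  The struct and function names are made up for the example and
are much simpler than the real cselib types, and the RTL strings are
only placeholders; see canonical_cselib_val in the patch for the actual
lookup.

```c
#include <assert.h>
#include <stddef.h>

struct toy_val;

/* One loc-list entry: either a concrete expression (EXPR non-NULL) or
   a link to another value (VALUE non-NULL).  */
struct toy_loc
{
  const char *expr;
  struct toy_val *value;
  struct toy_loc *next;
};

/* A value; a lower uid means it was created earlier.  */
struct toy_val
{
  unsigned uid;
  struct toy_loc *locs;
};

/* Return the canonical member of V's equivalence set.  A non-canonical
   member has exactly one loc, which is a link to an earlier value;
   anything else is treated as canonical.  */
static struct toy_val *
toy_canonical (struct toy_val *v)
{
  if (!v->locs || v->locs->next || !v->locs->value
      || v->uid < v->locs->value->uid)
    return v;
  return v->locs->value;
}

/* Build a two-member set and return the uid of the canonical value as
   seen from the later member.  */
static unsigned
toy_demo_canonical_uid (void)
{
  /* The canonical value (uid 1) holds the grouped locs of the set.  */
  struct toy_loc reg = { "(reg 5)", NULL, NULL };
  struct toy_loc entry = { "(entry_value (reg 5))", NULL, &reg };
  struct toy_val canon = { 1, &entry };

  /* The later value (uid 7) holds a single link to the canonical one.  */
  struct toy_loc link = { NULL, &canon, NULL };
  struct toy_val later = { 7, &link };

  assert (toy_canonical (&canon) == &canon);
  return toy_canonical (&later)->uid;
}
```

Looking up either member ends at the value with uid 1, so the grouped
locs are found in one place no matter which member the lookup starts
from.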

The downside is that we don't sort loc lists in cselib as we do in
var-tracking, so we don't give expressions the same preferences we did
before, which means there's some potential for debug info degradation,
particularly for preferring entry value expressions over concrete
expressions guaranteed to have an available value.  I'm going to see
whether sorting gets us better/faster results next, but just sorting
them won't get us all the way: while before we'd sort all equivalences
for the var-tracking-canonical equivalence, we may now fail to merge
location lists because the static equivalences aren't taken into account
when merging dynamic equivalence sets in var-tracking dataflow sets.  I haven't
thought about whether this makes much of a difference, or how to do that
efficiently if desirable, but I figured I wouldn't wait any longer
before submitting this patch for 4.7.

This was regstrapped on i686-pc-linux-gnu and x86_64-linux-gnu.  I've
also run some debug info, memory and compile-time measurements:

- compiling stage3-gcc/ (--enable-languages=all,ada) became some 1-2%
faster on average (0.5% to 5% speedups were observed over 3
measurements)

- comparable speedups with a not-very-random sample of preprocessed
sources that used to be VTA bad-performers, with var-tracking memory use
down by 10% to 50%.

- compiling stage2 target libs and stage3 host patched sources (with
both unpatched and patched stage2 compiler) produced cc1plus with 10%
fewer entry value expressions (a welcome surprise!), 1% fewer call site
value expressions, an increase of 0.1% in the total number of variables
with location lists and less than 0.5% decrease in variables with full
coverage.

Here's the patch.  Ok to install?

for  gcc/ChangeLog
from  Alexandre Oliva  aol...@redhat.com

	* cselib.h (cselib_add_permanent_equiv): Declare.
	(canonical_cselib_val): New.
	* cselib.c (new_elt_loc_list): Rework to support value
	equivalences.  Adjust all callers.
	(preserve_only_constants): Retain value equivalences.
	(references_value_p): Retain preserved values.
	(rtx_equal_for_cselib_1): Handle value equivalences.
	(cselib_invalidate_regno): Use canonical value.
	(cselib_add_permanent_equiv): New.
	* alias.c (find_base_term): Reset locs lists while recursing.
	* var-tracking.c (val_bind): New.  Don't add equivalences
	present in cselib table, compared with code moved from...
	(val_store): ... here.
	(val_resolve): Use val_bind.
	(VAL_EXPR_HAS_REVERSE): Drop.
	(add_uses): Do not create MOps for addresses.  Do not mark
	non-REG non-MEM expressions as requiring resolution.
	(reverse_op): Record reverse as a cselib equivalence.
	(add_stores): Use it.  Do not create MOps for addresses.
	Do not require resolution for non-REG non-MEM expressions.
	Simplify support for reverse operations.
	(compute_bb_dataflow): Drop reverse support.
	(emit_notes_in_bb): Likewise.
	(create_entry_value): Rename to...
	(record_entry_value): ... this.  Use cselib equivalences.
	(vt_add_function_parameter): Adjust.

Index: gcc/cselib.h
===
--- gcc/cselib.h.orig	2011-11-21 02:23:54.111708716 -0200
+++ gcc/cselib.h	2011-11-21 05:31:42.176203099 -0200
@@ -96,5 +96,24 @@ extern void cselib_preserve_value (cseli
 extern bool cselib_preserved_value_p (cselib_val *);
 extern void cselib_preserve_only_values (void);
 extern void cselib_preserve_cfa_base_value (cselib_val *, unsigned int);
+extern void cselib_add_permanent_equiv (cselib_val *, rtx, rtx);
 
 extern void dump_cselib_table (FILE *);
+
+/* Return the canonical value for VAL, following the equivalence chain
+   towards the earliest (== lowest uid) equivalent value.  */
+
+static inline cselib_val *
+canonical_cselib_val (cselib_val *val)
+{
+  cselib_val *canon;
+
+  if (!val->locs || val->locs->next
+      || !val->locs->loc || GET_CODE (val->locs->loc) != VALUE
+      || val->uid < CSELIB_VAL_PTR (val->locs->loc)->uid)
+    return val;
+
+  canon = CSELIB_VAL_PTR (val->locs->loc);
+  gcc_checking_assert (canonical_cselib_val (canon) == canon);
+  return canon;
+}
Index: gcc/cselib.c