Re: Keep static VTA locs in cselib tables only
Richard Sandiford rdsandif...@googlemail.com writes:
Alexandre Oliva aol...@redhat.com writes:
On Nov 25, 2011, Jakub Jelinek ja...@redhat.com wrote:

The numbers I got with your patch (RTL checking) are below, seems the cumulative numbers other than 100% are all bigger with patched stage2, which means unfortunately debug info quality degradation.

Not really.

I found some actual degradation after finally getting back to it. In some cases, I failed to reset NO_LOC_P, and this caused expressions that depended on it to resort to alternate values or end up unset. In other cases, we created different cselib values for debug temps and implicit ptrs, and merging them at dataflow confluences no longer found a common value because the common value was in cselib's static equivalence table.

I've fixed (and added an assertion to catch) left-over NO_LOC_Ps, and arranged for values created for debug exprs, implicit ptr, entry values and parameter refs to be preserved across basic blocks as constants within cselib.

With that, the debug info we get is a strict improvement in terms of coverage, even though a bunch of .o files still display a decrease in 100% coverage. In the handful of files I examined, the patched compiler was emitting a loc list without full coverage, while the original compiler was emitting a single loc expr, that implicitly got full coverage even though AFAICT it should really cover a narrower range. Full coverage was a false positive, and less-than-100% coverage in these cases is not a degradation, but rather an improvement.

Now, the reason why we emit additional expressions now is that the new algorithm is more prone to emitting different (and better) expressions when entering a basic block, because we don't try as hard as before to keep on with the same location expression. Instead we recompute all the potentially-changed expressions, which will tend to select better expressions if available.

Otherwise the patch looks good to me.

Thanks.
After the updated comparison data below, you can find the patch I'm checking in, followed by the small interdiff from the previous patch.

Happy GNU Year! :-)

The results below can be reproduced with r182723. stage1 sources are patched, stage2 and stage3 aren't, so stage2 is built with a patched compiler, stage3 isn't.

$ wc -l obj-{x86_64,i686}-linux-gnu/stage[23]-gcc/cc1plus.ev
100784 obj-x86_64-linux-gnu/stage2-gcc/cc1plus.ev
102406 obj-x86_64-linux-gnu/stage3-gcc/cc1plus.ev
 33275 obj-i686-linux-gnu/stage2-gcc/cc1plus.ev
 33944 obj-i686-linux-gnu/stage3-gcc/cc1plus.ev

$ wc -l obj-{x86_64,i686}-linux-gnu/stage[23]-gcc/cc1plus.csv
523647 obj-x86_64-linux-gnu/stage2-gcc/cc1plus.csv
523536 obj-x86_64-linux-gnu/stage3-gcc/cc1plus.csv
521276 obj-i686-linux-gnu/stage2-gcc/cc1plus.csv
521907 obj-i686-linux-gnu/stage3-gcc/cc1plus.csv

$ diff -yW80 obj-x86_64-linux-gnu/stage[23]-gcc/cc1plus.ls
cov%    samples      cumul          cov%    samples      cumul
0.0     150949/30%   150949/30%   | 0.0     150980/30%   150980/30%
0..5    6234/1%      157183/31%   | 0..5    6254/1%      157234/31%
6..10   5630/1%      162813/32%   | 6..10   5641/1%      162875/32%
11..15  4675/0%      167488/33%   | 11..15  4703/0%      167578/33%
16..20  5041/1%      172529/34%   | 16..20  5044/1%      172622/34%
21..25  5435/1%      177964/35%   | 21..25  5466/1%      178088/35%
26..30  4249/0%      182213/36%   | 26..30  4269/0%      182357/36%
31..35  4666/0%      186879/37%   | 31..35  4674/0%      187031/37%
36..40  6939/1%      193818/38%   | 36..40  6982/1%      194013/38%
41..45  7824/1%      201642/40%   | 41..45  7859/1%      201872/40%
46..50  8538/1%      210180/42%   | 46..50  8536/1%      210408/42%
51..55  7585/1%      217765/43%   | 51..55  7611/1%      218019/43%
56..60  6088/1%      223853/44%   | 56..60  6108/1%      224127/44%
61..65  5545/1%      229398/45%   | 61..65  5574/1%      229701/46%
66..70  7151/1%      236549/47%   | 66..70  7195/1%      236896/47%
71..75  8068/1%      244617/49%   | 71..75  8104/1%      245000/49%
76..80  18852/3%     263469/52%   | 76..80  18879/3%     263879/52%
81..85  11958/2%     275427/55%   | 81..85  11954/2%     275833/55%
86..90  15201/3%     290628/58%   | 86..90  15145/3%     290978/58%
91..95  16814/3%     307442/61%   | 91..95  16727/3%     307705/61%
96..99  17121/3%     324563/65%   | 96..99  16991/3%     324696/65%
100     174515/34%   499078/100%  | 100     173994/34%   498690/100%

$ diff -yW80 obj-i686-linux-gnu/stage[23]-gcc/cc1plus.ls
cov%    samples      cumul          cov%    samples      cumul
0.0     145453/27%   145453/27%   | 0.0     145480/27%   145480/27%
0..5    6594/1%      152047/29%   | 0..5    6603/1%      152083/29%
6..10   5664/1%      157711/30%   | 6..10   5671/1%      157754/30%
11..15  4982/0%      162693/31%   | 11..15  4997/0%      162751/31%
16..20  6155/1%
Re: Keep static VTA locs in cselib tables only
On Jan 2, 2012, Hans-Peter Nilsson hans-peter.nils...@axis.com wrote:

(canonical_cselib_val): New.
* cselib.c (new_elt_loc_list): Rework to support value equivalences. Adjust all callers.

This (r182760) caused regressions in the libstdc++ testsuite for cris-elf, PR51728.

On Jan 2, 2012, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote:

this seems to have caused a bootstrap failure on s390x: PR51735

/home/andreas/git/gcc-head/gcc/tree-ssa-pre.c: In function ‘bool insert_aux(basic_block)’:
/home/andreas/git/gcc-head/gcc/tree-ssa-pre.c:3791:1: internal compiler error: Segmentation fault

Sorry about the breakage.

Can you please confirm that these problems are both fixed with Jakub's patches? If not, could you please try the patch I've just posted in a follow-up to the PR bootstrap/51725 thread, and let me know in case any problem remains?

Thanks to Jakub for covering for me while I was network-challenged.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
Re: Keep static VTA locs in cselib tables only
From: Alexandre Oliva aol...@redhat.com
Date: Fri, 6 Jan 2012 20:35:39 +0100

On Jan 2, 2012, Hans-Peter Nilsson hans-peter.nils...@axis.com wrote:

This (r182760) caused regressions in the libstdc++ testsuite for cris-elf, PR51728.

On Jan 2, 2012, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote:

this seems to have caused a bootstrap failure on s390x: PR51735

Sorry about the breakage.

Can you please confirm that these problems are both fixed with Jakub's patches?

I can't add anything about Andreas' observation besides his own confirmation in the PR of it being fixed, but the problem I reported is fine now. No worries, if it wasn't fixed, I'd have mentioned it. ;) No testsuite result regressions (since T0=2007-01-05-16:47:21!) for cris-elf at r182959.

brgds, H-P
Re: Keep static VTA locs in cselib tables only
Hi,

this seems to have caused a bootstrap failure on s390x: PR51735

/home/andreas/git/gcc-head/gcc/tree-ssa-pre.c: In function ‘bool insert_aux(basic_block)’:
/home/andreas/git/gcc-head/gcc/tree-ssa-pre.c:3791:1: internal compiler error: Segmentation fault

Bye,

-Andreas-

On 11/23/2011 11:10 AM, Alexandre Oliva wrote:

This patch reduces VTA memory consumption and even speeds it up somewhat, by avoiding recording permanent equivalences in dataflow sets (and propagating them all the way down the control flow graph), keeping them in cselib tables only. This saves some micro-operations, some duplicate attempts to expand the same complex operations, and most of all time and memory for locations in dataflow sets.

I've also moved reverse operations and entry values, which are also permanent equivalences, to cselib tables, introducing a mechanism to add equivalences to cselib tables that doesn't depend on expressions hashing equal: instead, locs for values in an equivalence set are grouped in the loc list for the earliest (canonical) value in the set, in the cselib tables, with a single entry in the loc list for all other set members pointing to the canonical value.

The downside is that we don't sort loc lists in cselib as we do in var-tracking, so we don't give expressions the same preferences we did before, which means there's some potential for debug info degradation, particularly for preferring entry value expressions over concrete expressions guaranteed to have an available value. I'm going to see whether sorting gets us better/faster results next, but just sorting them won't get us all the way: while before we'd sort all equivalences for the var-tracking-canonical equivalence, we may now fail to merge location lists because the static equivalences aren't taken into account when merging dynamic equivalence sets in var-tracking dataflow sets.

I haven't thought about whether this makes much of a difference, or how to do that efficiently if desirable, but I figured I wouldn't wait any longer before submitting this patch for 4.7.

This was regstrapped on i686-pc-linux-gnu and x86_64-linux-gnu. I've also run some debug info, memory and compile-time measurements:

- compiling stage3-gcc/ (--enable-languages=all,ada) became some 1-2% faster on average (0.5% to 5% speedups were observed over 3 measurements)

- comparable speedups with a not-very-random sample of preprocessed sources that used to be VTA bad-performers, with var-tracking memory use down by 10% to 50%.

- compiling stage2 target libs and stage3 host patched sources (with both unpatched and patched stage2 compiler) produced cc1plus with 10% fewer entry value expressions (a welcome surprise!), 1% fewer call site value expressions, an increase of 0.1% in the total number of variables with location lists and less than 0.5% decrease in variables with full coverage.

Here's the patch. Ok to install?
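The grouping scheme described in the quoted message (all locs kept on the earliest value, every other member holding a single link to it) can be pictured with a small stand-alone C sketch. This is only an illustration under simplifying assumptions, not the cselib code: toy_val, toy_loc and the toy_* functions are hypothetical names, location expressions are plain strings rather than rtx, and nothing is ever freed. The back-links kept in the canonical list exist here only so that members can be re-pointed when two sets are joined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for cselib values and loc lists.  */
struct toy_val;

struct toy_loc
{
  struct toy_val *equiv;	/* Non-NULL: link to an equivalent value.  */
  const char *expr;		/* Otherwise: stand-in for a location.  */
  struct toy_loc *next;
};

struct toy_val
{
  unsigned int uid;		/* Lower uid means created earlier.  */
  struct toy_loc *locs;
};

static struct toy_loc *
toy_new_loc (struct toy_val *equiv, const char *expr, struct toy_loc *next)
{
  struct toy_loc *l = malloc (sizeof *l);
  assert (l != NULL);
  l->equiv = equiv;
  l->expr = expr;
  l->next = next;
  return l;
}

/* A non-canonical value has exactly one loc: a link to a lower-uid
   value.  Anything else is its own canonical value.  */
static struct toy_val *
toy_canonical (struct toy_val *v)
{
  if (!v->locs || v->locs->next || !v->locs->equiv
      || v->uid < v->locs->equiv->uid)
    return v;
  return v->locs->equiv;
}

/* Locations are always recorded on the canonical value.  */
static void
toy_add_loc (struct toy_val *v, const char *expr)
{
  v = toy_canonical (v);
  v->locs = toy_new_loc (NULL, expr, v->locs);
}

/* Join the sets of A and B under the lower-uid canonical value.  */
static void
toy_add_equiv (struct toy_val *a, struct toy_val *b)
{
  struct toy_val *canon = toy_canonical (a);
  struct toy_val *other = toy_canonical (b);
  struct toy_loc *l, *last = NULL;

  if (canon == other)
    return;
  if (canon->uid > other->uid)
    {
      struct toy_val *tmp = canon;
      canon = other;
      other = tmp;
    }

  /* Re-point members that used OTHER as canonical; find the tail.  */
  for (l = other->locs; l; l = l->next)
    {
      if (l->equiv)
	l->equiv->locs->equiv = canon;
      last = l;
    }
  /* Move OTHER's whole list in front of CANON's list.  */
  if (last)
    {
      last->next = canon->locs;
      canon->locs = other->locs;
    }
  /* OTHER keeps a single link to CANON; CANON records a back-link.  */
  other->locs = toy_new_loc (canon, NULL, NULL);
  canon->locs = toy_new_loc (other, NULL, canon->locs);
}

/* Count the real (non-link) locations known for V's set.  */
static int
toy_count_exprs (struct toy_val *v)
{
  int n = 0;
  for (struct toy_loc *l = toy_canonical (v)->locs; l; l = l->next)
    if (!l->equiv)
      n++;
  return n;
}
```

In this model a lookup never walks more than one link, which is the property the uid ordering is meant to preserve: the value with the lowest uid always ends up holding the shared list.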
Re: Keep static VTA locs in cselib tables only
From: Alexandre Oliva aol...@redhat.com
Date: Sat, 31 Dec 2011 20:57:24 +0100

* cselib.h (cselib_add_permanent_equiv): Declare.
(canonical_cselib_val): New.
* cselib.c (new_elt_loc_list): Rework to support value equivalences. Adjust all callers.
(preserve_only_constants): Retain value equivalences.
(references_value_p): Retain preserved values.
(rtx_equal_for_cselib_1): Handle value equivalences.
(cselib_invalidate_regno): Use canonical value.
(cselib_add_permanent_equiv): New.
* alias.c (find_base_term): Reset locs lists while recursing.
* var-tracking.c (val_bind): New. Don't add equivalences present in cselib table, compared with code moved from...
(val_store): ... here.
(val_resolve): Use val_bind.
(VAL_EXPR_HAS_REVERSE): Drop.
(add_uses): Do not create MOps for addresses. Do not mark non-REG non-MEM expressions as requiring resolution.
(reverse_op): Record reverse as a cselib equivalence.
(add_stores): Use it. Do not create MOps for addresses. Do not require resolution for non-REG non-MEM expressions. Simplify support for reverse operations.
(compute_bb_dataflow): Drop reverse support.
(emit_notes_in_bb): Likewise.
(create_entry_value): Rename to...
(record_entry_value): ... this. Use cselib equivalences.
(vt_add_function_parameter): Adjust.

This (r182760) caused regressions in the libstdc++ testsuite for cris-elf, PR51728.

brgds, H-P
Re: Keep static VTA locs in cselib tables only
On Nov 25, 2011, Jakub Jelinek ja...@redhat.com wrote:

The numbers I got with your patch (RTL checking) are below, seems the cumulative numbers other than 100% are all bigger with patched stage2, which means unfortunately debug info quality degradation.

Not really.

I found some actual degradation after finally getting back to it. In some cases, I failed to reset NO_LOC_P, and this caused expressions that depended on it to resort to alternate values or end up unset. In other cases, we created different cselib values for debug temps and implicit ptrs, and merging them at dataflow confluences no longer found a common value because the common value was in cselib's static equivalence table.

I've fixed (and added an assertion to catch) left-over NO_LOC_Ps, and arranged for values created for debug exprs, implicit ptr, entry values and parameter refs to be preserved across basic blocks as constants within cselib.

With that, the debug info we get is a strict improvement in terms of coverage, even though a bunch of .o files still display a decrease in 100% coverage. In the handful of files I examined, the patched compiler was emitting a loc list without full coverage, while the original compiler was emitting a single loc expr, that implicitly got full coverage even though AFAICT it should really cover a narrower range. Full coverage was a false positive, and less-than-100% coverage in these cases is not a degradation, but rather an improvement.

Now, the reason why we emit additional expressions now is that the new algorithm is more prone to emitting different (and better) expressions when entering a basic block, because we don't try as hard as before to keep on with the same location expression. Instead we recompute all the potentially-changed expressions, which will tend to select better expressions if available.

Otherwise the patch looks good to me.

Thanks.
After the updated comparison data below, you can find the patch I'm checking in, followed by the small interdiff from the previous patch.

Happy GNU Year! :-)

The results below can be reproduced with r182723. stage1 sources are patched, stage2 and stage3 aren't, so stage2 is built with a patched compiler, stage3 isn't.

$ wc -l obj-{x86_64,i686}-linux-gnu/stage[23]-gcc/cc1plus.ev
100784 obj-x86_64-linux-gnu/stage2-gcc/cc1plus.ev
102406 obj-x86_64-linux-gnu/stage3-gcc/cc1plus.ev
 33275 obj-i686-linux-gnu/stage2-gcc/cc1plus.ev
 33944 obj-i686-linux-gnu/stage3-gcc/cc1plus.ev

$ wc -l obj-{x86_64,i686}-linux-gnu/stage[23]-gcc/cc1plus.csv
523647 obj-x86_64-linux-gnu/stage2-gcc/cc1plus.csv
523536 obj-x86_64-linux-gnu/stage3-gcc/cc1plus.csv
521276 obj-i686-linux-gnu/stage2-gcc/cc1plus.csv
521907 obj-i686-linux-gnu/stage3-gcc/cc1plus.csv

$ diff -yW80 obj-x86_64-linux-gnu/stage[23]-gcc/cc1plus.ls
cov%    samples      cumul          cov%    samples      cumul
0.0     150949/30%   150949/30%   | 0.0     150980/30%   150980/30%
0..5    6234/1%      157183/31%   | 0..5    6254/1%      157234/31%
6..10   5630/1%      162813/32%   | 6..10   5641/1%      162875/32%
11..15  4675/0%      167488/33%   | 11..15  4703/0%      167578/33%
16..20  5041/1%      172529/34%   | 16..20  5044/1%      172622/34%
21..25  5435/1%      177964/35%   | 21..25  5466/1%      178088/35%
26..30  4249/0%      182213/36%   | 26..30  4269/0%      182357/36%
31..35  4666/0%      186879/37%   | 31..35  4674/0%      187031/37%
36..40  6939/1%      193818/38%   | 36..40  6982/1%      194013/38%
41..45  7824/1%      201642/40%   | 41..45  7859/1%      201872/40%
46..50  8538/1%      210180/42%   | 46..50  8536/1%      210408/42%
51..55  7585/1%      217765/43%   | 51..55  7611/1%      218019/43%
56..60  6088/1%      223853/44%   | 56..60  6108/1%      224127/44%
61..65  5545/1%      229398/45%   | 61..65  5574/1%      229701/46%
66..70  7151/1%      236549/47%   | 66..70  7195/1%      236896/47%
71..75  8068/1%      244617/49%   | 71..75  8104/1%      245000/49%
76..80  18852/3%     263469/52%   | 76..80  18879/3%     263879/52%
81..85  11958/2%     275427/55%   | 81..85  11954/2%     275833/55%
86..90  15201/3%     290628/58%   | 86..90  15145/3%     290978/58%
91..95  16814/3%     307442/61%   | 91..95  16727/3%     307705/61%
96..99  17121/3%     324563/65%   | 96..99  16991/3%     324696/65%
100     174515/34%   499078/100%  | 100     173994/34%   498690/100%

$ diff -yW80 obj-i686-linux-gnu/stage[23]-gcc/cc1plus.ls
cov%    samples      cumul          cov%    samples      cumul
0.0     145453/27%   145453/27%   | 0.0     145480/27%   145480/27%
0..5    6594/1%      152047/29%   | 0..5    6603/1%      152083/29%
6..10   5664/1%      157711/30%   | 6..10   5671/1%      157754/30%
11..15  4982/0%      162693/31%   | 11..15  4997/0%      162751/31%
16..20  6155/1%      168848/32%   | 16..20  6169/1%      168920/32%
21..25  5038/0%      173886/33%   | 21..25  5057/0%      173977/33%
26..30  4925/0%      178811/34%   | 26..30
Re: Keep static VTA locs in cselib tables only
On Wed, Nov 23, 2011 at 08:10:00AM -0200, Alexandre Oliva wrote:

- compiling stage2 target libs and stage3 host patched sources (with both unpatched and patched stage2 compiler) produced cc1plus with 10% fewer entry value expressions (a welcome surprise!), 1% fewer call site value expressions, an increase of 0.1% in the total number of variables with location lists and less than 0.5% decrease in variables with full coverage.

The numbers I got with your patch (RTL checking) are below, seems the cumulative numbers other than 100% are all bigger with patched stage2, which means unfortunately debug info quality degradation. Have you analysed at least on some shorter testcases why that happens?

Otherwise the patch looks good to me.

x86_64 patched stage3 compiled by vanilla stage2
cov%    samples      cumul
0.0     230172/32%   230172/32%
0..10   12267/1%     242439/34%
11..20  10548/1%     252987/35%
21..30  17018/2%     270005/37%
31..40  16374/2%     286379/40%
41..50  17533/2%     303912/42%
51..60  13051/1%     316963/44%
61..70  13946/1%     330909/46%
71..80  19627/2%     350536/49%
81..90  28877/4%     379413/53%
91..99  85086/11%    464499/65%
100     246568/34%   711067/100%

x86_64 patched stage3 compiled by patched stage2
cov%    samples      cumul
0.0     230182/32%   230182/32%
0..10   12319/1%     242501/34%
11..20  10765/1%     253266/35%
21..30  17390/2%     270656/38%
31..40  16745/2%     287401/40%
41..50  17821/2%     305222/42%
51..60  13306/1%     318528/44%
61..70  14104/1%     332632/46%
71..80  19795/2%     352427/49%
81..90  29030/4%     381457/53%
91..99  85171/11%    466628/65%
100     244439/34%   711067/100%

i686 patched stage3 compiled by vanilla stage2
cov%    samples      cumul
0.0     225909/32%   225909/32%
0..10   12420/1%     238329/34%
11..20  10693/1%     249022/35%
21..30  17102/2%     266124/38%
31..40  13529/1%     279653/40%
41..50  17232/2%     296885/42%
51..60  12568/1%     309453/44%
61..70  14769/2%     324222/46%
71..80  14937/2%     339159/48%
81..90  23868/3%     363027/52%
91..99  86306/12%    449333/64%
100     245327/35%   694660/100%

i686 patched stage3 compiled by patched stage2
cov%    samples      cumul
0.0     225917/32%   225917/32%
0..10   12471/1%     238388/34%
11..20  10848/1%     249236/35%
21..30  17292/2%     266528/38%
31..40  13716/1%     280244/40%
41..50  17324/2%     297568/42%
51..60  12673/1%     310241/44%
61..70  14950/2%     325191/46%
71..80  15085/2%     340276/48%
81..90  24019/3%     364295/52%
91..99  86228/12%    450523/64%
100     244137/35%   694660/100%

Jakub
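For readers checking the tables above, the cumul column appears to be a running total of the samples column, and the percentages look like integer percentages of the grand total, truncated rather than rounded (for instance 252987 of 711067 is about 35.6% and is shown as 35%). The following C sketch reproduces that arithmetic under those assumptions; cov_cumulate and cov_percent are hypothetical helper names, not part of whatever script produced the .ls files.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store the running total of SAMPLES[0..i] in
   CUMUL[i] for each of the N buckets and return the grand total.  */
static unsigned long
cov_cumulate (const unsigned long *samples, size_t n, unsigned long *cumul)
{
  unsigned long total = 0;

  for (size_t i = 0; i < n; i++)
    {
      total += samples[i];
      cumul[i] = total;
    }
  return total;
}

/* Hypothetical helper: PART as a truncated integer percentage of
   TOTAL.  Assumes PART * 100 fits in unsigned long, which holds for
   counts of the size quoted in this thread.  */
static unsigned int
cov_percent (unsigned long part, unsigned long total)
{
  assert (total > 0);
  return (unsigned int) (part * 100 / total);
}
```

Comparing two builds then amounts to comparing cumul values bucket by bucket: a larger cumulative count below the 100% row means more variables fall short of full coverage, which is the reading given in the message above.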
Keep static VTA locs in cselib tables only
This patch reduces VTA memory consumption and even speeds it up somewhat, by avoiding recording permanent equivalences in dataflow sets (and propagating them all the way down the control flow graph), keeping them in cselib tables only. This saves some micro-operations, some duplicate attempts to expand the same complex operations, and most of all time and memory for locations in dataflow sets.

I've also moved reverse operations and entry values, which are also permanent equivalences, to cselib tables, introducing a mechanism to add equivalences to cselib tables that doesn't depend on expressions hashing equal: instead, locs for values in an equivalence set are grouped in the loc list for the earliest (canonical) value in the set, in the cselib tables, with a single entry in the loc list for all other set members pointing to the canonical value.

The downside is that we don't sort loc lists in cselib as we do in var-tracking, so we don't give expressions the same preferences we did before, which means there's some potential for debug info degradation, particularly for preferring entry value expressions over concrete expressions guaranteed to have an available value. I'm going to see whether sorting gets us better/faster results next, but just sorting them won't get us all the way: while before we'd sort all equivalences for the var-tracking-canonical equivalence, we may now fail to merge location lists because the static equivalences aren't taken into account when merging dynamic equivalence sets in var-tracking dataflow sets.

I haven't thought about whether this makes much of a difference, or how to do that efficiently if desirable, but I figured I wouldn't wait any longer before submitting this patch for 4.7.

This was regstrapped on i686-pc-linux-gnu and x86_64-linux-gnu.
I've also run some debug info, memory and compile-time measurements:

- compiling stage3-gcc/ (--enable-languages=all,ada) became some 1-2% faster on average (0.5% to 5% speedups were observed over 3 measurements)

- comparable speedups with a not-very-random sample of preprocessed sources that used to be VTA bad-performers, with var-tracking memory use down by 10% to 50%.

- compiling stage2 target libs and stage3 host patched sources (with both unpatched and patched stage2 compiler) produced cc1plus with 10% fewer entry value expressions (a welcome surprise!), 1% fewer call site value expressions, an increase of 0.1% in the total number of variables with location lists and less than 0.5% decrease in variables with full coverage.

Here's the patch. Ok to install?

for gcc/ChangeLog
from Alexandre Oliva aol...@redhat.com

* cselib.h (cselib_add_permanent_equiv): Declare.
(canonical_cselib_val): New.
* cselib.c (new_elt_loc_list): Rework to support value equivalences. Adjust all callers.
(preserve_only_constants): Retain value equivalences.
(references_value_p): Retain preserved values.
(rtx_equal_for_cselib_1): Handle value equivalences.
(cselib_invalidate_regno): Use canonical value.
(cselib_add_permanent_equiv): New.
* alias.c (find_base_term): Reset locs lists while recursing.
* var-tracking.c (val_bind): New. Don't add equivalences present in cselib table, compared with code moved from...
(val_store): ... here.
(val_resolve): Use val_bind.
(VAL_EXPR_HAS_REVERSE): Drop.
(add_uses): Do not create MOps for addresses. Do not mark non-REG non-MEM expressions as requiring resolution.
(reverse_op): Record reverse as a cselib equivalence.
(add_stores): Use it. Do not create MOps for addresses. Do not require resolution for non-REG non-MEM expressions. Simplify support for reverse operations.
(compute_bb_dataflow): Drop reverse support.
(emit_notes_in_bb): Likewise.
(create_entry_value): Rename to...
(record_entry_value): ... this. Use cselib equivalences.
(vt_add_function_parameter): Adjust.

Index: gcc/cselib.h
===
--- gcc/cselib.h.orig 2011-11-21 02:23:54.111708716 -0200
+++ gcc/cselib.h 2011-11-21 05:31:42.176203099 -0200
@@ -96,5 +96,24 @@ extern void cselib_preserve_value (cseli
 extern bool cselib_preserved_value_p (cselib_val *);
 extern void cselib_preserve_only_values (void);
 extern void cselib_preserve_cfa_base_value (cselib_val *, unsigned int);
+extern void cselib_add_permanent_equiv (cselib_val *, rtx, rtx);
 
 extern void dump_cselib_table (FILE *);
+
+/* Return the canonical value for VAL, following the equivalence chain
+   towards the earliest (== lowest uid) equivalent value.  */
+
+static inline cselib_val *
+canonical_cselib_val (cselib_val *val)
+{
+  cselib_val *canon;
+
+  if (!val->locs || val->locs->next
+      || !val->locs->loc || GET_CODE (val->locs->loc) != VALUE
+      || val->uid < CSELIB_VAL_PTR (val->locs->loc)->uid)
+    return val;
+
+  canon = CSELIB_VAL_PTR (val->locs->loc);
+  gcc_checking_assert (canonical_cselib_val (canon) == canon);
+  return canon;
+}
Index: gcc/cselib.c