On February 1, 2021 8:34:35 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote: > > >On 1/28/21 1:09 AM, Richard Biener wrote: >> On Wed, 27 Jan 2021, Jakub Jelinek wrote: >> >>> On Wed, Jan 27, 2021 at 03:40:38PM +0100, Richard Biener wrote: >>>> The following avoids repeatedly turning VALUE RTXen into >>>> sth useful and re-applying a constant offset through get_addr >>>> via DSE check_mem_read_rtx. Instead perform this once for >>>> all stores to be visited in check_mem_read_rtx. This avoids >>>> allocating 1.6GB of garbage PLUS RTXen on the PR80960 >>>> testcase, fixing the memory usage regression from old GCC. >>>> >>>> Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK? >>>> >>>> Thanks, >>>> Richard. >>>> >>>> 2021-01-27 Richard Biener <rguent...@suse.de> >>>> >>>> PR rtl-optimization/80960 >>>> * dse.c (check_mem_read_rtx): Call get_addr on the >>>> offsetted address. >>>> --- >>>> gcc/dse.c | 5 +++++ >>>> 1 file changed, 5 insertions(+) >>>> >>>> diff --git a/gcc/dse.c b/gcc/dse.c >>>> index c88587e7d94..da0df54a2dd 100644 >>>> --- a/gcc/dse.c >>>> +++ b/gcc/dse.c >>>> @@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t >bb_info) >>>> } >>>> if (maybe_ne (offset, 0)) >>>> mem_addr = plus_constant (get_address_mode (mem), mem_addr, >offset); >>>> + /* Avoid passing VALUE RTXen as mem_addr to >canon_true_dependence >>>> + which will over and over re-create proper RTL and re-apply >the >>>> + offset above. See PR80960 where we almost allocate 1.6GB of >PLUS >>>> + RTXen that way. */ >>>> + mem_addr = get_addr (mem_addr); >>>> >>>> if (group_id >= 0) >>>> { >>> Does that result in any changes on how much does DSE optimize? >>> I mean, if you do 2 bootstraps/regtests, one with this patch and >another one >>> without it, and at the end of rest_of_handle_dse dump >>> locally_deleted, globally_deleted >>> for each CU/function, do you get the same counts except perhaps for >dse.c? >> So I see no difference for stage2-gcc/*.o dse1/dse2 with/without the >> patch but counts are _extremely_ small. Statistics: >> >> 70148 dse: local deletions = 0, global deletions = 0 >> 32 dse: local deletions = 0, global deletions = 1 >> 9 dse: local deletions = 0, global deletions = 2 >> 7 dse: local deletions = 0, global deletions = 3 >> 2 dse: local deletions = 0, global deletions = 4 >> 2 dse: local deletions = 0, global deletions = 5 >> 3 dse: local deletions = 0, global deletions = 7 >> 67 dse: local deletions = 1, global deletions = 0 >> 1 dse: local deletions = 1, global deletions = 2 >> 12 dse: local deletions = 2, global deletions = 0 >> 1 dse: local deletions = 24, global deletions = 1 >> 2 dse: local deletions = 3, global deletions = 0 >> 4 dse: local deletions = 4, global deletions = 0 >> 4 dse: local deletions = 6, global deletions = 0 >> 1 dse: local deletions = 7, global deletions = 0 >> 1 dse: local deletions = 8, global deletions = 0 >> >> so not sure how much confidence this brings over the analytical >> reasoning that it shouldn't make a difference ... >> >> stats on just dse2 are even more depressing (given it's cost) >> >> 35123 dse: local deletions = 0, global deletions = 0 >> 2 dse: local deletions = 0, global deletions = 1 >> 20 dse: local deletions = 1, global deletions = 0 >> 1 dse: local deletions = 2, global deletions = 0 >> 1 dse: local deletions = 3, global deletions = 0 >> 1 dse: local deletions = 4, global deletions = 0 >Based on that, I'd argue that DSE2 should go away and DSE1 should be >evaluated for the chopping block. While RTL DSE was marginally >important in 1999 when it was first submitted, the tree-ssa pipeline as >a whole has probably made RTL DSE largely pointless.
True. Though I'd argue that DSE2 might be the conceptually more useful pass since it sees spill slots. Richard. >Jeff