On 1/28/21 1:09 AM, Richard Biener wrote:
> On Wed, 27 Jan 2021, Jakub Jelinek wrote:
>
>> On Wed, Jan 27, 2021 at 03:40:38PM +0100, Richard Biener wrote:
>>> The following avoids repeatedly turning VALUE RTXen into
>>> sth useful and re-applying a constant offset through get_addr
>>> via DSE check_mem_read_rtx.  Instead perform this once for
>>> all stores to be visited in check_mem_read_rtx.  This avoids
>>> allocating 1.6GB of garbage PLUS RTXen on the PR80960
>>> testcase, fixing the memory usage regression from old GCC.
>>>
>>> Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
>>>
>>> Thanks,
>>> Richard.
>>>
>>> 2021-01-27  Richard Biener  <rguent...@suse.de>
>>>
>>>     PR rtl-optimization/80960
>>>     * dse.c (check_mem_read_rtx): Call get_addr on the
>>>     offsetted address.
>>> ---
>>>  gcc/dse.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/gcc/dse.c b/gcc/dse.c
>>> index c88587e7d94..da0df54a2dd 100644
>>> --- a/gcc/dse.c
>>> +++ b/gcc/dse.c
>>> @@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
>>>      }
>>>    if (maybe_ne (offset, 0))
>>>      mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
>>> +  /* Avoid passing VALUE RTXen as mem_addr to canon_true_dependence
>>> +     which will over and over re-create proper RTL and re-apply the
>>> +     offset above.  See PR80960 where we almost allocate 1.6GB of PLUS
>>> +     RTXen that way.  */
>>> +  mem_addr = get_addr (mem_addr);
>>>  
>>>    if (group_id >= 0)
>>>      {
>> Does that result in any changes on how much does DSE optimize?
>> I mean, if you do 2 bootstraps/regtests, one with this patch and another one
>> without it, and at the end of rest_of_handle_dse dump
>> locally_deleted, globally_deleted
>> for each CU/function, do you get the same counts except perhaps for dse.c?
> So I see no difference for stage2-gcc/*.o dse1/dse2 with/without the
> patch but counts are _extremely_ small.  Statistics:
>
>   70148 dse: local deletions = 0, global deletions = 0
>      32 dse: local deletions = 0, global deletions = 1
>       9 dse: local deletions = 0, global deletions = 2
>       7 dse: local deletions = 0, global deletions = 3
>       2 dse: local deletions = 0, global deletions = 4
>       2 dse: local deletions = 0, global deletions = 5
>       3 dse: local deletions = 0, global deletions = 7
>      67 dse: local deletions = 1, global deletions = 0
>       1 dse: local deletions = 1, global deletions = 2
>      12 dse: local deletions = 2, global deletions = 0
>       1 dse: local deletions = 24, global deletions = 1
>       2 dse: local deletions = 3, global deletions = 0
>       4 dse: local deletions = 4, global deletions = 0
>       4 dse: local deletions = 6, global deletions = 0
>       1 dse: local deletions = 7, global deletions = 0
>       1 dse: local deletions = 8, global deletions = 0
>
> so not sure how much confidence this brings over the analytical
> reasoning that it shouldn't make a difference ...
>
> stats on just dse2 are even more depressing (given it's cost)
>
>   35123 dse: local deletions = 0, global deletions = 0
>       2 dse: local deletions = 0, global deletions = 1
>      20 dse: local deletions = 1, global deletions = 0
>       1 dse: local deletions = 2, global deletions = 0
>       1 dse: local deletions = 3, global deletions = 0
>       1 dse: local deletions = 4, global deletions = 0
Based on that, I'd argue that DSE2 should go away and DSE1 should be
evaluated for the chopping block.  While RTL DSE was marginally
important in 1999 when it was first submitted, the tree-ssa pipeline as
a whole has probably made RTL DSE largely pointless.

Jeff

Reply via email to