On February 1, 2021 8:34:35 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote:
>
>
>On 1/28/21 1:09 AM, Richard Biener wrote:
>> On Wed, 27 Jan 2021, Jakub Jelinek wrote:
>>
>>> On Wed, Jan 27, 2021 at 03:40:38PM +0100, Richard Biener wrote:
>>>> The following avoids repeatedly turning VALUE RTXen into
>>>> sth useful and re-applying a constant offset through get_addr
>>>> via DSE check_mem_read_rtx.  Instead perform this once for
>>>> all stores to be visited in check_mem_read_rtx.  This avoids
>>>> allocating 1.6GB of garbage PLUS RTXen on the PR80960
>>>> testcase, fixing the memory usage regression from old GCC.
>>>>
>>>> Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>> 2021-01-27  Richard Biener  <rguent...@suse.de>
>>>>
>>>>    PR rtl-optimization/80960
>>>>    * dse.c (check_mem_read_rtx): Call get_addr on the
>>>>    offsetted address.
>>>> ---
>>>>  gcc/dse.c | 5 +++++
>>>>  1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/gcc/dse.c b/gcc/dse.c
>>>> index c88587e7d94..da0df54a2dd 100644
>>>> --- a/gcc/dse.c
>>>> +++ b/gcc/dse.c
>>>> @@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t
>bb_info)
>>>>      }
>>>>    if (maybe_ne (offset, 0))
>>>>      mem_addr = plus_constant (get_address_mode (mem), mem_addr,
>offset);
>>>> +  /* Avoid passing VALUE RTXen as mem_addr to
>canon_true_dependence
>>>> +     which will over and over re-create proper RTL and re-apply
>the
>>>> +     offset above.  See PR80960 where we almost allocate 1.6GB of
>PLUS
>>>> +     RTXen that way.  */
>>>> +  mem_addr = get_addr (mem_addr);
>>>>  
>>>>    if (group_id >= 0)
>>>>      {
>>> Does that result in any changes on how much does DSE optimize?
>>> I mean, if you do 2 bootstraps/regtests, one with this patch and
>another one
>>> without it, and at the end of rest_of_handle_dse dump
>>> locally_deleted, globally_deleted
>>> for each CU/function, do you get the same counts except perhaps for
>dse.c?
>> So I see no difference for stage2-gcc/*.o dse1/dse2 with/without the
>> patch but counts are _extremely_ small.  Statistics:
>>
>>   70148 dse: local deletions = 0, global deletions = 0
>>      32 dse: local deletions = 0, global deletions = 1
>>       9 dse: local deletions = 0, global deletions = 2
>>       7 dse: local deletions = 0, global deletions = 3
>>       2 dse: local deletions = 0, global deletions = 4
>>       2 dse: local deletions = 0, global deletions = 5
>>       3 dse: local deletions = 0, global deletions = 7
>>      67 dse: local deletions = 1, global deletions = 0
>>       1 dse: local deletions = 1, global deletions = 2
>>      12 dse: local deletions = 2, global deletions = 0
>>       1 dse: local deletions = 24, global deletions = 1
>>       2 dse: local deletions = 3, global deletions = 0
>>       4 dse: local deletions = 4, global deletions = 0
>>       4 dse: local deletions = 6, global deletions = 0
>>       1 dse: local deletions = 7, global deletions = 0
>>       1 dse: local deletions = 8, global deletions = 0
>>
>> so not sure how much confidence this brings over the analytical
>> reasoning that it shouldn't make a difference ...
>>
>> stats on just dse2 are even more depressing (given it's cost)
>>
>>   35123 dse: local deletions = 0, global deletions = 0
>>       2 dse: local deletions = 0, global deletions = 1
>>      20 dse: local deletions = 1, global deletions = 0
>>       1 dse: local deletions = 2, global deletions = 0
>>       1 dse: local deletions = 3, global deletions = 0
>>       1 dse: local deletions = 4, global deletions = 0
>Based on that, I'd argue that DSE2 should go away and DSE1 should be
>evaluated for the chopping block.  While RTL DSE was marginally
>important in 1999 when it was first submitted, the tree-ssa pipeline as
>a whole has probably made RTL DSE largely pointless.

True. Though I'd argue that DSE2 might be the conceptually more useful pass 
since it sees spill slots. 

Richard. 

>Jeff

Reply via email to