Thomas Schwinge <tschwi...@baylibre.com> writes:
> Hi!
>
> On 2024-06-27T23:20:18+0200, I wrote:
>> On 2024-06-27T22:27:21+0200, I wrote:
>>> On 2024-06-27T18:49:17+0200, I wrote:
>>>> On 2023-10-24T19:49:10+0100, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>>> This patch adds a combine pass that runs late in the pipeline.
>>>
>>> [After sending, I realized I replied to a previous thread of this work.]
>>>
>>>> I've been looking a bit through recent nvptx target code generation
>>>> changes for GCC target libraries, and thought I'd also share here my
>>>> findings for the "late-combine" changes in isolation, for nvptx target.
>>>> 
>>>> First the unexpected thing:
>>>
>>> So much for "unexpected thing" -- next level of unexpected here...
>>> Appreciated if anyone feels like helping me find my way through this, but
>>> I totally understand if you've got other things to do.
>>
>> OK, I found something already.  (Unexpectedly quickly...)  ;-)
>>
>>>> there are a few cases where we now see unused
>>>> registers get declared
>
>> But in fact, for both cases
>
> Now tested: 's%both%all'.  :-)
>
>> the unexpected difference goes away if after
>> 'pass_late_combine' I inject a 'pass_fast_rtl_dce'.  That's normally run
>> as part of 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' -- but that's
>> all not active for nvptx target given '!reload_completed', given nvptx is
>> 'targetm.no_register_allocation'.  Maybe we need to enable a few more
>> passes, or is there anything in 'pass_late_combine' to change, so that we
>> don't run into this?  Does it inadvertently mark registers live or
>> something like that?
>
> Basically, is 'pass_late_combine' potentially doing things that depend
> on later clean-up?  (..., or shouldn't it be doing these things in the
> first place?)

It's possible that late-combine could expose dead code, but I imagine
it's a niche case.

I had a look at the nvptx logs from my comparison, and the cases in
which I saw this seemed to be those where late-combine doesn't find
anything to do.  Does that match your examples?  Specifically,
the effect should be the same with -fdbg-cnt=late_combine:0-0

I think what's happening is that:

- combine exposes dead code

- ce2 previously ran df_analyze with DF_LR_RUN_DCE set, and so cleared
  up the dead code

- late-combine instead runs df_analyze without that flag (since late-combine
  itself doesn't really care whether dead code is present)

- if late-combine doesn't do anything, ce2's df_analyze call has nothing
  to do, and skips even the DCE
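
To make that sequence concrete, here is a toy C++ model of it.  This is
only a sketch of the reasoning above, not GCC code: ToyFunction, the
toy_* functions and dead_insns_after are hypothetical names invented for
illustration, and the "dirty" bit is a simplifying assumption about when
an analysis call has work to do.

```cpp
// Toy model only: none of these names exist in GCC.
struct ToyFunction {
    int dead_insns = 0;         // dead instructions currently present
    bool df_dirty = false;      // has anything changed since the last analysis?
    bool run_dce_flag = false;  // stands in for DF_LR_RUN_DCE
};

// Stand-in for df_analyze: it only recomputes when something changed,
// and DCE runs as part of that recomputation only if the flag is set.
void toy_df_analyze(ToyFunction &fn) {
    if (!fn.df_dirty)
        return;                 // nothing to do, so the DCE is skipped too
    if (fn.run_dce_flag)
        fn.dead_insns = 0;      // DCE removes the dead code
    fn.df_dirty = false;
}

// combine exposes one dead instruction and dirties the analysis.
void toy_combine(ToyFunction &fn) {
    fn.dead_insns += 1;
    fn.df_dirty = true;
}

// late-combine analyzes with or without the DCE flag; if it changes
// any instructions, the analysis becomes dirty again.
void toy_late_combine(ToyFunction &fn, bool set_dce_flag, bool finds_work) {
    fn.run_dce_flag = set_dce_flag;
    toy_df_analyze(fn);
    if (finds_work)
        fn.df_dirty = true;
}

// ce2 always analyzes with the DCE flag set.
void toy_ce2(ToyFunction &fn) {
    fn.run_dce_flag = true;
    toy_df_analyze(fn);
}

// Run combine -> (optional late-combine) -> ce2 and report how many
// dead instructions survive.
int dead_insns_after(bool with_late_combine, bool set_dce_flag,
                     bool finds_work) {
    ToyFunction fn;
    toy_combine(fn);
    if (with_late_combine)
        toy_late_combine(fn, set_dce_flag, finds_work);
    toy_ce2(fn);
    return fn.dead_insns;
}
```

In this model, dead_insns_after (true, false, false) is the only
combination that leaves a dead instruction behind: late-combine's
analysis clears the dirty state without running DCE, and ce2 then has
nothing to recompute.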

The easiest fix would be to add:

  df_set_flags (DF_LR_RUN_DCE);

before df_analyze in late-combine.cc, so that it behaves like ce2.
But the arrangement feels wrong.  I would have expected DF_LR_RUN_DCE
to depend on whether df_analyze had been called since the last DCE pass
(whether DF_LR_RUN_DCE or a full DCE).

Thanks,
Richard
