Thomas Schwinge <tschwi...@baylibre.com> writes:
> Hi!
>
> On 2024-06-27T23:20:18+0200, I wrote:
>> On 2024-06-27T22:27:21+0200, I wrote:
>>> On 2024-06-27T18:49:17+0200, I wrote:
>>>> On 2023-10-24T19:49:10+0100, Richard Sandiford <richard.sandif...@arm.com>
>>>> wrote:
>>>>> This patch adds a combine pass that runs late in the pipeline.
>>>
>>> [After sending, I realized I replied to a previous thread of this work.]
>>>
>>>> I've been looking a bit through recent nvptx target code generation
>>>> changes for GCC target libraries, and thought I'd also share here my
>>>> findings for the "late-combine" changes in isolation, for nvptx target.
>>>>
>>>> First the unexpected thing:
>>>
>>> So much for "unexpected thing" -- next level of unexpected here...
>>> Appreciated if anyone feels like helping me find my way through this, but
>>> I totally understand if you've got other things to do.
>>
>> OK, I found something already. (Unexpectedly quickly...) ;-)
>>
>>>> there are a few cases where we now see unused
>>>> registers get declared
>
>> But in fact, for both cases
>
> Now tested: 's%both%all'. :-)
>
>> the unexpected difference goes away if after
>> 'pass_late_combine' I inject a 'pass_fast_rtl_dce'. That's normally run
>> as part of 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' -- but that's
>> all not active for nvptx target given '!reload_completed', given nvptx is
>> 'targetm.no_register_allocation'. Maybe we need to enable a few more
>> passes, or is there anything in 'pass_late_combine' to change, so that we
>> don't run into this? Does it inadvertently mark registers live or
>> something like that?
>
> Basically, is 'pass_late_combine' potentially doing things that depend
> on later clean-up? (..., or shouldn't it be doing these things in the
> first place?)

It's possible that late-combine could expose dead code, but I imagine
it's a niche case.

I had a look at the nvptx logs from my comparison, and the cases in
which I saw this seemed to be those where late-combine doesn't find
anything to do. Does that match your examples? Specifically,
the effect should be the same with -fdbg-cnt=late_combine:0-0

I think what's happening is that:

- combine exposes dead code

- ce2 previously ran df_analyze with DF_LR_RUN_DCE set, and so
  cleared up the dead code

- late-combine instead runs df_analyze without that flag (since
  late-combine itself doesn't really care whether dead code is present)

- if late-combine doesn't do anything, ce2's df_analyze call has
  nothing to do, and skips even the DCE

The easiest fix would be to add:

  df_set_flags (DF_LR_RUN_DCE);

before df_analyze in late-combine.cc, so that it behaves like ce2.
But the arrangement feels wrong. I would have expected DF_LR_RUN_DCE
to depend on whether df_analyze had been called since the last DCE pass
(whether DF_LR_RUN_DCE or a full DCE).

Thanks,
Richard