On 6/19/25 9:15 AM, Paul-Antoine Arras wrote:
*** Context ***

The Haifa scheduler is able to recognise some cases where a dependence D between two instructions -- a producer P and a consumer C -- can be broken. In particular, if C contains a memory reference M and P is an increment, then M can be replaced with its incremented version M' within C.

For instance, the following sequence:

   P:  a0:DI=a0:DI+0x9         ;; increment
   C:  s1:DI=[a0:DI]           ;; memory ref M

can be replaced with:

   C:  s1:DI=[a0:DI+0x9]       ;; inc mem ref M'

This effectively breaks dependence D.
Fine. But what about P?


*** Issue ***

In most cases, once D is broken, P will be scheduled based on other, remaining dependences. But what if the only remaining dependence is a write-after-write (aka output) dependence?

After scheduling, we can end up with the following sequence:

   P:  a0:DI=a0:DI+0x9         ;; dead code
   C': a0:DI=sp:DI+0x7ff

Since a0 is immediately overwritten with no intervening use, P has become functionally dead code.


*** Proposed solution ***

The attached patch adds the following logic to apply_replacement. Once D is broken, it examines P's remaining dependences: if exactly one remains and it is an output dependence, P is marked for elision. P is actually removed only after apply_replacement has returned, to avoid invalidating the dependence iterator.


Is this a sound approach? Any comments regarding the suggested implementation?
My biggest concern is that this is really a symptom of something else going wrong earlier in the optimizer pipeline.

If the increment in P can be folded into C and C was the last use of the output from P, then why was P not folded into C earlier, particularly in combine?

Now it may turn out that solving this at the root is hard. That'll depend on what we find. I have a sneaking suspicion this is Vineet's code to handle the case where we can avoid synthesis of a constant by instead emitting a pair of addi instructions, which is implemented via a define_insn_and_split.

The problem with that approach is we don't get a good chance to clean things up well after the split phase. The better approach is to first adjust the expander to handle those cases. Shreya has that on her todo list already. Then we turn the existing define_insn_and_split to handle this case into a define_split (which exposes the split code earlier and gives combine a chance to clean up remaining problems of this nature).

My worry is we'll still stumble over this stuff due to mvconst_internal, which we're in the process of trying to get rid of (and which is the driving force behind having Shreya clean up the addi handling).

Jeff
