On 6/19/25 9:15 AM, Paul-Antoine Arras wrote:
*** Context ***
The Haifa scheduler is able to recognise some cases where a dependence D
between two instructions -- a producer P and a consumer C -- can be
broken. In particular, if C contains a memory reference M and P is an
increment, then M can be replaced with its incremented version M' within C.
For instance, the following sequence:
P: a0:DI=a0:DI+0x9 ;; increment
C: s1:DI=[a0:DI] ;; memory ref M
can be replaced with:
C: s1:DI=[a0:DI+0x9] ;; inc mem ref M'
This effectively breaks dependence D.
Fine. But what about P?
*** Issue ***
In most cases, once D is broken, P will be scheduled based on other,
remaining dependences. But what if the only dependence left is a write-
after-write aka output dependence?
After scheduling, we can end up with the following sequence:
P: a0:DI=a0:DI+0x9 ;; dead code
C': a0:DI=sp:DI+0x7ff
Obviously, a0 being immediately overwritten without any intervening
instruction, P has become functionally dead code.
*** Proposed solution ***
The attached patch adds the following logic to apply_replacement. Once D
is broken, it examines P's remaining dependences: if there is only one
left and it is an output dependence, P is marked for elision. P is
actually removed only after apply_replacement has returned, to avoid
iterator invalidation.
Is it a sound approach? Any comment regarding the suggested implementation?
My biggest concern is that this is really a symptom of something else
going wrong earlier in the optimizer pipeline.
If the increment in P can be folded into C and C was the last use of the
output from P, then why was P not folded into C earlier, particularly in
combine?
Now it may turn out that solving this at the root is hard. That'll
depend on what we find. I have a sneaking suspicion this is Vineet's
code to handle the case where we can avoid synthesis of a constant by
instead emitting a pair of addi instructions which is implemented via a
define_insn_and_split.
The problem with that approach is we don't get a good chance to clean
things up well after the split phase. The better approach is to first
adjust the expander to handle those cases. Shreya has that on her todo
list already. Then we turn the existing define_insn_and_split to handle
this case into a define_split (which exposes the split code earlier and
gives combine a chance to clean up remaining problems of this nature).
My worry is we'll still stumble over this stuff due to mvconst_internal,
which we're in the process of trying to get rid if (and is the driving
force behind having Shreya clean up the addi handling).
Jeff