https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99462

            Bug ID: 99462
           Summary: Enhance scheduling to split instructions
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Maybe the scheduler(s) can already do this (I have zero knowledge here).  For
example the x86 vec_concatv2di insn has alternatives that cause the instruction
to be split into multiple uops (vpinsrq, movhpd) when the 'insert' operand
is not XMM (but GPR or MEM).  We now have a peephole2 to split such cases:

+;; Further split pinsrq variants of vec_concatv2di to hide the latency
+;; of the GPR->XMM transition(s).
+(define_peephole2
+  [(match_scratch:DI 3 "Yv")
+   (set (match_operand:V2DI 0 "sse_reg_operand")
+       (vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
+                        (match_operand:DI 2 "nonimmediate_gr_operand")))]
+  "TARGET_64BIT && TARGET_SSE4_1
+   && !optimize_insn_for_size_p ()"
+  [(set (match_dup 3)
+        (match_dup 2))
+   (set (match_dup 0)
+       (vec_concat:V2DI (match_dup 1)
+                        (match_dup 3)))])

but in reality this is only profitable when we can either execute
two "bad" move uops in parallel (i.e. when originally composing
two GPRs or two MEMs) or schedule one "bad" move much earlier.
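
For concreteness, here is a minimal C sketch of the operand shapes mentioned
above (GPR+GPR, MEM+MEM, XMM+GPR), written with GCC's generic vector
extension.  The function names are made up for illustration.  Whether a given
GCC version and flag set (say -O2 -msse4.1) actually expands these through
vec_concatv2di and the peephole2 is an assumption, not something verified
here.

```c
/* Two 64-bit lanes, i.e. the V2DI mode used by the pattern above.  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Both halves arrive in GPRs: two GPR->XMM moves that could, in
   principle, execute in parallel.  */
v2di
concat_gpr_gpr (long long lo, long long hi)
{
  return (v2di) { lo, hi };
}

/* Both halves come from memory: two loads that could be issued
   independently of the final combine.  */
v2di
concat_mem_mem (const long long *p, const long long *q)
{
  return (v2di) { *p, *q };
}

/* Low half is already in a vector register; only the high half needs a
   "bad" move, so splitting only pays off if that move can be scheduled
   early.  */
v2di
concat_xmm_gpr (v2di v, long long hi)
{
  return (v2di) { v[0], hi };
}
```

Comparing the assembly for these three functions with and without the
peephole2 would be one way to see where the extra move ends up.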

Thus, can the scheduler already "split" an instruction - say split
away a load uop and issue it early when a scratch register is available?

(The reverse alternative is to not expose multi-uop insns before scheduling
and only merge them later, perhaps during scheduling?)

How does GCC deal with situations like this?
