https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99462
Bug ID: 99462
Summary: Enhance scheduling to split instructions
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

Maybe the scheduler(s) can already do this (I have zero knowledge here). For
example, the x86 vec_concatv2di insn has alternatives that cause the
instruction to be split into multiple uops (vpinsrq, movhpd) when the 'insert'
operand is not XMM (but GPR or MEM). We now have a peephole2 to split such
cases:

+;; Further split pinsrq variants of vec_concatv2di to hide the latency
+;; of the GPR->XMM transition(s).
+(define_peephole2
+  [(match_scratch:DI 3 "Yv")
+   (set (match_operand:V2DI 0 "sse_reg_operand")
+       (vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
+                        (match_operand:DI 2 "nonimmediate_gr_operand")))]
+  "TARGET_64BIT && TARGET_SSE4_1
+   && !optimize_insn_for_size_p ()"
+  [(set (match_dup 3)
+       (match_dup 2))
+   (set (match_dup 0)
+       (vec_concat:V2DI (match_dup 1)
+                        (match_dup 3)))])

but in reality this is only profitable when we can either execute two "bad"
move uops in parallel (that is, when originally composing two GPRs or two
MEMs) or schedule one "bad" move much earlier.

Thus, can the scheduler already "split" an instruction, say split away a load
uop and issue it early when a scratch register is available? (The reverse
alternative is to not expose multi-uop insns before scheduling and only merge
them later, during scheduling?)

How does GCC deal with situations like this?
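For reference, a minimal C sketch of source that composes a V2DI from two GPRs or from two MEMs, the two cases described above as potentially profitable. The function names are hypothetical and not taken from the report or the GCC testsuite. Whether GCC actually expands these through vec_concatv2di and the peephole2 is an assumption that depends on the compiler version and flags (for example -O2 -msse4.1), so check the generated assembly with -S.

```c
#include <assert.h>
#include <stdint.h>

/* Two 64-bit lanes in one 128-bit vector (GCC/Clang vector extension).  */
typedef int64_t v2di __attribute__ ((vector_size (16)));

/* Hypothetical example: both lanes arrive as integer arguments, i.e. in
   GPRs on x86-64, so building the vector needs GPR->XMM moves.  */
v2di
concat_from_gprs (int64_t lo, int64_t hi)
{
  v2di v = { lo, hi };
  return v;
}

/* Hypothetical example: both lanes are loaded from memory.  */
v2di
concat_from_mem (const int64_t *p, const int64_t *q)
{
  v2di v = { *p, *q };
  return v;
}

/* Sanity check of the lane order; returns 0 when both helpers agree.  */
int
check_concat (void)
{
  int64_t a = 1, b = 2;
  v2di x = concat_from_gprs (a, b);
  v2di y = concat_from_mem (&a, &b);
  assert (x[0] == 1 && x[1] == 2);
  assert (y[0] == 1 && y[1] == 2);
  return 0;
}
```

Compiling with something like `gcc -O2 -msse4.1 -S` and comparing the output with and without the peephole2 should show whether the insert operand is first moved into a scratch XMM register.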