https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99462
--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---

(For context, the above patch was for PR 98856, but it is based on incorrect latency analysis; see bug 98856 comment #38.)

Right now, schedulers cannot easily split instructions for that purpose; it would require computing the dependency graph more accurately. Currently, dependencies and priorities are computed with respect to instructions as a whole, while intelligent splitting would require tracking latencies with respect to individual inputs.

sel-sched does not split, but it can perform "renaming", which basically overcomes anti-dependencies by scheduling the desired instruction before the conflicting write (by choosing a different output register) and emitting a reg-reg move later.

I think on modern x86 the profitability of such splitting is quite dubious, because it would increase the number of instructions and uops flowing through the CPU front-end and entering the renamer (which is one of the narrowest points in the pipeline). This is especially true on AMD, where not only load-op but also load-op-store instructions are renamed as a single uop (which is then sent to two or three execution units).

I think in common cases where the overall critical path is unchanged (like in the given examples of pinsrq and various load-op instructions), GCC should simply continue emitting the combined form.
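To make the "whole instruction vs. individual inputs" distinction concrete, here is a toy Python model. It is not GCC scheduler code; the latencies are hypothetical, execution resources are assumed unlimited, and it assumes the core can start the load part of a load-op such as `add r1, [r2]` as soon as the address is ready, even if `r1` is still being computed by a slow `imul`. The helper names (`Op`, `critical_path`) are made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical latencies in cycles, chosen only for illustration.
IMUL_LAT = 3
LOAD_LAT = 5
ADD_LAT = 1


@dataclass(frozen=True)
class Op:
    name: str
    latency: int
    inputs: tuple = ()  # names of the ops whose results this op consumes


def critical_path(ops):
    """Finish time of the last op, assuming ops are listed in dependency
    order and any number of ops can execute in the same cycle."""
    done = {}
    for op in ops:
        start = max((done[dep] for dep in op.inputs), default=0)
        done[op.name] = start + op.latency
    return done[ops[-1].name]


# add r1, [r2]: r1 comes from a slow imul, the address r2 is ready at cycle 0.

# Whole-instruction model: the load-op is a single node that cannot start
# until all of its inputs, including r1, are ready.
whole = [
    Op("imul", IMUL_LAT),
    Op("add_mem", LOAD_LAT + ADD_LAT, ("imul",)),
]

# Per-input model: the load waits only for the address; only the ALU part
# waits for the imul result.
per_input = [
    Op("imul", IMUL_LAT),
    Op("load", LOAD_LAT),
    Op("add", ADD_LAT, ("imul", "load")),
]

print("whole-instruction estimate:", critical_path(whole))  # prints 9
print("per-input estimate:", critical_path(per_input))      # prints 6
```

Under these assumptions the per-input estimate is the same whether the load is written as a separate `mov` or kept inside the combined instruction; only the whole-instruction model sees a difference. Splitting therefore pays off only when it actually shortens the critical path, while it always adds an instruction to the front end.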