On Fri, Apr 16, 2021 at 3:11 PM Len Brown <l...@kernel.org> wrote:
>
> > I get it. That does not explain why LDMXCSR and VLDMXCSR cause
> > pipelines stalls.
>
> Sorry, I thought this thread was about AMX.
> I don't know the answer to your LDMXCSR and VLDMXCSR question.

My point is that every single major math extension since the original XMM extensions (SSE, etc.) has come with performance gotchas. Given Intel's general unwillingness to document the gotchas in hardware that is actually shipping, I'm sceptical that AMX is as delightfully gotcha-free as you are making it out to be. Is there any authoritative guidance at all on what actually happens, performance-wise, when someone does AMX math?