On Sat, Mar 20, 2021 at 3:13 PM Thomas Gleixner <t...@linutronix.de> wrote:
>
> On Sun, Feb 21 2021 at 10:56, Chang S. Bae wrote:
> > +
> > +/* Update MSR IA32_XFD with xfirstuse_not_detected() if needed. */
> > +static inline void xdisable_switch(struct fpu *prev, struct fpu *next)
> > +{
> > +	if (!static_cpu_has(X86_FEATURE_XFD) || !xfirstuse_enabled())
> > +		return;
> > +
> > +	if (unlikely(prev->state_mask != next->state_mask))
> > +		xdisable_setbits(xfirstuse_not_detected(next));
> > +}
>
> So this is invoked on context switch. Toggling bit 18 of MSR_IA32_XFD
> when it does not match. The spec document says:
>
>   "System software may disable use of Intel AMX by clearing XCR0[18:17],
>   by clearing CR4.OSXSAVE, or by setting IA32_XFD[18]. It is recommended
>   that system software initialize AMX state (e.g., by executing
>   TILERELEASE) before doing so. This is because maintaining AMX state in
>   a non-initialized state may have negative power and performance
>   implications."
>
> I'm not seeing anything related to this. Is this a recommendation
> which can be ignored, or is that going to be duct-taped into the code
> base once the first user complains about slowdowns of their non-AMX
> workloads on that machine?
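For illustration, here is a minimal userspace model of the toggle-on-mismatch logic Thomas quotes, with the spec's recommendation folded in: before arming IA32_XFD[18], put AMX state back to init. All names (`fpu_model`, `xdisable_switch_model`, the counters) are hypothetical stand-ins, not the patch's actual code; a real kernel would execute TILERELEASE and `wrmsrl(MSR_IA32_XFD, ...)` where the comments indicate.

```c
#include <assert.h>
#include <stdint.h>

#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)  /* XSTATE component 18: AMX tile data */

/* Hypothetical stand-in for struct fpu, carrying only the per-task
 * feature mask the quoted code compares on context switch. */
struct fpu_model {
    uint64_t state_mask;  /* features this task has been granted */
};

static uint64_t msr_xfd;        /* stand-in for MSR IA32_XFD */
static int tilerelease_count;   /* counts modeled TILERELEASE executions */

/* Model of xfirstuse_not_detected(): XFD bits to arm for features
 * the incoming task has not (yet) been granted. */
static uint64_t xfd_bits_for(const struct fpu_model *next)
{
    return XFEATURE_MASK_XTILE_DATA & ~next->state_mask;
}

/* Model of xdisable_switch(), extended per the spec text: when we are
 * about to newly set XFD[18], initialize AMX state first. */
static void xdisable_switch_model(const struct fpu_model *prev,
                                  const struct fpu_model *next)
{
    if (prev->state_mask == next->state_mask)
        return;                   /* common case: no MSR write */

    uint64_t new_xfd = xfd_bits_for(next);
    if (new_xfd & ~msr_xfd & XFEATURE_MASK_XTILE_DATA)
        tilerelease_count++;      /* would execute TILERELEASE here */
    msr_xfd = new_xfd;            /* would be wrmsrl(MSR_IA32_XFD, new_xfd) */
}
```

The point of the sketch is Thomas's question made concrete: the quoted patch performs only the MSR write on mismatch; the TILERELEASE step (the incremented counter above) appears nowhere.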
I have an obnoxious question: do we really want to use the XFD mechanism?

Right now, glibc, and hence most user space code, blindly uses whatever
random CPU features are present for no particularly good reason, which
means that all these features get stuck in the XINUSE=1 state, even if
there is no code whatsoever in the process that benefits. AVX512 is bad
enough, as we're seeing right now. AMX will be much worse if this
happens.

We *could* instead use XCR0 and require an actual syscall to enable it.
We could even then play games like requiring whoever enables the feature
to allocate memory for the state save area for signals, and signal
delivery could save the state and disable the feature, thus preventing
the signal frame from blowing up to 8 or 12 or who knows how many kB.

--Andy
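To make the opt-in idea concrete, here is a small userspace model of what Andy is proposing: the feature stays off until a task asks for it via a syscall, the task must supply the save-area memory up front, and the signal frame only grows for tasks that opted in. Everything here is hypothetical (the syscall name, the struct, and the sizes are illustrative stand-ins, not any actual kernel interface).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XFEATURE_XTILE_DATA 18
#define XTILE_DATA_SIZE     8192  /* illustrative AMX tile-data save size */
#define BASE_FRAME_SIZE     1088  /* stand-in for the legacy+SSE frame */

/* Hypothetical per-task state for the opt-in scheme. */
struct task_model {
    uint64_t xcr0;           /* per-task view of enabled features */
    void    *amx_save_area;  /* user-provided buffer, per Andy's idea */
};

/* Hypothetical opt-in syscall: enabling fails unless the caller
 * pre-allocates the memory that signal delivery would save into. */
static int sys_enable_xfeature_model(struct task_model *t, int feature,
                                     void *user_buf)
{
    if (feature != XFEATURE_XTILE_DATA)
        return -1;           /* EINVAL: only modeling AMX here */
    if (user_buf == NULL)
        return -1;           /* no save area, no feature */
    t->amx_save_area = user_buf;
    t->xcr0 |= 1ULL << feature;
    return 0;
}

/* The signal frame stays small for every task that never opted in. */
static size_t sigframe_size_model(const struct task_model *t)
{
    size_t sz = BASE_FRAME_SIZE;
    if (t->xcr0 & (1ULL << XFEATURE_XTILE_DATA))
        sz += XTILE_DATA_SIZE;
    return sz;
}
```

Under this scheme, a process that never calls the enable syscall can never trip into XINUSE=1 for AMX, so the glibc blindly-touch-everything problem Andy describes cannot inflate its signal frames.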