Hello folks,

here are the details of the AVX story I mentioned during the last hangout. It is not directly related to our discussion of FPU context switching in HelenOS, but it is tangentially relevant.

"Supporting AVX-512 increases the amount of data you need to store across context switches, which increases per-thread overhead. Apple took an innovative approach to this - disable AVX-512 by default, wait for a thread to hit an illegal instruction, enable AVX-512 for that thread, and replay the instruction, so you only take the overhead for threads that *use* AVX-512. But that works badly for apps that follow Intel's guide to detecting whether AVX-512 is available."

https://nondeterministic.computer/@mjg59/109824790414027030
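
To make the mechanism in the quote concrete, here is a minimal sketch in C. It is a user-space model, not kernel code and not Apple's implementation: thread_t, exec_avx512_insn(), intel_style_detect() and extra_save_bytes() are hypothetical names made up for illustration. The one assumption I am adding beyond the quote is that an "Intel-style" check means reading XCR0 and testing bits 5-7, which would explain why such a check sees AVX-512 as unavailable before the thread has trapped once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread record; not taken from any real kernel. */
typedef struct {
    bool avx512_enabled;   /* has the thread been promoted yet? */
    unsigned traps_taken;  /* illegal-instruction traps handled so far */
} thread_t;

/* XCR0 bits 0-2 cover x87, SSE and AVX state. */
#define XCR0_BASE   0x07u
/* XCR0 bits 5-7 cover the opmask, ZMM_Hi256 and Hi16_ZMM components. */
#define XCR0_AVX512 0xE0u

/* What reading XCR0 would report for this thread in the model. */
unsigned thread_xcr0(const thread_t *t)
{
    return XCR0_BASE | (t->avx512_enabled ? XCR0_AVX512 : 0u);
}

/* Assumed shape of an Intel-style check: are all three bits set? */
bool intel_style_detect(const thread_t *t)
{
    return (thread_xcr0(t) & XCR0_AVX512) == XCR0_AVX512;
}

/* Model of executing one AVX-512 instruction on thread t. */
void exec_avx512_insn(thread_t *t)
{
    if (!t->avx512_enabled) {
        /* First use: trap, enable AVX-512 for this thread, replay. */
        t->traps_taken++;
        t->avx512_enabled = true;
    }
    /* From here on the instruction runs normally. */
    assert(t->avx512_enabled);
}

/* Extra bytes of register state to save per context switch. */
size_t extra_save_bytes(const thread_t *t)
{
    /* K0-7 (8 x 8) + upper halves of ZMM0-15 (16 x 32) + ZMM16-31 (16 x 64) */
    return t->avx512_enabled ? (size_t)(64 + 512 + 1024) : 0;
}
```

In this model a fresh thread pays no extra save cost and fails intel_style_detect(). After its first exec_avx512_insn() call it has taken exactly one trap, carries 1600 extra bytes of state, and passes the check.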

"Apple took this a step further by trying to optimise for whether they needed to restore the AVX-512 registers by simply checking whether ZMM0-31 were all 0 or not. Unfortunately it's legitimate to have ZMM0-31 all be 0 and still have state in K0-7, which then blows up"

https://nondeterministic.computer/@mjg59/109824804903603481
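
The second problem can be sketched the same way. Again, this is only an illustrative model under my own assumptions: avx512_state_t and the needs_restore_*() functions are hypothetical names, and the "fixed" variant is simply the obvious correction, not necessarily what Apple ended up shipping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of a thread's AVX-512 register state. */
typedef struct {
    uint64_t zmm[32][8];  /* ZMM0-31, 512 bits each */
    uint64_t k[8];        /* opmask registers K0-7 */
} avx512_state_t;

/* True if every bit of ZMM0-31 is zero. */
bool zmm_all_zero(const avx512_state_t *s)
{
    for (size_t r = 0; r < 32; r++)
        for (size_t w = 0; w < 8; w++)
            if (s->zmm[r][w] != 0)
                return false;
    return true;
}

/* True if every bit of K0-7 is zero. */
bool k_all_zero(const avx512_state_t *s)
{
    for (size_t i = 0; i < 8; i++)
        if (s->k[i] != 0)
            return false;
    return true;
}

/* The heuristic from the quote: looks at ZMM0-31 only. */
bool needs_restore_flawed(const avx512_state_t *s)
{
    return !zmm_all_zero(s);
}

/* Assumed correction: the opmask registers count as state too. */
bool needs_restore_fixed(const avx512_state_t *s)
{
    return !zmm_all_zero(s) || !k_all_zero(s);
}
```

With all ZMM registers zero and, say, K1 set to 0xFF, needs_restore_flawed() reports that nothing needs restoring while needs_restore_fixed() reports that it does. That lost opmask state is what "blows up" in the quote.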


M.D.

_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/listinfo/helenos-devel
