On Fri, Mar 26, 2021 at 5:12 AM Florian Weimer <f...@deneb.enyo.de> wrote:
>
> * Andy Lutomirski-alpha:
>
> > glibc appears to use AVX512F for memcpy by default.  (Unless
> > Prefer_ERMS is default-on, but I genuinely can't tell if this is the
> > case.  I did some searching.)  The commit adding it refers to a 2016
> > email saying that it's 30% on KNL.
>
> As far as I know, glibc only does that on KNL, and there it is
> actually beneficial.  The relevant code is:
>
>       /* Since AVX512ER is unique to Xeon Phi, set Prefer_No_VZEROUPPER
>          if AVX512ER is available.  Don't use AVX512 to avoid lower CPU
>          frequency if AVX512ER isn't available.  */
>       if (CPU_FEATURES_CPU_P (cpu_features, AVX512ER))
>         cpu_features->preferred[index_arch_Prefer_No_VZEROUPPER]
>           |= bit_arch_Prefer_No_VZEROUPPER;
>       else
>         cpu_features->preferred[index_arch_Prefer_No_AVX512]
>           |= bit_arch_Prefer_No_AVX512;
>
> So it's not just about Prefer_ERMS.

Phew.

>
> > AVX-512 cleared, and programs need to explicitly request enablement.
> > This would allow programs to opt into not saving/restoring across
> > signals or to save/restore in buffers supplied when the feature is
> > enabled.
>
> Isn't XSAVEOPT already able to handle that?
>

Yes, but we need a place to put the data, and we need to acknowledge
that, with the current save-everything-on-signal model, the amount of
time and memory used is essentially unbounded.  This isn't great.

>
> There is a discussion about using the higher (AVX-512-only) %ymm
> registers, to avoid the %xmm transition penalty without the need for
> VZEROUPPER.  (VZEROUPPER is incompatible with RTM from a performance
> point of view.)  That would perhaps negatively impact XSAVEOPT.
>
> Assuming you can make XSAVEOPT work for you on the kernel side, my
> instincts tell me that we should have markup for RTM, not for AVX-512.
> This way, we could avoid use of the AVX-512 registers and keep using
> VZEROUPPER, without run-time transaction checks, and deal with other
> idiosyncrasies needed for transaction support that users might
> encounter once this feature sees more use.  But the VZEROUPPER vs RTM
> issues is currently stuck in some internal process issue on my end (or
> two, come to think of it), which I hope to untangle next month.

Can you elaborate on the issue?
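
For reference, the glibc selection logic quoted above can be modeled as a
small standalone C sketch.  This is not glibc's code: struct cpu_model, the
PREFER_* constants, select_preferences() and memcpy_may_use_avx512() are
hypothetical names made up for illustration, and the assumption that memcpy
selection depends only on AVX512F plus the Prefer_No_AVX512 bit is a
simplification of whatever the real ifunc selector does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for glibc's internal preference bits. */
enum {
  PREFER_NO_VZEROUPPER = 1 << 0,
  PREFER_NO_AVX512     = 1 << 1
};

/* Hypothetical, simplified model of the detected CPU features. */
struct cpu_model {
  bool has_avx512f;
  bool has_avx512er;   /* per the quoted comment, unique to Xeon Phi */
  uint32_t preferred;  /* bitmask of PREFER_* flags */
};

/* Mirrors the quoted branch: AVX512ER present means Xeon Phi, so only
   VZEROUPPER is avoided; otherwise AVX-512 is avoided to keep the CPU
   from lowering its frequency.  */
static void select_preferences(struct cpu_model *cpu)
{
  assert(cpu != NULL);
  if (cpu->has_avx512er)
    cpu->preferred |= PREFER_NO_VZEROUPPER;
  else
    cpu->preferred |= PREFER_NO_AVX512;
}

/* Assumed selection rule (simplified): an AVX-512 memcpy is eligible
   only if AVX512F exists and Prefer_No_AVX512 was not set.  */
static bool memcpy_may_use_avx512(const struct cpu_model *cpu)
{
  assert(cpu != NULL);
  return cpu->has_avx512f && !(cpu->preferred & PREFER_NO_AVX512);
}
```

Under this model, a CPU reporting both AVX512F and AVX512ER stays eligible
for the AVX-512 memcpy, while one reporting AVX512F alone does not, which
matches the "only on KNL" reading.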