On Thu, Apr 18, 2024 at 06:12:22PM +0000, Shankaran, Akash wrote:
> Good find. I confirmed after speaking with an Intel expert, and from the
> Intel AVX-512 manual [0] section 14.3, which recommends checking bit 27. From
> the manual:
>
> "Prior to using Intel AVX, the application must identify that the operating
> system supports the XGETBV instruction, the YMM register state, in addition
> to processor's support for YMM state management using XSAVE/XRSTOR and AVX
> instructions. The following simplified sequence accomplishes both and is
> strongly recommended.
> 1) Detect CPUID.1:ECX.OSXSAVE[bit 27] = 1 (XGETBV enabled for application
> use).
> 2) Issue XGETBV and verify that XCR0[2:1] = '11b' (XMM state and YMM state
> are enabled by OS).
> 3) detect CPUID.1:ECX.AVX[bit 28] = 1 (AVX instructions supported).
> (Step 3 can be done in any order relative to 1 and 2.)"
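The sequence quoted above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the popcount patch itself: the function names (xgetbv_usable, avx_usable, read_xcr0, cpu_supports_avx) and macro names are hypothetical, the decision logic is kept in pure functions over already-read CPUID/XCR0 values so it can be tested anywhere, and the hardware query assumes GCC or Clang on x86 (`__get_cpuid()` from <cpuid.h> plus inline XGETBV). The key point it shows is that XCR0 is only read once the OSXSAVE bit says XGETBV is allowed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as given in the manual excerpt (hypothetical macro names). */
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27)	/* step 1 */
#define CPUID1_ECX_AVX     (UINT32_C(1) << 28)	/* step 3 */
#define XCR0_SSE_AVX       UINT64_C(0x6)		/* step 2: XCR0[2:1] */

/*
 * Step 1: CPUID.1:ECX.OSXSAVE reports whether the OS has enabled XGETBV.
 * XCR0 must not be read unless this returns true, since XGETBV raises
 * #UD when CR4.OSXSAVE is clear.
 */
bool
xgetbv_usable(uint32_t cpuid1_ecx)
{
	return (cpuid1_ecx & CPUID1_ECX_OSXSAVE) != 0;
}

/*
 * Steps 2 and 3, given CPUID.1:ECX and an XCR0 value that was read only
 * after xgetbv_usable() returned true.
 */
bool
avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
	if (!xgetbv_usable(cpuid1_ecx))
		return false;
	/* Both XMM (bit 1) and YMM (bit 2) state must be enabled: '11b'. */
	if ((xcr0 & XCR0_SSE_AVX) != XCR0_SSE_AVX)
		return false;
	return (cpuid1_ecx & CPUID1_ECX_AVX) != 0;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Reads XCR0 via XGETBV with ECX = 0; call only after xgetbv_usable(). */
static uint64_t
read_xcr0(void)
{
	uint32_t	eax,
				edx;

	__asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
	return ((uint64_t) edx << 32) | eax;
}

/* Runs the full sequence against the current CPU. */
bool
cpu_supports_avx(void)
{
	unsigned int eax,
				ebx,
				ecx,
				edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	if (!xgetbv_usable(ecx))
		return false;			/* never execute XGETBV in this case */
	return avx_usable(ecx, read_xcr0());
}
#endif
```

Splitting the check this way keeps the ordering constraint (step 1 before step 2) explicit, while step 3 is just another bit test that could equally be done first.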
Thanks for confirming.  IIUC my patch should be sufficient, then.

> It also seems that step 1 and step 2 need to be done prior to the CPUID
> OSXSAVE check in the popcount code.

This seems to contradict the note about doing step 3 at any point, and
given step 1 is the OSXSAVE check, I'm not following what this means,
anyway.

I'm also wondering if we need to check that (_xgetbv(0) & 0xe6) == 0xe6
instead of just (_xgetbv(0) & 0xe0) != 0, as the lower halves of ZMM0-ZMM15
are covered by the SSE and AVX state components (XCR0 bits 1 and 2) [0].  I
don't know how likely it is that 0xe0 would succeed but 0xe6 wouldn't, but
we might as well make it correct.

[0] https://en.wikipedia.org/wiki/Control_register#cite_ref-23

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
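To make the difference between the two XCR0 tests concrete, here is a small sketch that compares them on plain integers. The function and macro names are hypothetical; the sketch takes an already-read XCR0 value instead of calling `_xgetbv(0)` so it runs on any machine. The bit assignments follow the XSAVE state-component numbering in the Intel SDM: bit 1 SSE, bit 2 AVX, bit 5 opmask, bit 6 ZMM_Hi256, bit 7 Hi16_ZMM, which together give 0xe6.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits (hypothetical macro names). */
#define XCR0_SSE       (UINT64_C(1) << 1)	/* XMM0-15 = bits 127:0 of ZMM0-15 */
#define XCR0_AVX       (UINT64_C(1) << 2)	/* bits 255:128 of ZMM0-15 */
#define XCR0_OPMASK    (UINT64_C(1) << 5)	/* opmask registers k0-k7 */
#define XCR0_ZMM_HI256 (UINT64_C(1) << 6)	/* bits 511:256 of ZMM0-15 */
#define XCR0_HI16_ZMM  (UINT64_C(1) << 7)	/* all of ZMM16-31 */

/* Every component AVX-512 needs: 0xe6. */
#define XCR0_AVX512_ALL \
	(XCR0_SSE | XCR0_AVX | XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/* The "& 0xe0 != 0" form: true if ANY of bits 5-7 is set. */
bool
zmm_regs_enabled_loose(uint64_t xcr0)
{
	return (xcr0 & 0xe0) != 0;
}

/* The "& 0xe6 == 0xe6" form: true only if ALL of bits 1, 2, 5, 6, 7 are set. */
bool
zmm_regs_enabled_strict(uint64_t xcr0)
{
	return (xcr0 & XCR0_AVX512_ALL) == XCR0_AVX512_ALL;
}
```

Besides ignoring the SSE and AVX bits, the `!= 0` form also accepts an XCR0 in which only one of the three AVX-512 components is enabled (for example 0x20, opmask only), whereas the `== 0xe6` form requires all of them.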