On 11/08/2011 08:32 PM, Andy Polyakov via RT wrote:
>>>>>> No, the test is bypassed if XSAVE is 0, not 1. XSAVE being 0 also
>>>>>> implies that AVX flag [as well as FMA and XOP] is 0, which is why it
>>>>>> jumps to 'done' and not 'clear_avx'.
>>>>> This assertion is unfortunately not true on RHEL-6 guests on AVX capable
>>>>> CPUs in XEN VM.
>>> Could you spell it for me? Which flags does guest observe exactly?
>>> XSAVE=0 and AVX=1? I.e. XEN cared to mask XSAVE flag, but not AVX? Is
>>> bit masking configurable? If not, how come it clears XSAVE, but not AVX
>>> (and FMA)? Wouldn't one consider it a bug? I'm not trying to push it
>>> away, just understand...
>>
>> Because the hypervisor does nothing that forbids the use of AVX per se;
>> it's not working only because XSAVE doesn't.  If XSAVE were implemented
>> in Xen, AVX would start working without any need to treat it
>> specially in the CPUID masking code.
>
> If hypervisor just sets up XCR0 and guest attempts to use AVX, the
> result will be deplorable.

No, the hypervisor doesn't set up XCR0 at all, which is why it masks
XSAVE to 0.

>>>> The only assertion I found is that XSAVE=0 implies OSXSAVE=0 (and
>>>> OSXSAVE=1 implies XSAVE=1).
>>> But in order to be able to use AVX, you *have to* arrange OSXSAVE=1 (and
>>> of course corresponding bit in XCR0) and prerequisite for this is XSAVE
>>> being 1. I.e. there shouldn't be CPUs that have AVX, but not XSAVE.
>>
>> But it's not in the spec, so it's wrong to assume it.
>
> Specification says that AVX instruction will generate #UD exception if
> XCR0 is not set up appropriately.

Which is indeed what happens, and that's the bug in OpenSSL we were
trying to fix.  OpenSSL assumed (against the spec) that XSAVE=0 implies
AVX=0, and didn't check whether AVX needed to be masked; that was wrong.
I see you've fixed the problematic part anyway, so thanks for that. :)
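
For illustration only, the check under discussion can be sketched in C
roughly as follows.  This is not the OpenSSL code (which is perlasm);
the helper name mask_avx_bits is hypothetical, and the caller is assumed
to have already read CPUID.1:ECX and, only when OSXSAVE=1, XCR0 via
XGETBV.  XOP is reported in a different CPUID leaf and is left out of
the sketch.

```c
#include <stdint.h>

/* CPUID.1:ECX feature bits. */
#define ECX_FMA      (UINT32_C(1) << 12)
#define ECX_XSAVE    (UINT32_C(1) << 26)
#define ECX_OSXSAVE  (UINT32_C(1) << 27)
#define ECX_AVX      (UINT32_C(1) << 28)

/* XCR0 state-component bits. */
#define XCR0_SSE     (UINT64_C(1) << 1)
#define XCR0_YMM     (UINT64_C(1) << 2)

/*
 * Hypothetical helper: return ecx with AVX and FMA cleared unless the
 * OS has enabled both XMM and YMM state in XCR0.  xcr0 is consulted
 * only when OSXSAVE=1; a real caller must not execute XGETBV otherwise
 * and can simply pass 0.
 */
static uint32_t mask_avx_bits(uint32_t ecx, uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE | XCR0_YMM;
    int ymm_enabled = (ecx & ECX_OSXSAVE) != 0 && (xcr0 & need) == need;

    /* No shortcut on XSAVE=0: AVX may still be advertised, so mask it. */
    if (!ymm_enabled)
        ecx &= ~(ECX_AVX | ECX_FMA);
    return ecx;
}
```

The Xen guest case from this thread corresponds to calling it with
XSAVE=0, OSXSAVE=0, AVX=1: the AVX bit comes back cleared instead of
being passed through untouched.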

>>>> Also, I believe 13.7 implies that it's wrong to clear SSE feature bits
>>>> when XCR0.SSE=0:
>>> That's why it's '&jnc (&label("clear_avx"));' now, not "clear_xmm".
>>
>> I don't think there is any reason to have clear_xmm,
>
> But you can't deny the *possibility* that there is 32-bit OS that is
> aware of XSAVE and explicitly zeros XCR0[2:1].

It can explicitly zero XCR0[1] and still use FXSAVE/FXRSTOR.  That's why
I say clear_xmm is not necessary: XCR0[1]=0 does not imply that SSE is
useless.  But that's an imaginary scenario, so I'm fine with the code
you've patched as is.
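
To make the distinction concrete, here is a minimal sketch of a code
path selector in which XCR0 gates only the AVX path.  The names
(pick_simd_path, enum simd_path) are hypothetical, and the sketch
assumes the OS saves XMM state with FXSAVE/FXRSTOR (CR4.OSFXSR set),
which user-space code cannot verify directly.

```c
#include <stdint.h>

#define EDX_SSE2     (UINT32_C(1) << 26)  /* CPUID.1:EDX */
#define ECX_OSXSAVE  (UINT32_C(1) << 27)  /* CPUID.1:ECX */
#define ECX_AVX      (UINT32_C(1) << 28)  /* CPUID.1:ECX */
#define XCR0_SSE     (UINT64_C(1) << 1)
#define XCR0_YMM     (UINT64_C(1) << 2)

enum simd_path { PATH_SCALAR, PATH_SSE2, PATH_AVX };

/* Hypothetical selector: XCR0 decides AVX, but never vetoes SSE2. */
static enum simd_path pick_simd_path(uint32_t edx, uint32_t ecx,
                                     uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE | XCR0_YMM;

    if ((ecx & ECX_AVX) && (ecx & ECX_OSXSAVE) && (xcr0 & need) == need)
        return PATH_AVX;

    /* XCR0 is deliberately not consulted here (assumes FXSAVE is used). */
    if (edx & EDX_SSE2)
        return PATH_SSE2;

    return PATH_SCALAR;
}
```

With XCR0[1]=0 the selector falls back from AVX to SSE2 rather than to
scalar code, which is the behaviour argued for above.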

Paolo


______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
