> On Mar 24, 2021, at 2:09 PM, Len Brown <l...@kernel.org> wrote:
> 
> On Tue, Mar 23, 2021 at 11:15 PM Liu, Jing2 <jing2....@linux.intel.com> wrote:
> 
>>> IMO, the problem with AVX512 state
>>> is that we guaranteed it will be zero for XINUSE=0.
>>> That means we have to write 0's on saves.
> 
>> Why do "we have to write 0's on saves" when XINUSE=0?
>> 
>> According to the SDM, if XINUSE=0, XSAVES will *not* save the data and the
>> xstate_bv bit is 0; XSAVE does need to save the state, but the
>> xstate_bv bit is also 0.
>>>  It would be better
>>> to be able to skip the write -- even if we can't save the space
>>> we can save the data transfer.  (This is what we did for AMX).
>> With the XFD feature and XFD=1, the XSAVE instruction still has to save INIT
>> state to the area. So it seems that with XINUSE=0 and XFD=1, the XSAVE(S)
>> instructions do the same thing, and both can help save the data transfer.
> 
> Hi Jing, Good observation!
> 
> There are 3 cases.
> 
> 1. Task context switch save into the context switch buffer.
> Here we use XSAVES, and as you point out, XSAVES includes
> the compaction optimization feature tracked by XINUSE.
> So when AMX is enabled, but clean, XSAVES doesn't write zeros.
> Further, it omits the buffer space for AMX in the destination altogether!
> However, since XINUSE=1 is possible, we have to *allocate* a buffer
> large enough to handle the dirty data for when XSAVES cannot
> employ that optimization.
> 
> 2. Entry into user signal handler saves into the user space sigframe.
> Here we use XSAVE, and so the hardware will write zeros for XINUSE=0,
> and for AVX512, we save neither time nor space.
> 
> My understanding is that for application compatibility, we can *not* compact
> the destination buffer that user-space sees.  This is because existing code
> may have adopted fixed size offsets.  (which is unfortunate).
> 
> And so, for AVX512, we both reserve the space, and we write zeros
> for clean AVX512 state.
> 
> For AMX, we must still reserve the space, but we are not going to write zeros
> for clean state.  We do this in software by checking XINUSE=0, and clearing
> the xstate_bv bit for the XSAVE.  As a result, for XINUSE=0, we can skip
> writing the zeros, even though we can't compress the space.

Why?
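
For concreteness, the software step described in case 2 might be sketched
as below. This is only one reading of the quoted text, not the actual kernel
code: it assumes the AMX tile-data bit is dropped from the mask handed to
XSAVE when XINUSE reports it clear. The helper name sigframe_xsave_mask() is
hypothetical; bit 18 is the XCR0 position of AMX tile data.

```c
#include <stdint.h>

/* XCR0 state-component bit for AMX tile data (bit 18). */
#define XFEATURE_MASK_XTILE_DATA (UINT64_C(1) << 18)

/*
 * Hypothetical helper, for illustration only.
 *
 * requested: feature mask that would normally be passed to XSAVE
 * xinuse:    XINUSE bitmap, as XGETBV with ECX=1 would report it
 *
 * Returns the mask to use for the signal-frame XSAVE.
 */
static inline uint64_t sigframe_xsave_mask(uint64_t requested, uint64_t xinuse)
{
    /* Tile data is clean: drop it so XSAVE does not write its init state. */
    if (!(xinuse & XFEATURE_MASK_XTILE_DATA))
        requested &= ~XFEATURE_MASK_XTILE_DATA;

    return requested;
}
```

The space for tile data is still reserved in the uncompacted layout; only the
write of the clean state is skipped.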

> 
> 3. user space always uses fully uncompacted XSAVE buffers.
> 

There is no reason we have to do this for new states. Arguably we shouldn’t for 
AMX to avoid yet another altstack explosion.
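
To put rough numbers on the altstack concern, here is a small sketch. The
2048 and 8192 values are the historical x86-64 MINSIGSTKSZ and SIGSTKSZ
constants, and 8192 bytes is the architectural AMX tile-data size. The
2688-byte figure for an uncompacted area through AVX-512 is an assumption
(the real size is enumerated by CPUID leaf 0xD), and xsave_area_fits() is a
hypothetical helper. It is a necessary check only, since a real signal frame
holds more than the XSAVE area.

```c
#include <stdbool.h>
#include <stddef.h>

/* Historical x86-64 constants, hard-coded here for illustration. */
#define LEGACY_MINSIGSTKSZ 2048u
#define LEGACY_SIGSTKSZ    8192u

/* Architectural size of AMX tile data: 8 tiles x 16 rows x 64 bytes. */
#define XTILE_DATA_SIZE    (8u * 16u * 64u)

/*
 * Assumed size of an uncompacted XSAVE area up to and including AVX-512.
 * The real value comes from CPUID leaf 0xD and varies by CPU.
 */
#define ASSUMED_XSAVE_SIZE_AVX512 2688u

/* True if the XSAVE area alone would fit in an altstack of the given size. */
static bool xsave_area_fits(size_t altstack, size_t xsave_area)
{
    return xsave_area <= altstack;
}
```

Under these assumptions, an AVX-512 area already overflows a MINSIGSTKSZ
altstack, and adding uncompacted tile data overflows a SIGSTKSZ one as well.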
