On Thu, Sep 23, 2021, at 7:56 PM, Fenghua Yu wrote:
> Hi, Andy,
>
> On Thu, Sep 23, 2021 at 04:17:05PM -0700, Andy Lutomirski wrote:
>> On Mon, Sep 20, 2021, at 12:23 PM, Fenghua Yu wrote:
>> > ENQCMD requires the IA32_PASID MSR has a valid PASID value which was
>> > allocated to the process during bind. The MSR could be populated eagerly
>> > by an IPI after the PASID is allocated in bind. But the method was
>> > disabled in commit 9bfecd058339 ("x86/cpufeatures: Force disable
>> > X86_FEATURE_ENQCMD and remove update_pasid()")' due to locking and other
>> > issues.
>> >
>> > Since the MSR was cleared in fork()/clone(), the first ENQCMD will
>> > generate a #GP fault. The #GP fault handler will initialize the MSR
>> > if a PASID has been allocated for this process.
>> >
>> > The lazy enabling of the PASID MSR in the #GP handler is not an elegant
>> > solution. But it has the least complexity that fits with h/w behavior.
>> >
>> > Signed-off-by: Fenghua Yu <fenghua...@intel.com>
>> > Reviewed-by: Tony Luck <tony.l...@intel.com>
>> > ---
>> >  arch/x86/include/asm/fpu/api.h |  6 ++++
>> >  arch/x86/include/asm/iommu.h   |  2 ++
>> >  arch/x86/kernel/fpu/xstate.c   | 59 ++++++++++++++++++++++++++++++++++
>> >  arch/x86/kernel/traps.c        | 12 +++++++
>> >  drivers/iommu/intel/svm.c      | 32 ++++++++++++++++++
>> >  5 files changed, 111 insertions(+)
>> >
>> > diff --git a/arch/x86/include/asm/fpu/api.h 
>> > b/arch/x86/include/asm/fpu/api.h
>> > index ca4d0dee1ecd..f146849e5c8c 100644
>> > --- a/arch/x86/include/asm/fpu/api.h
>> > +++ b/arch/x86/include/asm/fpu/api.h
>> > @@ -106,4 +106,10 @@ extern int cpu_has_xfeatures(u64 xfeatures_mask, 
>> > const char **feature_name);
>> >   */
>> >  #define PASID_DISABLED    0
>> > 
>> > +#ifdef CONFIG_INTEL_IOMMU_SVM
>> > +void fpu__pasid_write(u32 pasid);
>> > +#else
>> > +static inline void fpu__pasid_write(u32 pasid) { }
>> > +#endif
>> > +
>> >  #endif /* _ASM_X86_FPU_API_H */
>> > diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
>> > index bf1ed2ddc74b..9c4bf9b0702f 100644
>> > --- a/arch/x86/include/asm/iommu.h
>> > +++ b/arch/x86/include/asm/iommu.h
>> > @@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory 
>> > *rmrr)
>> >    return -EINVAL;
>> >  }
>> > 
>> > +bool __fixup_pasid_exception(void);
>> > +
>> >  #endif /* _ASM_X86_IOMMU_H */
>> > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
>> > index c8def1b7f8fb..8a89b2cecd77 100644
>> > --- a/arch/x86/kernel/fpu/xstate.c
>> > +++ b/arch/x86/kernel/fpu/xstate.c
>> > @@ -1289,3 +1289,62 @@ int proc_pid_arch_status(struct seq_file *m, 
>> > struct pid_namespace *ns,
>> >    return 0;
>> >  }
>> >  #endif /* CONFIG_PROC_PID_ARCH_STATUS */
>> > +
>> > +#ifdef CONFIG_INTEL_IOMMU_SVM
>> > +/**
>> > + * fpu__pasid_write - Write the current task's PASID state/MSR.
>> > + * @pasid:        the PASID
>> > + *
>> > + * The PASID is written to the IA32_PASID MSR directly if the MSR is 
>> > active.
>> > + * Otherwise it's written to the PASID. The IA32_PASID MSR should 
>> > contain
>> > + * the PASID after returning to the user.
>> > + *
>> > + * This is called only when ENQCMD is enabled.
>> > + */
>> > +void fpu__pasid_write(u32 pasid)
>> > +{
>> > +  struct xregs_state *xsave = &current->thread.fpu.state.xsave;
>> > +  u64 msr_val = pasid | MSR_IA32_PASID_VALID;
>> > +  struct fpu *fpu = &current->thread.fpu;
>> > +
>> > +  /*
>> > +   * ENQCMD always uses the compacted XSAVE format. Ensure the buffer
>> > +   * has space for the PASID.
>> > +   */
>> > +  BUG_ON(!(xsave->header.xcomp_bv & XFEATURE_MASK_PASID));
>> > +
>> > +  fpregs_lock();
>> > +
>> > +  /*
>> > +   * If the task's FPU doesn't need to be loaded or is valid, directly
>> > +   * write the IA32_PASID MSR. Otherwise, write the PASID state and
>> > +   * the MSR will be loaded from the PASID state before returning to
>> > +   * the user.
>> > +   */
>> > +  if (!test_thread_flag(TIF_NEED_FPU_LOAD) ||
>> > +      fpregs_state_valid(fpu, smp_processor_id())) {
>> > +          wrmsrl(MSR_IA32_PASID, msr_val);
>> 
>> Let me try to decode this.
>> 
>> If the current task's FPU state is live or if the state is in memory but the 
>> CPU regs just happen to match the copy in memory, then write the MSR.  Else 
>> write the value to memory.
>> 
>> This is wrong.  If !TIF_NEED_FPU_LOAD && fpregs_state_valid, you MUST NOT 
>> MODIFY FPU STATE.

Sorry, I typoed that.  I meant TIF_NEED_FPU_LOAD && fpregs_state_valid, which 
is in the case that does wrmsr.

>
> But the FPU state is not modified if !TIF_NEED_FPU_LOAD && fpregs_state_valid.
>
> The FPU state is modified only if TIF_NEED_FPU_LOAD && !fpregs_state_valid.

The MSR is FPU state.  If TIF_NEED_FPU_LOAD && fpregs_state_valid, then the 
authoritative copy of the FPU state is in memory, but the CPU regs are known to 
match memory.  You MUST NOT modify the in-memory state or the regs.

> In this case, the FPU state will be restored to the IA32_PASID MSR when
> exiting to the user. In all other cases, the FPU state will be not be
> restored on exiting to the user and thus the IA32_PASID MSR is directly
> written here.


>
> Is it right?
>
>>  This is not negotiable -- you will break coherence between CPU regs and the 
>> memory image.
>
> fpregs_assert_state_consistent() warns on !TIF_NEED_FPU_LOAD &&
> !fpregs_state_valid. Is that breaking coherence you are talking?

No.  I mean that you broke coherence between memory and registers.  If the task 
resumes on the current CPU without scheduling, the MSR write will take effect.  
If the task resumes on a different CPU or after something else takes over the 
current CPU's regs, the MSR write will be lost.

>
>>  The way you modify the current task's state is either you modify it in CPU 
>> regs (if the kernel knows that the CPU regs are the one and only source of 
>> truth) OR you modify it in memory and invalidate any preserved copies (by 
>> zapping last_cpu). 
>> 
>> In any event, that particular bit of logic really doesn't belong in here -- 
>> it belongs in some helper that gets it right, once.
>> 
>> > +
>> > +/*
>> > + * Try to figure out if there is a PASID MSR value to propagate to the
>> > + * thread taking the #GP.
>> > + */
>> > +bool __fixup_pasid_exception(void)
>> > +{
>> > +  u32 pasid;
>> > +
>> > +  /*
>> > +   * This function is called only when this #GP was triggered from user
>> > +   * space. So the mm cannot be NULL.
>> > +   */
>> > +  pasid = current->mm->pasid;
>> > +
>> > +  /* If no PASID is allocated, there is nothing to propagate. */
>> > +  if (pasid == PASID_DISABLED)
>> > +          return false;
>> > +
>> > +  /*
>> > +   * If the current task already has a valid PASID MSR, then the #GP
>> > +   * fault must be for some non-ENQCMD related reason.
>> > +   */
>> > +  if (current->has_valid_pasid)
>> > +          return false;
>> 
>> IMO "has_valid_pasid" is a poor name.  An mm can have a PASID.  A task has 
>> noticed or it hasn't.
>> 
>> If you really need an in-memory cache, call it "pasid_msr_is_loaded".  Or 
>> just read the field out of FPU state, but that might be painfully slow.
>
> Agree. Thomas wants to change "has_valid_pasid" to "holds_pasid_ref" to
> represents if the task takes a reference to the PASID.

That name will stop making sense when tasks stop holding references.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to