On Thu, Sep 23, 2021, at 7:56 PM, Fenghua Yu wrote:
> Hi, Andy,
>
> On Thu, Sep 23, 2021 at 04:17:05PM -0700, Andy Lutomirski wrote:
>> On Mon, Sep 20, 2021, at 12:23 PM, Fenghua Yu wrote:
>> > ENQCMD requires the IA32_PASID MSR has a valid PASID value which was
>> > allocated to the process during bind. The MSR could be populated eagerly
>> > by an IPI after the PASID is allocated in bind. But the method was
>> > disabled in commit 9bfecd058339 ("x86/cpufeatures: Force disable
>> > X86_FEATURE_ENQCMD and remove update_pasid()")' due to locking and other
>> > issues.
>> >
>> > Since the MSR was cleared in fork()/clone(), the first ENQCMD will
>> > generate a #GP fault. The #GP fault handler will initialize the MSR
>> > if a PASID has been allocated for this process.
>> >
>> > The lazy enabling of the PASID MSR in the #GP handler is not an elegant
>> > solution. But it has the least complexity that fits with h/w behavior.
>> >
>> > Signed-off-by: Fenghua Yu <fenghua...@intel.com>
>> > Reviewed-by: Tony Luck <tony.l...@intel.com>
>> > ---
>> > arch/x86/include/asm/fpu/api.h | 6 ++++
>> > arch/x86/include/asm/iommu.h | 2 ++
>> > arch/x86/kernel/fpu/xstate.c | 59 ++++++++++++++++++++++++++++++++++
>> > arch/x86/kernel/traps.c | 12 +++++++
>> > drivers/iommu/intel/svm.c | 32 ++++++++++++++++++
>> > 5 files changed, 111 insertions(+)
>> >
>> > diff --git a/arch/x86/include/asm/fpu/api.h
>> > b/arch/x86/include/asm/fpu/api.h
>> > index ca4d0dee1ecd..f146849e5c8c 100644
>> > --- a/arch/x86/include/asm/fpu/api.h
>> > +++ b/arch/x86/include/asm/fpu/api.h
>> > @@ -106,4 +106,10 @@ extern int cpu_has_xfeatures(u64 xfeatures_mask,
>> > const char **feature_name);
>> > */
>> > #define PASID_DISABLED 0
>> >
>> > +#ifdef CONFIG_INTEL_IOMMU_SVM
>> > +void fpu__pasid_write(u32 pasid);
>> > +#else
>> > +static inline void fpu__pasid_write(u32 pasid) { }
>> > +#endif
>> > +
>> > #endif /* _ASM_X86_FPU_API_H */
>> > diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
>> > index bf1ed2ddc74b..9c4bf9b0702f 100644
>> > --- a/arch/x86/include/asm/iommu.h
>> > +++ b/arch/x86/include/asm/iommu.h
>> > @@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory
>> > *rmrr)
>> > return -EINVAL;
>> > }
>> >
>> > +bool __fixup_pasid_exception(void);
>> > +
>> > #endif /* _ASM_X86_IOMMU_H */
>> > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
>> > index c8def1b7f8fb..8a89b2cecd77 100644
>> > --- a/arch/x86/kernel/fpu/xstate.c
>> > +++ b/arch/x86/kernel/fpu/xstate.c
>> > @@ -1289,3 +1289,62 @@ int proc_pid_arch_status(struct seq_file *m,
>> > struct pid_namespace *ns,
>> > return 0;
>> > }
>> > #endif /* CONFIG_PROC_PID_ARCH_STATUS */
>> > +
>> > +#ifdef CONFIG_INTEL_IOMMU_SVM
>> > +/**
>> > + * fpu__pasid_write - Write the current task's PASID state/MSR.
>> > + * @pasid: the PASID
>> > + *
>> > + * The PASID is written to the IA32_PASID MSR directly if the MSR is
>> > active.
>> > + * Otherwise it's written to the PASID. The IA32_PASID MSR should
>> > contain
>> > + * the PASID after returning to the user.
>> > + *
>> > + * This is called only when ENQCMD is enabled.
>> > + */
>> > +void fpu__pasid_write(u32 pasid)
>> > +{
>> > + struct xregs_state *xsave = ¤t->thread.fpu.state.xsave;
>> > + u64 msr_val = pasid | MSR_IA32_PASID_VALID;
>> > + struct fpu *fpu = ¤t->thread.fpu;
>> > +
>> > + /*
>> > + * ENQCMD always uses the compacted XSAVE format. Ensure the buffer
>> > + * has space for the PASID.
>> > + */
>> > + BUG_ON(!(xsave->header.xcomp_bv & XFEATURE_MASK_PASID));
>> > +
>> > + fpregs_lock();
>> > +
>> > + /*
>> > + * If the task's FPU doesn't need to be loaded or is valid, directly
>> > + * write the IA32_PASID MSR. Otherwise, write the PASID state and
>> > + * the MSR will be loaded from the PASID state before returning to
>> > + * the user.
>> > + */
>> > + if (!test_thread_flag(TIF_NEED_FPU_LOAD) ||
>> > + fpregs_state_valid(fpu, smp_processor_id())) {
>> > + wrmsrl(MSR_IA32_PASID, msr_val);
>>
>> Let me try to decode this.
>>
>> If the current task's FPU state is live or if the state is in memory but the
>> CPU regs just happen to match the copy in memory, then write the MSR. Else
>> write the value to memory.
>>
>> This is wrong. If !TIF_NEED_FPU_LOAD && fpregs_state_valid, you MUST NOT
>> MODIFY FPU STATE.
Sorry, I typoed that. I meant TIF_NEED_FPU_LOAD && fpregs_state_valid, which
is in the case that does wrmsr.
>
> But the FPU state is not modified if !TIF_NEED_FPU_LOAD && fpregs_state_valid.
>
> The FPU state is modified only if TIF_NEED_FPU_LOAD && !fpregs_state_valid.
The MSR is FPU state. If TIF_NEED_FPU_LOAD && fpregs_state_valid, then the
authoritative copy of the FPU state is in memory, but the CPU regs are known to
match memory. You MUST NOT modify the in-memory state or the regs.
> In this case, the FPU state will be restored to the IA32_PASID MSR when
> exiting to the user. In all other cases, the FPU state will be not be
> restored on exiting to the user and thus the IA32_PASID MSR is directly
> written here.
>
> Is it right?
>
>> This is not negotiable -- you will break coherence between CPU regs and the
>> memory image.
>
> fpregs_assert_state_consistent() warns on !TIF_NEED_FPU_LOAD &&
> !fpregs_state_valid. Is that breaking coherence you are talking?
No. I mean that you broke coherence between memory and registers. If the task
resumes on the current CPU without scheduling, the MSR write will take effect.
If the task resumes on a different CPU or after something else takes over the
current CPU's regs, the MSR write will be lost.
>
>> The way you modify the current task's state is either you modify it in CPU
>> regs (if the kernel knows that the CPU regs are the one and only source of
>> truth) OR you modify it in memory and invalidate any preserved copies (by
>> zapping last_cpu).
>>
>> In any event, that particular bit of logic really doesn't belong in here --
>> it belongs in some helper that gets it right, once.
>>
>> > +
>> > +/*
>> > + * Try to figure out if there is a PASID MSR value to propagate to the
>> > + * thread taking the #GP.
>> > + */
>> > +bool __fixup_pasid_exception(void)
>> > +{
>> > + u32 pasid;
>> > +
>> > + /*
>> > + * This function is called only when this #GP was triggered from user
>> > + * space. So the mm cannot be NULL.
>> > + */
>> > + pasid = current->mm->pasid;
>> > +
>> > + /* If no PASID is allocated, there is nothing to propagate. */
>> > + if (pasid == PASID_DISABLED)
>> > + return false;
>> > +
>> > + /*
>> > + * If the current task already has a valid PASID MSR, then the #GP
>> > + * fault must be for some non-ENQCMD related reason.
>> > + */
>> > + if (current->has_valid_pasid)
>> > + return false;
>>
>> IMO "has_valid_pasid" is a poor name. An mm can have a PASID. A task has
>> noticed or it hasn't.
>>
>> If you really need an in-memory cache, call it "pasid_msr_is_loaded". Or
>> just read the field out of FPU state, but that might be painfully slow.
>
> Agree. Thomas wants to change "has_valid_pasid" to "holds_pasid_ref" to
> represents if the task takes a reference to the PASID.
That name will stop making sense when tasks stop holding references.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu