Re: [PATCH v3 0/4] iommu/vt-d: Add page request draining support

2020-04-27 Thread Lu Baolu

Hi Kevin,

On 2020/4/22 16:06, Lu Baolu wrote:

When a PASID is stopped or terminated, there can be pending PRQs
(requests that haven't received responses) in the software and
remapping hardware. The pending page requests must be drained
so that the pasid could be reused. The chapter 7.10 in the VT-d
specification specifies the software steps to drain pending page
requests and responses.

This includes two parts:
  - PATCH 1/4 ~ 2/4: refactor the qi_submit_sync() to support multiple
descriptors per submission which will be used in the following
patch.
  - PATCH 3/4 ~ 4/4: add page request drain support after a pasid entry
is torn down.

Please help to review.

Best regards,
baolu

Change log:
  v2->v3:
   - Address Kevin's review comments
 - Squash the first 2 patches together;
 - The prq thread is serialized, no need to consider reentrance;
 - Ensure no new-coming prq before drain prq in queue;
 - Handle page request overflow case.


Very appreciated for your review comments.

How about these changes? Any comments?

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/7] x86/traps: Fix up invalid PASID

2020-04-27 Thread Raj, Ashok
Hi Thomas,

On Tue, Apr 28, 2020 at 02:54:59AM +0200, Thomas Gleixner wrote:
> Ashok,
> 
> "Raj, Ashok"  writes:
> > On Sun, Apr 26, 2020 at 05:25:06PM +0200, Thomas Gleixner wrote:
> >> Just for the record I also suggested to have a proper errorcode in the
> >> #GP for ENQCMD and I surely did not suggest to avoid decoding the user
> >> instructions.
> >
> > We certainly discussed the possiblity of adding an error code to 
> > identiy #GP due to ENQCMD with our HW architects. 
> >
> > There are only a few cases that have an error code, like move to segment
> > with an invalid value for instance. There were a few but i don't
> > recall that entire list. 
> >
> > Since the error code is 0 in most places, there isn't plumbing in hw to 
> > return
> > this value in all cases. It appeared that due to some uarch reasons it
> > wasn't as simple as it appears to /me sw kinds :-)
> 
> Sigh.
> 
> > So after some internal discussion we decided to take the current
> > approach. Its possible that if the #GP was due to some other reason
> > we might #GP another time. Since this wasn't perf or speed path we took
> > this lazy approach.
> 
> I know that the HW people's mantra is that everything can be fixed in
> software and therefore slapping new features into the CPUs can be done
> without thinking about the consequeses.
> 
> But we all know from painful experience that this is fundamentally wrong
> unless there is a really compelling reason.

:-)... I'm still looking for the quote from Linus about RAS before
he went to behavior school.


> 
> For new features there is absolutely no reason at all.
> 
> Can HW people pretty please understand that hardware and software have
> to be co-designed and not dictated by 'some uarch reasons'. This is
> nothing fundamentally new. This problem existed 30+ years ago, is well
> documented and has been ignored forever. I'm tired of that, really.
> 
> But as this seems to be unsolvable for the problem at hand can you
> please document the inability, unwillingness or whatever in the
> changelog?

Most certainly!

> 
> The question why this brand new_ ENQCMD + invalid PASID induced #GP does
> not generate an useful error code and needs heuristics to be dealt with
> is pretty obvious.
> 

Cheers,
Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/7] x86/traps: Fix up invalid PASID

2020-04-27 Thread Thomas Gleixner
Ashok,

"Raj, Ashok"  writes:
> On Sun, Apr 26, 2020 at 05:25:06PM +0200, Thomas Gleixner wrote:
>> Just for the record I also suggested to have a proper errorcode in the
>> #GP for ENQCMD and I surely did not suggest to avoid decoding the user
>> instructions.
>
> We certainly discussed the possiblity of adding an error code to 
> identiy #GP due to ENQCMD with our HW architects. 
>
> There are only a few cases that have an error code, like move to segment
> with an invalid value for instance. There were a few but i don't
> recall that entire list. 
>
> Since the error code is 0 in most places, there isn't plumbing in hw to return
> this value in all cases. It appeared that due to some uarch reasons it
> wasn't as simple as it appears to /me sw kinds :-)

Sigh.

> So after some internal discussion we decided to take the current
> approach. Its possible that if the #GP was due to some other reason
> we might #GP another time. Since this wasn't perf or speed path we took
> this lazy approach.

I know that the HW people's mantra is that everything can be fixed in
software and therefore slapping new features into the CPUs can be done
without thinking about the consequeses.

But we all know from painful experience that this is fundamentally wrong
unless there is a really compelling reason.

For new features there is absolutely no reason at all.

Can HW people pretty please understand that hardware and software have
to be co-designed and not dictated by 'some uarch reasons'. This is
nothing fundamentally new. This problem existed 30+ years ago, is well
documented and has been ignored forever. I'm tired of that, really.

But as this seems to be unsolvable for the problem at hand can you
please document the inability, unwillingness or whatever in the
changelog?

The question why this brand new_ ENQCMD + invalid PASID induced #GP does
not generate an useful error code and needs heuristics to be dealt with
is pretty obvious.

Thanks,

tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 6/7] x86/traps: Fix up invalid PASID

2020-04-27 Thread Thomas Gleixner
"Luck, Tony"  writes:
>> Just for the record I also suggested to have a proper errorcode in the
>> #GP for ENQCMD and I surely did not suggest to avoid decoding the user
>> instructions.
>
> Is the heuristic to avoid decoding the user instructions OK (you are just 
> pointing
> out that you should not be given credit for this part of the idea)?

I surely suggested the approach, but at the same time I asked for the
error code and did not say that instruction checking needs to be
avoided.

This comment was just to make it clear that there were other options
discussed. IOW, the changelog should have some explicit explanations
why:

 - the error code idea does not work (according to HW folks)

 - the instruction decoding has no real benefit because $REASONS

Thanks,

tglx


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/7] x86/traps: Fix up invalid PASID

2020-04-27 Thread Thomas Gleixner
Fenghua Yu  writes:
> On Sun, Apr 26, 2020 at 05:25:06PM +0200, Thomas Gleixner wrote:
>> > @@ -499,6 +510,12 @@ dotraplinkage void do_general_protection(struct 
>> > pt_regs *regs, long error_code)
>> >int ret;
>> >  
>> >RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
>> > +
>> > +  if (user_mode(regs) && fixup_pasid_exception()) {
>> > +  cond_local_irq_enable(regs);
>> 
>> The point of this conditional irq enable _AFTER_ calling into the fixup
>> function is? Also what's the reason for keeping interrupts disabled
>> while calling into that function? Comments exist for a reason.
>
> irq needs to be disabled because the fixup function requires to disable
> preempt in order to update the PASID MSR on the faulting CPU.

No, that's just wrong. It's not about the update itself.

> Will add comments here.

Factual ones and not some fairy tales please.

>> > +bool __fixup_pasid_exception(void)
>> > +{
>> > +  struct mm_struct *mm;
>> > +  bool ret = true;
>> > +  u64 pasid_msr;
>> > +  int pasid;
>> > +
>> > +  mm = get_task_mm(current);
>> 
>> Why do you need a reference to current->mm ?
>
> The PASID for the address space is per mm and is stored in mm.
> To get the PASID, we need to get the mm and the
> pasid=mm->context.pasid.

It's obvious that you need to access current-mm in order to check
current->mm->context.pasid. Let me rephrase the question:

   Why do you need to take a reference on current->mm ?

Thanks,

tglx

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-27 Thread Thomas Gleixner
Fenghua Yu  writes:
> On Sun, Apr 26, 2020 at 04:55:25PM +0200, Thomas Gleixner wrote:
>> Fenghua Yu  writes:
>> > + +#ifdef CONFIG_INTEL_IOMMU_SVM + int pasid;
>> 
>> int? It's a value which gets programmed into the MSR along with the valid 
>> bit (bit 31) set.
>
> The pasid is defined as "int" in struct intel_svm and in 
> intel_svm_bind_mm() and intel_svm_unbind_mm(). So the pasid defined in this 
> patch follows the same type defined in those places.

Which are wrong to begin with.

>> ioasid_alloc() uses ioasid_t which is
>> 
>> typedef unsigned int ioasid_t;
>> 
>> Can we please have consistent types and behaviour all over the place?
>
> Should I just define "pasid", "pasid_max", "flags" as "unsigned int" for
> the new functions/code?
>
> Or should I also change their types to "unsigned int" in the original
> svm code (struct intel_svm, ...bind_mm(), etc)? I'm afraid that will be
> a lot of changes and should be in a separate preparation patch.

Yes, please. The existance of non-sensical code is not an excuse to
proliferate it.

Thanks,

tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 6/7] x86/traps: Fix up invalid PASID

2020-04-27 Thread Luck, Tony
> Just for the record I also suggested to have a proper errorcode in the
> #GP for ENQCMD and I surely did not suggest to avoid decoding the user
> instructions.

Thomas,

Is the heuristic to avoid decoding the user instructions OK (you are just 
pointing
out that you should not be given credit for this part of the idea)?

Or are you saying that you'd like to see the instruction checked to see that it
was an ENQCMD?

-Tony

 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/7] x86/traps: Fix up invalid PASID

2020-04-27 Thread Raj, Ashok
Hi Thomas

On Sun, Apr 26, 2020 at 05:25:06PM +0200, Thomas Gleixner wrote:
> Fenghua Yu  writes:
> > A #GP fault is generated when ENQCMD instruction is executed without
> > a valid PASID value programmed in.
> 
> Programmed in what?
> 
> > The #GP fault handler will initialize the current thread's PASID MSR.
> >
> > The following heuristic is used to avoid decoding the user instructions
> > to determine the precise reason for the #GP fault:
> > 1) If the mm for the process has not been allocated a PASID, this #GP
> >cannot be fixed.
> > 2) If the PASID MSR is already initialized, then the #GP was for some
> >other reason
> > 3) Try initializing the PASID MSR and returning. If the #GP was from
> >an ENQCMD this will fix it. If not, the #GP fault will be repeated
> >and we will hit case "2".
> >
> > Suggested-by: Thomas Gleixner 
> 
> Just for the record I also suggested to have a proper errorcode in the
> #GP for ENQCMD and I surely did not suggest to avoid decoding the user
> instructions.

We certainly discussed the possiblity of adding an error code to 
identiy #GP due to ENQCMD with our HW architects. 

There are only a few cases that have an error code, like move to segment
with an invalid value for instance. There were a few but i don't
recall that entire list. 

Since the error code is 0 in most places, there isn't plumbing in hw to return
this value in all cases. It appeared that due to some uarch reasons it
wasn't as simple as it appears to /me sw kinds :-)

So after some internal discussion we decided to take the current
approach. Its possible that if the #GP was due to some other reason
we might #GP another time. Since this wasn't perf or speed path we took
this lazy approach. 

We will keep tabs with HW folks for future consideration. 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-27 Thread Fenghua Yu
On Sun, Apr 26, 2020 at 04:55:25PM +0200, Thomas Gleixner wrote:
> Fenghua Yu  writes:
> > +++ b/arch/x86/include/asm/mmu.h @@ -50,6 +50,10 @@ typedef struct {
> > u16 pkey_allocation_map; s16 execute_only_pkey;
> >  #endif
> > + +#ifdef CONFIG_INTEL_IOMMU_SVM +  int pasid;
> 
> int? It's a value which gets programmed into the MSR along with the valid 
> bit (bit 31) set.

The pasid is defined as "int" in struct intel_svm and in 
intel_svm_bind_mm() and intel_svm_unbind_mm(). So the pasid defined in this 
patch follows the same type defined in those places.

But as you pointed out below, ioasid_t is defined as "unsigned int".

> 
> >  extern void switch_mm(struct mm_struct *prev, struct mm_struct *next, 
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c 
> > index d7f2a5358900..da718a49e91e 100644 --- a/drivers/iommu/intel-svm.c 
> > +++ b/drivers/iommu/intel-svm.c @@ -226,6 +226,45 @@ static 
> > LIST_HEAD(global_svm_list);
> > list_for_each_entry((sdev), &(svm)->devs, list) \
> > if ((d) != (sdev)->dev) {} else
> >  
> > +/* + * If this mm already has a PASID we can use it. Otherwise 
> > allocate a new one. + * Let the caller know if we did an allocation via 
> > 'new_pasid'. + */ +static int alloc_pasid(struct intel_svm *svm, struct 
> > mm_struct *mm, + int pasid_max, bool *new_pasid, int flags)
> 
> Again, data types please. flags are generally unsigned and not plain int. 
> Also pasid_max is certainly not plain int either.

The caller defines pasid_max and flags as "int". This function just follows
the caller's definitions.

But I will change their definitions to "unsigned int" here.

> 
> > + *new_pasid = false; + + return mm->context.pasid; + } + + /* + * 
> > Allocate a new pasid. Do not use PASID 0, reserved for RID to + * 
> > PASID. + */ + pasid = ioasid_alloc(NULL, PASID_MIN, pasid_max - 1, 
> > svm);
> 
> ioasid_alloc() uses ioasid_t which is
> 
> typedef unsigned int ioasid_t;
> 
> Can we please have consistent types and behaviour all over the place?

Should I just define "pasid", "pasid_max", "flags" as "unsigned int" for
the new functions/code?

Or should I also change their types to "unsigned int" in the original
svm code (struct intel_svm, ...bind_mm(), etc)? I'm afraid that will be
a lot of changes and should be in a separate preparation patch.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/7] x86/msr-index: Define IA32_PASID MSR

2020-04-27 Thread Fenghua Yu
On Sun, Apr 26, 2020 at 01:22:00PM +0200, Thomas Gleixner wrote:
> Fenghua Yu  writes:
> 
> > The IA32_PASID MSR (0xd93) contains the Process Address Space Identifier
> > (PASID), a 20-bit value. Bit 31 must be set to indicate the value
> > programmed in the MSR is valid. Hardware uses PASID to identify which
> > process submits the work and direct responses to the right process.
> 
> No. It does not identify the process. It identifies the process' address
> space as the name says.
> 
> Please provide coherent and precise information.

Ok. Will change to address space identification.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/7] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

2020-04-27 Thread Fenghua Yu
On Sun, Apr 26, 2020 at 01:17:11PM +0200, Thomas Gleixner wrote:
> Fenghua Yu  writes:
> > From: Yu-cheng Yu 
> >
> > The IA32_PASID MSR is used when a task submits work via the ENQCMD
> > instruction.
> 
> Is used?
> 
> > The per task MSR is stored in the task's supervisor FPU
> 
> per task MSR? Lot's of MSRs 
> 
> > PASID state and is context switched by XSAVES/XRSTORS.
> >

Maybe change the commit messge to the following?

ENQCMD instruction reads PASID from IA32_PASID MSR. The MSR is stored
in the task's supervisor FPU PASID state and is context switched by
XSAVES/XRSTORS.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 4/8] iommu/vt-d: Add bind guest PASID support

2020-04-27 Thread Jacob Pan
On Fri, 24 Apr 2020 10:47:45 +
"Tian, Kevin"  wrote:

> > From: Jacob Pan 
> > Sent: Wednesday, April 22, 2020 2:53 AM
> > 
> > When supporting guest SVA with emulated IOMMU, the guest PASID
> > table is shadowed in VMM. Updates to guest vIOMMU PASID table
> > will result in PASID cache flush which will be passed down to
> > the host as bind guest PASID calls.  
> 
> Above description is not accurate. Guest PASID table updates don't
> 'result in' PASID cache flush automatically. What about:
> --
> The guest needs to invalidate the PASID cache for any update to
> guest PASID table. Those invalidation requests are intercepted
> by the VMM and passed down to the host as binding guest PASID
> calls.
> --
It is good to add more details, thanks.

> > 
> > For the SL page tables, it will be harvested from device's
> > default domain (request w/o PASID), or aux domain in case of
> > mediated device.
> > 
> > .-.  .---.
> > |   vIOMMU|  | Guest process CR3, FL only|
> > | |  '---'
> > ./
> > | PASID Entry |--- PASID cache flush -
> > '-'   |
> > | |   V
> > | |CR3 in GPA
> > '-'
> > Guest
> > --| Shadow |--|
> >   vv  v
> > Host
> > .-.  .--.
> > |   pIOMMU|  | Bind FL for GVA-GPA  |
> > | |  '--'
> > ./  |
> > | PASID Entry | V (Nested xlate)
> > '\.--.
> > | |   |SL for GPA-HPA, default domain|
> > | |   '--'
> > '-'
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  drivers/iommu/intel-iommu.c |   4 +
> >  drivers/iommu/intel-svm.c   | 204
> > 
> >  include/linux/intel-iommu.h |   8 +-
> >  include/linux/intel-svm.h   |  17 
> >  4 files changed, 232 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index 9c01e391a931..8862d6b0ef21
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -6179,6 +6179,10 @@ const struct iommu_ops intel_iommu_ops = {
> > .dev_disable_feat   = intel_iommu_dev_disable_feat,
> > .is_attach_deferred =
> > intel_iommu_is_attach_deferred, .pgsize_bitmap  =
> > INTEL_IOMMU_PGSIZES, +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +   .sva_bind_gpasid= intel_svm_bind_gpasid,
> > +   .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
> > +#endif
> >  };
> > 
> >  static void quirk_iommu_igfx(struct pci_dev *dev)
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 2998418f0a38..69b2070b843d 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -226,6 +226,210 @@ static LIST_HEAD(global_svm_list);
> > list_for_each_entry((sdev), &(svm)->devs, list) \
> > if ((d) != (sdev)->dev) {} else
> > 
> > +static inline void intel_svm_free_if_empty(struct intel_svm *svm,
> > u64 pasid) +{
> > +   if (list_empty(>devs)) {
> > +   ioasid_set_data(pasid, NULL);
> > +   kfree(svm);
> > +   }
> > +}  
> 
> Do we really need a function form instead of putting the 4 lines
> directly after the 'out' label?
> 
it is more readable and good for code sharing.

> > +
> > +int intel_svm_bind_gpasid(struct iommu_domain *domain, struct
> > device *dev,
> > + struct iommu_gpasid_bind_data *data)
> > +{
> > +   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > +   struct dmar_domain *dmar_domain;
> > +   struct intel_svm_dev *sdev;
> > +   struct intel_svm *svm;
> > +   int ret = 0;
> > +
> > +   if (WARN_ON(!iommu) || !data)
> > +   return -EINVAL;  
> 
> well, why not checking !dev together?
This is kernel API, unlike iommu and data caller fills in dev directly.

> 
> > +
> > +   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > +   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > +   return -EINVAL;
> > +
> > +   if (dev_is_pci(dev)) {
> > +   /* VT-d supports devices with full 20 bit PASIDs
> > only */
> > +   if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> > +   return -EINVAL;
> > +   } else {
> > +   return -ENOTSUPP;
> > +   }
> > +
> > +   /*
> > +* We only check host PASID range, we have no knowledge to
> > check
> > +* guest PASID range.
> > +*/
> > +   if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> > +   return -EINVAL;
> > +
> > +   dmar_domain = 

Re: [PATCH 2/7] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions

2020-04-27 Thread Fenghua Yu
On Sun, Apr 26, 2020 at 01:06:33PM +0200, Thomas Gleixner wrote:
> Fenghua Yu  writes:
> > A user space application can execute ENQCMD instruction to submit work
> > to device. The kernel executes ENQCMDS instruction to submit work to
> > device.
> 
> So a user space application _can_ execute ENQCMD and the kernel
> executes ENQCMDS. And both submit work to device.
> 
> > There is a lot of other enabling needed for the instructions to actually
> > be usable in user space and the kernel, and that enabling is coming later
> > in the series and in device drivers.
> 
> That's important information to the enumeration of the instructions in
> which way?

I just want to notify people this enumeration is only part of enabling
ENQCMD. But seems this paragraph is not so useful to be here. Maybe I can
remove it.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/7] docs: x86: Add a documentation for ENQCMD

2020-04-27 Thread Fenghua Yu
On Sun, Apr 26, 2020 at 01:02:12PM +0200, Thomas Gleixner wrote:
> Fenghua Yu  writes:
> 
> s/Add a documentation/Add documentation/
> 
> > From: Ashok Raj 
> >
> > ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
> > features are a complicated stack with lots of interconnected pieces.
> > This documentation provides a big picture overview for all of the
> > features.
> >
> > Signed-off-by: Ashok Raj 
> > Co-developed-by: Fenghua Yu 
> > Signed-off-by: Fenghua Yu 
> > Reviewed-by: Tony Luck 
> > ---
> >  Documentation/x86/enqcmd.rst | 185 +++
> 
> How is that hooked up into the Documentation index?
> 
>  Documentation/x86/enqcmd.rst: WARNING: document isn't included in any toctree
> 
> > +++ b/Documentation/x86/enqcmd.rst
> > @@ -0,0 +1,185 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +Improved Device Interaction Overview
> 
> So the document is about ENQCMD, right? Can you please make that in some
> way consistently named?
> 
> > +
> > +== Background ==
> 
> This lacks any docbook formatting The resulting HTML looks like ...
> 
> > +
> > +Shared Virtual Addressing (SVA) allows the processor and device to use the
> > +same virtual addresses avoiding the need for software to translate virtual
> > +addresses to physical addresses. ENQCMD is a new instruction on Intel
> > +platforms that allows user applications to directly notify hardware of new
> > +work, much like doorbells are used in some hardware, but carries a payload
> > +that carries the PASID and some additional device specific commands
> > +along with it.
> 
> Sorry that's not background information, that's an agglomeration of
> words.
> 
> Can you please explain properly what's the background of SVA, how it
> differs from regular device addressing and what kind of requirements it
> has?
> 
> ENQCMD is not related to background. It's part of the new technology.
> 
> > +== Address Space Tagging ==
> > +
> > +A new MSR (MSR_IA32_PASID) allows an application address space to be
> > +associated with what the PCIe spec calls a Process Address Space ID
> > +(PASID). This PASID tag is carried along with all requests between
> > +applications and devices and allows devices to interact with the process
> > +address space.
> 
> Sigh. The important part here is not the MSR. The important part is to
> explain what PASID is and where it comes from. Documentation has similar
> rules as changelogs:
> 
>   1) Provide context
> 
>   2) Explain requirements
>   
>   3) Explain implementation
> 
> The pile you provided is completely backwards and unstructured.

Ok. Will address all of the comments.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/7] x86/traps: Fix up invalid PASID

2020-04-27 Thread Fenghua Yu
On Sun, Apr 26, 2020 at 05:25:06PM +0200, Thomas Gleixner wrote:
> Fenghua Yu  writes:
> > A #GP fault is generated when ENQCMD instruction is executed without
> > a valid PASID value programmed in.
> 
> Programmed in what?

Will change to "...programmed in the PASID MSR".

> 
> > The #GP fault handler will initialize the current thread's PASID MSR.
> >
> > The following heuristic is used to avoid decoding the user instructions
> > to determine the precise reason for the #GP fault:
> > 1) If the mm for the process has not been allocated a PASID, this #GP
> >cannot be fixed.
> > 2) If the PASID MSR is already initialized, then the #GP was for some
> >other reason
> > 3) Try initializing the PASID MSR and returning. If the #GP was from
> >an ENQCMD this will fix it. If not, the #GP fault will be repeated
> >and we will hit case "2".
> >
> > Suggested-by: Thomas Gleixner 
> 
> Just for the record I also suggested to have a proper errorcode in the
> #GP for ENQCMD and I surely did not suggest to avoid decoding the user
> instructions.
> 
> >  void __free_pasid(struct mm_struct *mm);
> > +bool __fixup_pasid_exception(void);
> >  
> >  #endif /* _ASM_X86_IOMMU_H */
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index 6ef00eb6fbb9..369b5ba94635 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -56,6 +56,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #ifdef CONFIG_X86_64
> >  #include 
> > @@ -488,6 +489,16 @@ static enum kernel_gp_hint 
> > get_kernel_gp_address(struct pt_regs *regs,
> > return GP_CANONICAL;
> >  }
> >  
> > +static bool fixup_pasid_exception(void)
> > +{
> > +   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
> > +   return false;
> > +   if (!static_cpu_has(X86_FEATURE_ENQCMD))
> > +   return false;
> > +
> > +   return __fixup_pasid_exception();
> > +}
> > +
> >  #define GPFSTR "general protection fault"
> >  
> >  dotraplinkage void do_general_protection(struct pt_regs *regs, long 
> > error_code)
> > @@ -499,6 +510,12 @@ dotraplinkage void do_general_protection(struct 
> > pt_regs *regs, long error_code)
> > int ret;
> >  
> > RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> > +
> > +   if (user_mode(regs) && fixup_pasid_exception()) {
> > +   cond_local_irq_enable(regs);
> 
> The point of this conditional irq enable _AFTER_ calling into the fixup
> function is? Also what's the reason for keeping interrupts disabled
> while calling into that function? Comments exist for a reason.

irq needs to be disabled because the fixup function requires to disable
preempt in order to update the PASID MSR on the faulting CPU.

Will add comments here.

> 
> > +   return;
> > +   }
> > +
> > cond_local_irq_enable(regs);
> >  
> > if (static_cpu_has(X86_FEATURE_UMIP)) {
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index da718a49e91e..5ed39a022adb 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -759,3 +759,40 @@ void __free_pasid(struct mm_struct *mm)
> >  */
> > ioasid_free(pasid);
> >  }
> > +
> > +/*
> > + * Fix up the PASID MSR if possible.
> > + *
> > + * But if the #GP was due to another reason, a second #GP might be 
> > triggered
> > + * to force proper behavior.
> > + */
> > +bool __fixup_pasid_exception(void)
> > +{
> > +   struct mm_struct *mm;
> > +   bool ret = true;
> > +   u64 pasid_msr;
> > +   int pasid;
> > +
> > +   mm = get_task_mm(current);
> 
> Why do you need a reference to current->mm ?

The PASID for the address space is per mm and is stored in mm.
To get the PASID, we need to get the mm and the pasid=mm->context.pasid.


> 
> > +   /* This #GP was triggered from user mode. So mm cannot be NULL. */
> > +   pasid = mm->context.pasid;
> > +   /* Ensure this process has been bound to a PASID. */
> > +   if (!pasid) {
> > +   ret = false;
> > +   goto out;
> > +   }
> > +
> > +   /* Check to see if the PASID MSR has already been set for this task. */
> > +   rdmsrl(MSR_IA32_PASID, pasid_msr);
> > +   if (pasid_msr & MSR_IA32_PASID_VALID) {
> > +   ret = false;
> > +   goto out;
> > +   }
> > +
> > +   /* Fix up the MSR. */
> > +   wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
> > +out:
> > +   mmput(mm);

Thanks,

-Fenghua

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-04-27 Thread Ashish Kalra
Hello Konrad,

On Mon, Mar 30, 2020 at 10:25:51PM +, Ashish Kalra wrote:
> Hello Konrad,
> 
> On Tue, Mar 03, 2020 at 12:03:53PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Tue, Feb 04, 2020 at 07:35:00PM +, Ashish Kalra wrote:
> > > Hello Konrad,
> > > 
> > > Looking fwd. to your feedback regarding support of other memory
> > > encryption architectures such as Power, S390, etc.
> > > 
> > > Thanks,
> > > Ashish
> > > 
> > > On Fri, Jan 24, 2020 at 11:00:08PM +, Ashish Kalra wrote:
> > > > On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > 
> > > > > > Additional memory calculations based on # of PCI devices and
> > > > > > their memory ranges will make it more complicated with so
> > > > > > many other permutations and combinations to explore, it is
> > > > > > essential to keep this patch as simple as possible by 
> > > > > > adjusting the bounce buffer size simply by determining it
> > > > > > from the amount of provisioned guest memory.
> > > > >> 
> > > > >> Please rework the patch to:
> > > > >> 
> > > > >>  - Use a log solution instead of the multiplication.
> > > > >>Feel free to cap it at a sensible value.
> > > > 
> > > > Ok.
> > > > 
> > > > >> 
> > > > >>  - Also the code depends on SWIOTLB calling in to the
> > > > >>adjust_swiotlb_default_size which looks wrong.
> > > > >> 
> > > > >>You should not adjust io_tlb_nslabs from swiotlb_size_or_default.
> > > > 
> > > > >>That function's purpose is to report a value.
> > > > >> 
> > > > >>  - Make io_tlb_nslabs be visible outside of the SWIOTLB code.
> > > > >> 
> > > > >>  - Can you utilize the IOMMU_INIT APIs and have your own detect 
> > > > >> which would
> > > > >>modify the io_tlb_nslabs (and set swiotbl=1?).
> > > > 
> > > > This seems to be a nice option, but then IOMMU_INIT APIs are
> > > > x86-specific and this swiotlb buffer size adjustment is also needed
> > > > for other memory encryption architectures like Power, S390, etc.
> > 
> > Oh dear. That I hadn't considered.
> > > > 
> > > > >> 
> > > > >>Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - 
> > > > >> so
> > > > >>perhaps add in this code ? Albeit it really should be in it's own
> > > > >>file, not in arch/x86/kernel/pci-swiotlb.c
> > > > 
> > > > Actually, we piggyback on pci_swiotlb_detect_override which sets
> > > > swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
> > > > forces swiotlb on, but again this is all x86 architecture specific.
> > 
> > Then it looks like the best bet is to do it from within swiotlb_init?
> > We really can't do it from swiotlb_size_or_default - that function
> > should just return a value and nothing else.
> > 
> 
> Actually, we need to do it in swiotlb_size_or_default() as this gets called by
> reserve_crashkernel_low() in arch/x86/kernel/setup.c and used to
> reserve low crashkernel memory. If we adjust swiotlb size later in
> swiotlb_init() which gets called later than reserve_crashkernel_low(),
> then any swiotlb size changes/expansion will conflict/overlap with the
> low memory reserved for crashkernel.
> 
and will also potentially cause SWIOTLB buffer allocation failures.

Do you have any feedback, comments on the above ?

As such i feel, this patch is complete otherwise and can be included as
it is. 

Thanks,
Ashish
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v11 05/10] iommu/vt-d: Add bind guest PASID support

2020-04-27 Thread Jacob Pan
On Fri, 17 Apr 2020 23:46:13 +
"Tian, Kevin"  wrote:

> > From: Jacob Pan 
> > Sent: Friday, April 17, 2020 11:29 PM
> > 
> > On Fri, 17 Apr 2020 09:46:55 +0200
> > Auger Eric  wrote:
> >   
> > > Hi Kevin,
> > > On 4/17/20 4:45 AM, Tian, Kevin wrote:  
> > > >> From: Auger Eric
> > > >> Sent: Thursday, April 16, 2020 6:43 PM
> > > >>  
> > > > [...]  
> > > > +   if (svm) {
> > > > +   /*
> > > > +* If we found svm for the PASID, there
> > > > must be at
> > > > +* least one device bond, otherwise svm
> > > > should be freed.
> > > > +*/
> > > > +   if (WARN_ON(list_empty(>devs))) {
> > > > +   ret = -EINVAL;
> > > > +   goto out;
> > > > +   }
> > > > +
> > > > +   for_each_svm_dev(sdev, svm, dev) {
> > > > +   /* In case of multiple sub-devices
> > > > of the same pdev
> > > > +* assigned, we should allow
> > > > multiple bind calls with
> > > > +* the same PASID and pdev.
> > > > +*/
> > > > +   sdev->users++;  
> > >  What if this is not an mdev device. Is it also allowed?  
> > > >>> Yes. IOMMU and VT-d driver is not mdev aware. Here mdev is
> > > >>> just an example of normal use case. You can bind the same PCI
> > > >>> device (PF or SRIOV VF) more than once to the same PASID.
> > > >>> Just need to unbind also.  
> > > >>
> > > >> I don't get the point of binding a non mdev device several
> > > >> times with the same PASID. Do you intend to allow that at
> > > >> userspace level or prevent this from happening in VFIO?  
> > > >
> > > > I feel it's better to prevent this from happening, otherwise
> > > > VFIO also needs to track the bind count and do multiple unbinds
> > > > at mm_exit. But it's not necessary to prevent it in VFIO. We
> > > > can check here upon whether aux_domain is valid, and if not
> > > > return -EBUSY.  
> > > Ah OK. So if we can detect the case here it is even better
> > >  
> > I don't understand why VFIO cannot track, since it is mdev aware.
> > if we don;t refcount the users, one mdev unbind will result unbind
> > for all mdev under the same pdev. That may not be the right thing
> > to do. 
> 
> The open here is not for mdev, which refcount is still required.
> Eric's point is for non-mdev endpoints. It's meaningless and not
> intuitive to allow binding a PASID multiple-times to the same device. 
> 
That seems contradictory. The refcount here is intended/required for sub
devices such as mdev. Since IOMMU driver is not mdev aware, we cannot
treat devices differently.

Perhaps Yi can clarify if this case is handled within VFIO, then I can
drop the refcount here.

> Thanks
> Kevin

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu