Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-28 Thread Dave Hansen
On 4/28/22 09:01, Jean-Philippe Brucker wrote:
>> But, this misses an important point: even after the address space is
>> gone, the PASID will still be programmed into a device.  Device drivers
>> might, for instance, still need to flush operations that are outstanding
>> and need to use that PASID.  They do this at ->release() time.
> It's not really clear which release() this is. For us it's file descriptor
> release() (not MMU notifier release(), which is how I initially understood
> this sentence)

OK, maybe that should be: "file->release() time" to make it more clear.


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-28 Thread Dave Hansen
On 4/28/22 08:28, Fenghua Yu wrote:
> Do you want me to change the changelog to add both this paragraph and the
> following paragraph?

Yes, as long as everyone agrees that it captures the issue well.


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-28 Thread Dave Hansen
On 4/25/22 21:20, Fenghua Yu wrote:
> From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001
> From: Fenghua Yu 
> Date: Fri, 15 Apr 2022 00:51:33 -0700
> Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue
> 
> A PASID might still be used on ARM after it is freed in __mmput().

Is it really just ARM?

> process:
>   open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm
>   exit();
>   exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1
>   exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure
> 
> To avoid the use-after-free issue, free the PASID after no device uses it,
> i.e. after all devices are unbound from the mm.
> 
> sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count.
> __mmdrop() is called only after mm->mm_count is zero. So freeing the PASID
> in __mmdrop() guarantees the PASID is safely freed only after no device
> is bound to the mm.

Does this changelog work for everyone?

==

tl;dr: The PASID is being freed too early.  It needs to stay around
until after device drivers that might be using it have had a chance to
clear it out of the hardware.

--

As a reminder:

mmget() /mmput()  refcount the mm's address space
mmgrab()/mmdrop() refcount the mm itself

The PASID is currently tied to the life of the mm's address space and
freed in __mmput().  This makes logical sense because the PASID can't be
used once the address space is gone.

But, this misses an important point: even after the address space is
gone, the PASID will still be programmed into a device.  Device drivers
might, for instance, still need to flush operations that are outstanding
and need to use that PASID.  They do this at ->release() time.

Device drivers hold a reference on the mm itself and drop it at
->release() time.  But, that reference is on the mm itself, not on the
address space.  The address space (and the PASID) is long gone by the
time the driver tries to clean up.  This is effectively a
use-after-free bug on the PASID.

To fix this, move the PASID free operation from __mmput() to __mmdrop().
This ensures that the device drivers' existing mmgrab() keeps the PASID
allocated until they drop their mm reference.

>  kernel/fork.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 9796897560ab..35a3beff140b 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm)
>   mmu_notifier_subscriptions_destroy(mm);
>   check_mm(mm);
>   put_user_ns(mm->user_ns);
> + mm_pasid_drop(mm);
>   free_mm(mm);
>  }
>  EXPORT_SYMBOL_GPL(__mmdrop);
> @@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm)
>   }
>   if (mm->binfmt)
>   module_put(mm->binfmt->module);
> - mm_pasid_drop(mm);
>   mmdrop(mm);
>  }
>  
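
With that change applied, the failing sequence from the changelog
becomes (illustrative, same notation as above):

  exit();
  exit_mm()->__mmput()->exit_mmap();       // pages freed, OOM killer happy
  exit_files()->release(dev)->sva_unbind()->use mm->pasid; // still valid
  ... final mmdrop()->__mmdrop()->mm_pasid_drop();  // PASID freed only now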



Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-26 Thread Dave Hansen
On 4/26/22 09:48, Jean-Philippe Brucker wrote:
> On Tue, Apr 26, 2022 at 08:27:00AM -0700, Dave Hansen wrote:
>> On 4/25/22 09:40, Jean-Philippe Brucker wrote:
>>> The problem is that we'd have to request the device driver to stop DMA
>>> before we can destroy the context and free the PASID. We did consider
>>> doing this in the release() MMU notifier, but there were concerns about
>>> blocking mmput() for too long (for example
>>> https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/
>>> though I think there was a more recent discussion). We also need to drain
>>> the PRI and fault queues to get rid of all references to that PASID.
>> Is the concern truly about blocking mmput() itself?  Or, is it about
>> releasing the resources associated with the mm?
> The latter I think, this one was about releasing pages as fast as possible
> if the process is picked by the OOM killer. 

We're tying the PASID to the life of the mm itself, not the mm's address
space.  That means the PASID should be tied to
mmgrab()/mmdrop()/mm->mm_count.

The address space is what the OOM killer is after.  That gets refcounted
with mmget()/mmput()/mm->mm_users.  The OOM killer is satiated by the
page freeing done in __mmput()->exit_mmap().

Also, all the VMAs should be gone after exit_mmap().  So, even if
vma->vm_file was holding a reference to a device driver, that reference
should be gone by the time __mmdrop() is actually freeing the PASID.
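
To make that concrete, here is how the two refcounts behave (a sketch
using the real helpers):

	/* address-space lifetime: mm->mm_users */
	if (mmget_not_zero(mm)) {
		/* VMAs and page tables are guaranteed alive here */
		mmput(mm);	/* the final mmput() runs exit_mmap() */
	}

	/* mm_struct lifetime: mm->mm_count */
	mmgrab(mm);
	/* the mm_struct stays allocated, even after exit_mmap() */
	mmdrop(mm);	/* the final mmdrop() runs __mmdrop() */

Moving the PASID free from __mmput() to __mmdrop() moves it from the
first lifetime to the second.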


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-26 Thread Dave Hansen
On 4/25/22 09:40, Jean-Philippe Brucker wrote:
> The problem is that we'd have to request the device driver to stop DMA
> before we can destroy the context and free the PASID. We did consider
> doing this in the release() MMU notifier, but there were concerns about
> blocking mmput() for too long (for example
> https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/
> though I think there was a more recent discussion). We also need to drain
> the PRI and fault queues to get rid of all references to that PASID.

Is the concern truly about blocking mmput() itself?  Or, is it about
releasing the resources associated with the mm?


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Dave Hansen
On 4/25/22 07:26, Jean-Philippe Brucker wrote:
>>
>> How does the IOMMU hardware know that all activity to a given PASID is
>> finished?  That activity should, today, be independent of an mm or a
>> fd's lifetime.
> In the case of uacce, it's tied to the fd lifetime: opening an accelerator
> queue calls iommu_sva_bind_device(), which sets up the PASID context in
> the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
> destroys the PASID context (after the device driver stopped all DMA for
> this PASID).

Could this PASID context destruction move from being "fd-based" to
happening under mm_pasid_drop()?  Logically, it seems like that should
work because mm_pasid_drop() happens after exit_mmap() where the VMAs
(which hold references to 'struct file' via vma->vm_file) are torn down.


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Dave Hansen
On 4/25/22 06:53, Jean-Philippe Brucker wrote:
> On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com wrote:
>>>> On 5.17
>>>> fops_release is called automatically, as well as iommu_sva_unbind_device.
>>>> On 5.18-rc1,
>>>> fops_release is not called, have to manually call close(fd)
>>> Right that's weird
>> Looks it is caused by the fix patch, via mmget, which may add refcount of
>> fd.
> Yes indirectly I think: when the process mmaps the queue, mmap_region()
> takes a reference to the uacce fd. That reference is released either by
> explicit close() or munmap(), or by exit_mmap() (which is triggered by
> mmput()). Since there is an mm->fd dependency, we cannot add a fd->mm
> dependency, so no mmget()/mmput() in bind()/unbind().
> 
> I guess we should go back to refcounted PASIDs instead, to avoid freeing
> them until unbind().

Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
nothing else simple we can do.

How does the IOMMU hardware know that all activity to a given PASID is
finished?  That activity should, today, be independent of an mm or a
fd's lifetime.


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-12 Thread Dave Hansen
On 4/12/22 08:10, Jean-Philippe Brucker wrote:
>> I wonder if the Intel and ARM IOMMU code differ in the way they keep
>> references to the mm, or if this affects Intel as well, but we just
>> haven't tested the code enough.
> The Arm code was written expecting the PASID to be freed on unbind(), not
> mm exit. I missed the change of behavior, sorry (I thought your plan was
> to extend PASID lifetime, not shorten it?) but as is it seems very broken.
> For example in the iommu_sva_unbind_device(), we have
> arm_smmu_mmu_notifier_put() clearing the PASID table entry for
> "mm->pasid", which is going to end badly if the PASID has been cleared or
> reallocated. We can't clear the PASID entry in mm exit because at that
> point the device may still be issuing DMA for that PASID and we need to
> quiesce the entry rather than deactivate it.

I think we ended up flipping some of this around on the Intel side.
Instead of having to quiesce the device on mm exit, we don't let the mm
exit until the device is done.

When you program the pasid into the device, it's a lot like when you
create a thread.  We bump the reference count on the mm when we program
the page table pointer into a CPU.  We drop the thread's reference to
the mm when the thread exits and will no longer be using the page tables.

Same thing with pasids.  We bump the refcount on the mm when the pasid
is programmed into the device.  Once the device is done with the mm, we
drop the mm.

Basically, instead of refcounting the pasid itself, we just refcount the mm.
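
In code form, the analogy looks roughly like this (a sketch; the exact
iommu_sva_bind_device() argument list is abridged, and the
mmgrab()/mmdrop() placement is what this series establishes):

	struct iommu_sva *handle;

	handle = iommu_sva_bind_device(dev, mm);  /* does mmgrab(mm) */
	/* ... device issues DMA tagged with mm->pasid ... */
	iommu_sva_unbind_device(handle);          /* does mmdrop(mm) */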


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-12 Thread Dave Hansen
On 4/12/22 06:41, Fenghua Yu wrote:
>> master process quit, mmput ->  mm_pasid_drop->ioasid_free
>> But this ignore driver's iommu_sva_unbind_device function,
>> iommu_sva_bind_device and iommu_sva_unbind_device are not pair,  So driver
>> does not know ioasid is freed.
>>
>> Any suggestion?
> ioasid is per process or per mm. A daemon process shouldn't share the same 
> ioasid with any other process, not even its parent process. Its parent gets
> an ioasid and frees it on exit. The ioasid is gone and shouldn't be used
> by its child process.
> 
> Each daemon process should call driver -> iommu_sva_bind_device -> 
> ioasid_alloc
> to get its own ioasid/PASID. On daemon quit, the ioasid is freed.
> 
> That means nginx needs to be changed.

Fenghua, please step back for a second and look at what you are saying.
Your patch caused userspace to break.  Now, you're telling someone that
they need to go change that userspace to work around something that your
patch broke.  How, exactly, are you suggesting that nginx could change to
fix this?  What, specifically, was it doing with *fork()* that was wrong?

It sounds to me like you're saying that it's OK to break userspace.

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-12 Thread Dave Hansen
On 4/12/22 00:04, zhangfei@foxmail.com wrote:
> master process quit, mmput ->  mm_pasid_drop->ioasid_free
> But this ignore driver's iommu_sva_unbind_device function,
> iommu_sva_bind_device and iommu_sva_unbind_device are not pair,  So
> driver does not know ioasid is freed.
> 
> Any suggestion?

It sounds like you're saying that the device is still bound to the
PASID even though the mm has exited and freed the PASID.  This is
essentially a use-after-free for the PASID.  Right?

The right thing to do here is to have the PASID code hold a reference on
the mm.  The mm "owns" the PASID for its entire lifetime and if anything
needs the PASID to live longer, its only recourse for doing that is via
an mmget().  I _think_ mmget() is the right thing as opposed to mmgrab()
because the PASID users actually need the page tables to be around.

This would still be nice to confirm with some traces of fork()/exit()
and the iommu_sva_{bind,unbind} and ioasid_{alloc,free} functions.

I wonder if the Intel and ARM IOMMU code differ in the way they keep
references to the mm, or if this affects Intel as well, but we just
haven't tested the code enough.

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-11 Thread Dave Hansen

On 4/11/22 07:44, zhangfei@foxmail.com wrote:
> On 2022/4/11 10:36 PM, Dave Hansen wrote:
>> On 4/11/22 07:20, zhangfei@foxmail.com wrote:
>>>> Is there nothing before this call trace?  Usually there will be at least
>>>> some warning text.
>>> I added dump_stack() in ioasid_free.
>> Hold on a sec, though...
>>
>> What's the *problem* here?  Did something break or are you just saying
>> that something looks weird to _you_?
> 
> After this, nginx is not working at all, and hardware reports error.
> Suppose the master used the ioasid for init, but it got freed.
> 
> hardware reports:
> [  152.731869] hisi_sec2 :76:00.0: qm_acc_do_task_timeout [error status=0x20] found
> [  152.739657] hisi_sec2 :76:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found
> [  152.747877] hisi_sec2 :76:00.0: sec_fsm_hbeat_rint [error status=0x20] found
> [  152.755340] hisi_sec2 :76:00.0: Controller resetting...
> [  152.762044] hisi_sec2 :76:00.0: QM mailbox operation timeout!
> [  152.768198] hisi_sec2 :76:00.0: Failed to dump sqc!
> [  152.773490] hisi_sec2 :76:00.0: Failed to drain out data for stopping!
> [  152.781426] hisi_sec2 :76:00.0: QM mailbox is busy to start!
> [  152.787468] hisi_sec2 :76:00.0: Failed to dump sqc!
> [  152.792753] hisi_sec2 :76:00.0: Failed to drain out data for stopping!
> [  152.800685] hisi_sec2 :76:00.0: QM mailbox is busy to start!
> [  152.806730] hisi_sec2 :76:00.0: Failed to dump sqc!
> [  152.812017] hisi_sec2 :76:00.0: Failed to drain out data for stopping!
> [  152.819946] hisi_sec2 :76:00.0: QM mailbox is busy to start!
> [  152.825992] hisi_sec2 :76:00.0: Failed to dump sqc!

That would have been awfully handy information to have in an initial bug 
report. :)
Is there a chance you could dump out that ioasid alloc *and* free information 
in ioasid_alloc/free()?  This could be some kind of problem with the allocator, 
or with copying the ioasid at fork.

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-11 Thread Dave Hansen
On 4/11/22 07:20, zhangfei@foxmail.com wrote:
>> Is there nothing before this call trace?  Usually there will be at least
>> some warning text.
> I added dump_stack() in ioasid_free.

Hold on a sec, though...

What's the *problem* here?  Did something break or are you just saying
that something looks weird to _you_?

For instance, if we have:

nginx: ioasid_alloc()==1
   fork(); // spawn the daemon
   exit();
   ioasid_free(1);

and then a moment later:

lynx:  ioasid_alloc()==1
   fork();
   exit();
   ioasid_free(1);

There's no problem.  The original nginx process with ioasid==1 exited.
The fact that *some* process called nginx is still running doesn't mean
that it still has a reference to asid==1.

Are you sure the nginx process that allocated the ASID is the same
process you see in ps?

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-11 Thread Dave Hansen
On 4/11/22 07:00, Zhangfei Gao wrote:
> with this patchset, each time after  sbin/nginx, ioasid is freed
> immediately. lynx test will alloc the same ioasid=1.

That doesn't seem right.  Isn't 'sbin/nginx' still running when lynx
runs?  How can they get the same ioasid?

This sounds like a refcounting problem, like that the ioasid wasn't
properly refcounted as nginx forked into the background.

> To verify: commenting out mm_pasid_drop() in __mmput() makes the issue
> disappear.
> 
> log: after sbin/nginx.
> [   96.526730] Call trace:
> [   96.526732]  dump_backtrace+0xe4/0xf0
> [   96.526741]  show_stack+0x20/0x70
> [   96.526744]  dump_stack_lvl+0x8c/0xb8
> [   96.526751]  dump_stack+0x18/0x34
> [   96.526754]  ioasid_free+0xdc/0xfc
> [   96.526757]  mmput+0x138/0x160
> [   96.526760]  do_exit+0x284/0x9d0
> [   96.526765]  do_group_exit+0x3c/0xa8
> [   96.526767]  __wake_up_parent+0x0/0x38
> [   96.526770]  invoke_syscall+0x4c/0x110
> [   96.526775]  el0_svc_common.constprop.0+0x68/0x128
> [   96.526778]  do_el0_svc+0x2c/0x90
> [   96.526781]  el0_svc+0x30/0x98
> [   96.526783]  el0t_64_sync_handler+0xb0/0xb8
> [   96.526785]  el0t_64_sync+0x18c/0x190

Is there nothing before this call trace?  Usually there will be at least
some warning text.

Re: [PATCH V3 1/2] Documentation: x86: Add documentation for AMD IOMMU

2022-03-30 Thread Dave Hansen
On 3/30/22 07:01, Deucher, Alexander wrote:
>> Just scanning this, it looks *awfully* generic.  Is anything in
>> here AMD- specific?  Should this be in an AMD-specific file?
> There is some information about the ACPI tables used to enumerate the
> IOMMUs and a link to the AMD IOMMU programming documentation.  Would
> you prefer I just create a combined x86 IOMMU document?

Yeah, I think that would make a lot of sense.  Let's just have one
document with an AMD section and an Intel section.  Maybe just rename
the existing one to intel-iommu => x86-iommu.rst.


Re: [PATCH V3 1/2] Documentation: x86: Add documentation for AMD IOMMU

2022-03-29 Thread Dave Hansen
On 3/28/22 10:28, Alex Deucher wrote:
> +How is IOVA generated?
> +--
> +
> +Well behaved drivers call dma_map_*() calls before sending command to device
> +that needs to perform DMA. Once DMA is completed and mapping is no longer
> +required, driver performs dma_unmap_*() calls to unmap the region.
> +
> +Fault reporting
> +---
> +
> +When errors are reported, the IOMMU signals via an interrupt. The fault
> +reason and device that caused it is printed on the console.

Just scanning this, it looks *awfully* generic.  Is anything in here
AMD-specific?  Should this be in an AMD-specific file?


Re: [PATCH 2/2] Documentation: x86: Clarify Intel IOMMU documenation

2022-03-09 Thread Dave Hansen
> -ACPI enumerates and lists the different DMA engines in the platform, and
> -device scope relationships between PCI devices and which DMA engine  controls
> +ACPI enumerates and lists the different IOMMUs in the platform, and
> +device scope relationships between PCI devices and which IOMMU controls
>  them.

Isn't this just a really long-winded way of saying:

ACPI enumerates both the IOMMUs in the platform and which IOMMU
controls a specific PCI device.

?


Re: [PATCH v5 00/12] x86: Trenchboot secure dynamic launch Linux kernel support

2022-02-23 Thread Dave Hansen
On 2/16/22 19:54, Ross Philipson wrote:
> The larger focus of the TrenchBoot project (https://github.com/TrenchBoot) is 
> to
> enhance the boot security and integrity in a unified manner. The first area of
> focus has been on the Trusted Computing Group's Dynamic Launch for 
> establishing
> a hardware Root of Trust for Measurement, also known as DRTM (Dynamic Root of
> Trust for Measurement). The project has been and continues to work on 
> providing
> a unified means to Dynamic Launch that is a cross-platform (Intel and AMD) and
> cross-architecture (x86 and Arm), with our recent involvement in the upcoming
> Arm DRTM specification. The order of introducing DRTM to the Linux kernel
> follows the maturity of DRTM in the architectures. Intel's Trusted eXecution
> Technology (TXT) is present today and only requires a preamble loader, e.g. a
> boot loader, and an OS kernel that is TXT-aware. AMD DRTM implementation has
> been present since the introduction of AMD-V but requires an additional
> component that is AMD specific and referred to in the specification as the
> Secure Loader, which the TrenchBoot project has an active prototype in
> development. Finally Arm's implementation is in specification development 
> stage
> and the project is looking to support it when it becomes available.

What problem is this patch series solving?  Is the same problem solved
in a different way in the kernel today?  What is wrong with that
solution?  What effects will end users see if they apply this series?


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-11 Thread Dave Hansen
On 2/7/22 15:02, Fenghua Yu wrote:
...
> Get rid of the refcounting mechanisms and replace/rename the interfaces
> to reflect this new approach.
...
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 +--
>  drivers/iommu/intel/iommu.c   |  4 +-
>  drivers/iommu/intel/svm.c |  9 -
>  drivers/iommu/ioasid.c| 39 ++-
>  drivers/iommu/iommu-sva-lib.c | 39 ++-
>  drivers/iommu/iommu-sva-lib.h |  1 -
>  include/linux/ioasid.h| 12 +-
>  include/linux/sched/mm.h  | 16 
>  kernel/fork.c |  1 +
>  9 files changed, 38 insertions(+), 88 deletions(-)

Given the heavily non-x86 diffstat here, I was hoping to see some acks
from folks that this might affect, especially on the ARM side.

Is everyone OK with this?


Re: [PATCH V7 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM

2021-12-14 Thread Dave Hansen
On 12/14/21 2:23 PM, Tom Lendacky wrote:
>> I don't really understand how this can be more general any *not* get
>> utilized by the existing SEV support.
> 
> The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is
> meant to be used with a (relatively) un-enlightened guest. The idea is
> that the C-bit in the guest page tables must be 0 for all accesses. It
> is only the physical address relative to VTOM that determines if the
> access is encrypted or not. So setting sme_me_mask will actually cause
> issues when running with this feature. Since all DMA for an SEV-SNP
> guest must still be to shared (unencrypted) memory, some enlightenment
> is needed. In this case, memory mapped above VTOM will provide that via
> the SWIOTLB update. For SEV-SNP guests running with VTOM, they are
> likely to also be running with the Reflect #VC feature, allowing a
> "paravisor" to handle any #VCs generated by the guest.
> 
> See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC"
> in volume 2 of the AMD APM [1].

Thanks, Tom, that's pretty much what I was looking for.

The C-bit normally comes from the page tables.  But, the hardware also
provides an alternative way to effectively get C-bit behavior without
actually setting the bit in the page tables: Virtual Top-of-Memory
(VTOM).  Right?

It sounds like Hyper-V has chosen to use VTOM instead of requiring the
guest to do the C-bit in its page tables.

But, the thing that confuses me is when you said: "it (VTOM) is meant to
be used with a (relatively) un-enlightened guest".  We don't have an
unenlightened guest here.  We have Linux, which is quite enlightened.

Is VTOM being used because there's something that completely rules out
using the C-bit in the page tables?  What's that "something"?


Re: [PATCH V7 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM

2021-12-14 Thread Dave Hansen
On 12/13/21 8:36 PM, Tianyu Lan wrote:
> On 12/14/2021 12:45 AM, Dave Hansen wrote:
>> On 12/12/21 11:14 PM, Tianyu Lan wrote:
>>> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
>>> extra address space which is above shared_gpa_boundary (E.G 39 bit
>>> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
>>> physical address will be original physical address +
>>> shared_gpa_boundary.
>>> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
>>> memory(vTOM). Memory addresses below vTOM are automatically treated as
>>> private while memory above vTOM is treated as shared.
>>
>> This seems to be independently reintroducing some of the SEV
>> infrastructure.  Is it really OK that this doesn't interact at all with
>> any existing SEV code?
>>
>> For instance, do we need a new 'swiotlb_unencrypted_base', or should
>> this just be using sme_me_mask somehow?
> 
>    Thanks for your review. Hyper-V provides a para-virtualized
> confidential computing solution based on the AMD SEV function and does
> not expose SEV capabilities to the guest. So sme_me_mask is unset in a
> Hyper-V Isolation VM. swiotlb_unencrypted_base is a more general solution
> to handle such cases of different address spaces for encrypted and
> decrypted memory, and other platforms may also reuse it.

I don't really understand how this can be more general any *not* get
utilized by the existing SEV support.

Re: [PATCH V7 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM

2021-12-13 Thread Dave Hansen
On 12/12/21 11:14 PM, Tianyu Lan wrote:
> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
> extra address space which is above shared_gpa_boundary (E.G 39 bit
> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
> physical address will be original physical address + shared_gpa_boundary.
> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
> memory(vTOM). Memory addresses below vTOM are automatically treated as
> private while memory above vTOM is treated as shared.

This seems to be independently reintroducing some of the SEV
infrastructure.  Is it really OK that this doesn't interact at all with
any existing SEV code?

For instance, do we need a new 'swiotlb_unencrypted_base', or should
this just be using sme_me_mask somehow?


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-28 Thread Dave Hansen
On 9/28/21 1:28 PM, Luck, Tony wrote:
> On Tue, Sep 28, 2021 at 12:19:22PM -0700, Dave Hansen wrote:
>> On 9/28/21 11:50 AM, Luck, Tony wrote:
>>> On Mon, Sep 27, 2021 at 04:51:25PM -0700, Dave Hansen wrote:
>> ...
>>>> 1. Hide whether we need to write to real registers
>>>> 2. Hide whether we need to update the in-memory image
>>>> 3. Hide other FPU infrastructure like the TIF flag.
>>>> 4. Make the users deal with a *whole* state in the replace API
>>>
>>> Is that difference just whether you need to save the
>>> state from registers to memory (for the "update" case)
>>> or not (for the "replace" case ... where you can ignore
>>> the current register, overwrite the whole per-feature
>>> xsave area and mark it to be restored to registers).
>>>
>>> If so, just a "bool full" argument might do the trick?
>>
>> I want to be able to hide the complexity of where the old state comes
>> from.  It might be in registers or it might be in memory or it might be
>> *neither*.  It's possible we're running with stale register state and a
>> current->...->xsave buffer that has XFEATURES_FOO 0.
>>
>> In that case, the "old" copy might be memcpy'd out of the init_task.
>> Or, for pkeys, we might build it ourselves with init_pkru_val.
> 
> So should there be an error case if there isn't an "old" state, and
> the user calls:
> 
>   p = begin_update_one_xsave_feature(XFEATURE_something, false);
> 
> Maybe instead of an error, just fill it in with the init state for the 
> feature?

Yes, please.  Let's not generate an error.  Let's populate the init
state and let them move on with life.

>>> pseudo-code:
>>>
>>> void *begin_update_one_xsave_feature(enum xfeature xfeature, bool full)
>>> {
>>> void *addr;
>>>
>>> BUG_ON(!(xsave->header.xcomp_bv & xfeature));
>>>
>>> addr = __raw_xsave_addr(xsave, xfeature);
>>>
>>> fpregs_lock();
>>>
>>> if (full)
>>> return addr;
>>
>> If the feature is marked as in the init state in the buffer
>> (XSTATE_BV[feature]==0), this addr *could* contain total garbage.  So,
>> we'd want to make sure that the memory contents have the init state
>> written before handing them back to the caller.  That's not strictly
>> required if the user is writing the whole thing, but it's the nice thing
>> to do.
> 
> Nice guys waste CPU cycles writing to memory that is just going to get
> written again.

Let's skip the "replace" operation for now and focus on "update".  A
full replace *can* be faster because it doesn't require the state to be
written out.  But, we don't need that for now.

Let's just do the "update" thing, and let's ensure that we reflect the
init state into the buffer that gets returned if the feature was in its
init state.

Sound good?

>>> if (xfeature registers are "live")
>>> xsaves(xstate, 1 << xfeature);
>>
>> One little note: I don't think we would necessarily need to do an XSAVES
>> here.  For PKRU, for instance, we could just do a rdpkru.
> 
> Like this?
> 
>   if (tsk == current) {
>   switch (xfeature) {
>   case XFEATURE_PKRU:
>   *(u32 *)addr = rdpkru();
>   break;
>   case XFEATURE_PASID:
>   rdmsrl(MSR_IA32_PASID, msr);
>   *(u64 *)addr = msr;
>   break;
>   ... any other "easy" states ...
>   default:
>   xsaves(xstate, 1 << xfeature);
>   break;
>   }
>   }

Yep.

>>> return addr;
>>> }
>>>
>>> void finish_update_one_xsave_feature(enum xfeature xfeature)
>>> {
>>> mark feature modified
>>
>> I think we'd want to do this at the "begin" time.  Also, do you mean we
>> should set XSTATE_BV[feature]?
> 
> Begin? End? It's all inside fpregs_lock(). But whatever seems best.

I'm actually waffling about it.

Does XSTATE_BV[feature] mean that state *is* there or that state *may*
be there?  Either way works.

>> It's also worth noting that we *could*:
>>
>> 	xrstors(xstate, 1 << xfeature);
>>
>> as well.  That would bring the registers back up to date and we could
>> keep TIF_NEED_FPU_LOAD==0.
> 
> Only makes sense if "tsk == current". But does this help. The work seems
> to be the same whether we do it now, or later. We don't know for sure
> that we will directly return to the task. We might context switch to
> another task, so loading the state into registers now would just be
> wasted time.

True, but the other side of the coin is that setting TIF_NEED_FPU_LOAD
subjects us to an XRSTOR on the way out to userspace.  That XRSTOR might
just be updating one state component in practice.

Either way, sorry for the distraction.  We (me) should really be
focusing on getting something that is functional but slow.


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-28 Thread Dave Hansen
On 9/28/21 11:50 AM, Luck, Tony wrote:
> On Mon, Sep 27, 2021 at 04:51:25PM -0700, Dave Hansen wrote:
...
>> 1. Hide whether we need to write to real registers
>> 2. Hide whether we need to update the in-memory image
>> 3. Hide other FPU infrastructure like the TIF flag.
>> 4. Make the users deal with a *whole* state in the replace API
> 
> Is that difference just whether you need to save the
> state from registers to memory (for the "update" case)
> or not (for the "replace" case ... where you can ignore
> the current register, overwrite the whole per-feature
> xsave area and mark it to be restored to registers).
> 
> If so, just a "bool full" argument might do the trick?

I want to be able to hide the complexity of where the old state comes
from.  It might be in registers or it might be in memory or it might be
*neither*.  It's possible we're running with stale register state and a
current->...->xsave buffer that has XFEATURES_FOO 0.

In that case, the "old" copy might be memcpy'd out of the init_task.
Or, for pkeys, we might build it ourselves with init_pkru_val.

> Also - you have a "tsk" argument in your pseudo code. Is
> this needed? Are there places where we need to perform
> these operations on something other than "current"?

Two cases come to mind:
1. Fork/clone where we are doing things to our child's XSAVE buffer
2. ptrace() where we are poking into another task's state

ptrace() goes for the *whole* buffer now.  I'm not sure it would need
this per-feature API.  I just call it out as something that we might
need in the future.

> pseudo-code:
> 
> void *begin_update_one_xsave_feature(enum xfeature xfeature, bool full)
> {
>   void *addr;
> 
>   BUG_ON(!(xsave->header.xcomp_bv & xfeature));
> 
>   addr = __raw_xsave_addr(xsave, xfeature);
> 
>   fpregs_lock();
> 
>   if (full)
>   return addr;

If the feature is marked as in the init state in the buffer
(XSTATE_BV[feature]==0), this addr *could* contain total garbage.  So,
we'd want to make sure that the memory contents have the init state
written before handing them back to the caller.  That's not strictly
required if the user is writing the whole thing, but it's the nice thing
to do.

>   if (xfeature registers are "live")
>   xsaves(xstate, 1 << xfeature);

One little note: I don't think we would necessarily need to do an XSAVES
here.  For PKRU, for instance, we could just do a rdpkru.

>   return addr;
> }
> 
> void finish_update_one_xsave_feature(enum xfeature xfeature)
> {
>   mark feature modified

I think we'd want to do this at the "begin" time.  Also, do you mean we
should set XSTATE_BV[feature]?

>   set TIF bit

Since the XSAVE buffer was updated, it now contains the canonical FPU
state.  It may have diverged from the register state, thus we need to
set TIF_NEED_FPU_LOAD.

It's also worth noting that we *could*:

	xrstors(xstate, 1 << xfeature);

as well.  That would bring the registers back up to date and we could
keep TIF_NEED_FPU_LOAD==0.

> 	fpregs_unlock();
> }
> 
> -Tony
> 



Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-27 Thread Dave Hansen
On 9/27/21 2:02 PM, Luck, Tony wrote:
> Or are you thinking of a helper that does both the check
> and the update ... so the code here could be:
> 
>   update_one_xsave_feature(XFEATURE_PASID, &msr_val, sizeof(msr_val));
> 
> With the helper being something like:
> 
> void update_one_xsave_feature(enum xfeature xfeature, void *data, size_t size)
> {
>   if (xsave_state_in_memory(args ...)) {
>   addr = get_xsave_addr(xsave, xfeature);
>   memcpy(addr, data, size);
>   xsave->header.xfeatures |= (1 << xfeature);
>   return;
>   }
> 
>   switch (xfeature) {
>   case XFEATURE_PASID:
>   wrmsrl(MSR_IA32_PASID, *(u64 *)data);
>   break;
> 
>   case each_of_the_other_XFEATURE_enums:
>   code to update registers for that XFEATURE
>   }
> }
> 
> either way needs the definitive correct coding for xsave_state_in_memory()

That's close to what we want.

The size should probably be implicit.  If it isn't implicit, it needs to
at least be double-checked against the state sizes.

Not to get too fancy, but I think we also want to have a "replace"
operation which is separate from the "update".  Think of a case where
you are trying to set a bit:

struct pkru_state *pk = start_update_xstate(tsk, XSTATE_PKRU);
pk->pkru |= 0x100;
finish_update_xstate(tsk, XSTATE_PKRU, pk);

versus setting a whole new value:

struct pkru_state *pk = start_replace_xstate(tsk, XSTATE_PKRU);
memset(pk, 0, sizeof(*pk));
pk->pkru = 0x1234;
finish_replace_xstate(tsk, XSTATE_PKRU, pk);

They look similar, but they actually might have very different costs for
big states like AMX.  We can also do some very different debugging for
them.  In normal operation, these _can_ just return pointers directly to
the fpu...->xstate in some case.  But, if we're debugging, we could
kmalloc() a buffer and do sanity checking before it goes into the task
buffer.

We don't have to do any of that fancy stuff now.  We don't even need to
do an "update" if all we need for now for XFEATURE_PASID is a "replace".

1. Hide whether we need to write to real registers
2. Hide whether we need to update the in-memory image
3. Hide other FPU infrastructure like the TIF flag.
4. Make the users deal with a *whole* state in the replace API


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-22 Thread Dave Hansen
On 9/22/21 2:11 PM, Peter Zijlstra wrote:
>>> +static bool fixup_pasid_exception(void)
>>> +{
>>> +   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
>>> +   return false;
>>> +
>>> +   return __fixup_pasid_exception();
>>> +}
> That is, shouldn't the above at the very least decode the instruction
> causing the #GP and check it's this ENQCMD thing?

To reiterate: on systems with no X86_FEATURE_ENQCMD, there is basically
no additional overhead.  It isn't worth doing decoding there.

On systems with X86_FEATURE_ENQCMD, but where it is unused, the #GP
handler gets some new overhead on every #GP.  Basically:

> + pasid = current->mm->pasid;
> + if (pasid == PASID_DISABLED)
> + return false;

That's still pretty cheap.  Probably not worth doing decoding there either.

So, that leaves us with the case where you are:
1. On a system with X86_FEATURE_ENQCMD
2. In a process/mm that has an allocated pasid
3. Your *task* does not have the MSR set
4. You get a #GP for some other reason

Then, you'll do this double-#GP dance.

So, instruction decoding could absolutely be added between steps 3 and
4.  It would absolutely save doing the double-#GP in cases where 1/2/3
are met.  But, I wouldn't move it up above any of the 1/2/3 checks
because they're way cheaper than instruction decoding.
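
A sketch of that ordering, assembled from the snippets quoted earlier
in this thread:

	static bool fixup_pasid_exception(void)
	{
		/* 1: patched away entirely on non-ENQCMD systems */
		if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
			return false;
		/* 2: cheap check, no instruction decoding needed */
		if (current->mm->pasid == PASID_DISABLED)
			return false;
		/* 3: MSR already set, so the #GP is something else */
		if (current->has_valid_pasid)
			return false;
		/* instruction decoding would slot in here, if ever */
		/* 4: take the double-#GP dance */
		return __fixup_pasid_exception();
	}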

In the end, it didn't seem worth it to me to be optimizing a relatively
rare path which 99% of the time ends up in a crash.

If you want instruction decoding in here, though, just say the word. :)


Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-09 Thread Dave Hansen
On 8/9/21 10:56 AM, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.

From an x86/mm perspective:

Acked-by: Dave Hansen 

A tiny nit:

> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0bb4d9ca7a55..b3683083208a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>  
>  bool hv_is_isolation_supported(void)
>  {
> + if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> + return 0;
> +
> + if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> + return 0;
> +
>   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
This might be worthwhile to move to a header.  That ensures that
hv_is_isolation_supported() callers can avoid even a function call.  But, I
see this is used in modules and its use here is also in a slow path, so
it's not a big deal.


Re: [PATCH V2 03/14] x86/set_memory: Add x86_set_memory_enc static call support

2021-08-05 Thread Dave Hansen
On 8/5/21 7:23 AM, Peter Zijlstra wrote:
> This is assuming any of this is actually performance critical, based off
> of this using static_call() to begin with.

This code is not performance critical.

I think I sent folks off on a wild goose chase when I asked that we make
an effort to optimize code that does:

if (some_hyperv_check())
foo();

if (some_amd_feature_check())
bar();

with checks that will actually compile away when Hyper-V or
some_amd_feature() is disabled.  That's less about performance and just
about good hygiene.  I *wanted* to see
cpu_feature_enabled(X86_FEATURE...) checks.

Someone suggested using static calls, and off we went...

Could we please just use cpu_feature_enabled()?
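
In other words, the hoped-for pattern is just (a sketch, with
X86_FEATURE_SEV standing in for "some AMD feature check"):

	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
		foo();

	if (cpu_feature_enabled(X86_FEATURE_SEV))
		bar();

With the matching disabled-features.h bits, both checks compile away
completely in configs where the feature can never be enabled.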


Re: [PATCH V2 03/14] x86/set_memory: Add x86_set_memory_enc static call support

2021-08-04 Thread Dave Hansen
On 8/4/21 11:44 AM, Tianyu Lan wrote:
> +static int default_set_memory_enc(unsigned long addr, int numpages, bool enc);
> +DEFINE_STATIC_CALL(x86_set_memory_enc, default_set_memory_enc);
> +
>  #define CPA_FLUSHTLB 1
>  #define CPA_ARRAY 2
>  #define CPA_PAGES_ARRAY 4
> @@ -1981,6 +1985,11 @@ int set_memory_global(unsigned long addr, int numpages)
>  }
>  
>  static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +{
> + return static_call(x86_set_memory_enc)(addr, numpages, enc);
> +}
> +
> +static int default_set_memory_enc(unsigned long addr, int numpages, bool enc)
>  {
>   struct cpa_data cpa;
>   int ret;

It doesn't make a lot of sense to add this infrastructure and then
ignore it for the existing in-tree user:

> static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> {
> struct cpa_data cpa;
> int ret;
> 
> /* Nothing to do if memory encryption is not active */
> if (!mem_encrypt_active())
> return 0;

Shouldn't the default be to just "return 0"?  Then on
mem_encrypt_active() systems, do the bulk of what is in
__set_memory_enc_dec() today.
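
For instance (a sketch):

	static int default_set_memory_enc(unsigned long addr, int numpages,
					  bool enc)
	{
		/* no memory encryption active: nothing to do */
		return 0;
	}

with platforms that do have active memory encryption installing the
real implementation at boot, e.g. (hypothetical function name):

	static_call_update(x86_set_memory_enc, &amd_set_memory_enc);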



Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-07-29 Thread Dave Hansen
On 7/29/21 8:02 AM, Tianyu Lan wrote:
>>
> 
> There is x86_hyper_type to identify hypervisor type and we may check
> this variable after checking X86_FEATURE_HYPERVISOR.
> 
> static inline bool hv_is_isolation_supported(void)
> {
> if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
>     return 0;
> 
>     if (x86_hyper_type != X86_HYPER_MS_HYPERV)
>     return 0;
> 
> // out of line function call:
> return __hv_is_isolation_supported();
> }   

Looks fine.  You just might want to use this existing helper:

static inline bool hypervisor_is_type(enum x86_hypervisor_type type)
{
return x86_hyper_type == type;
}


Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-07-29 Thread Dave Hansen
On 7/29/21 6:01 AM, Tianyu Lan wrote:
> On 7/29/2021 1:06 AM, Dave Hansen wrote:
>> On 7/28/21 7:52 AM, Tianyu Lan wrote:
>>> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>>>   int ret;
>>>     /* Nothing to do if memory encryption is not active */
>>> -    if (!mem_encrypt_active())
>>> +    if (hv_is_isolation_supported())
>>> +    return hv_set_mem_enc(addr, numpages, enc);
>>> +    else if (!mem_encrypt_active())
>>>   return 0;
>>
>> One more thing.  If you're going to be patching generic code, please
>> start using feature checks that can get optimized away at runtime.
>> hv_is_isolation_supported() doesn't look like the world's cheapest
>> check.  It can't be inlined and costs at least a function call.
> 
> Yes, you are right. How about adding a static branch key for the check
> of isolation VM? This may reduce the check cost.

I don't think you need a static key.

There are basically three choices:
1. Use an existing X86_FEATURE bit.  I think there's already one for
   when you are running under a hypervisor.  It's not super precise,
   but it's better than what you have.
2. Define a new X86_FEATURE bit for when you are running under
   Hyper-V.
3. Define a new X86_FEATURE bit specifically for Hyper-V isolation VM
   support.  This particular feature might be a little too uncommon to
   deserve its own bit.

I'd probably just do #2.
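
A sketch of #2, with a hypothetical bit position (any free slot in
cpufeatures.h would do):

	/* arch/x86/include/asm/cpufeatures.h */
	#define X86_FEATURE_HYPERV_GUEST	( 8*32+18) /* Hyper-V guest */

	/* in the Hyper-V platform setup code: */
	setup_force_cpu_cap(X86_FEATURE_HYPERV_GUEST);

After that, cpu_feature_enabled(X86_FEATURE_HYPERV_GUEST) checks are as
cheap as any other patched feature check.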

Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-07-28 Thread Dave Hansen
On 7/28/21 7:52 AM, Tianyu Lan wrote:
> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>   int ret;
>  
>   /* Nothing to do if memory encryption is not active */
> - if (!mem_encrypt_active())
> + if (hv_is_isolation_supported())
> + return hv_set_mem_enc(addr, numpages, enc);
> + else if (!mem_encrypt_active())
>   return 0;

One more thing.  If you're going to be patching generic code, please
start using feature checks that can get optimized away at runtime.
hv_is_isolation_supported() doesn't look like the world's cheapest
check.  It can't be inlined and costs at least a function call.

These checks could, with basically no effort be wrapped in a header like
this:

static inline bool hv_is_isolation_supported(void)
{
if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
return 0;

// out of line function call:
return __hv_is_isolation_supported();
}   

I don't think it would be the end of the world to add an
X86_FEATURE_HYPERV_GUEST, either.  There are plenty of bits allocated
for Xen and VMWare.


Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-07-28 Thread Dave Hansen
On 7/28/21 7:52 AM, Tianyu Lan wrote:
> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>   int ret;
>  
>   /* Nothing to do if memory encryption is not active */
> - if (!mem_encrypt_active())
> + if (hv_is_isolation_supported())
> + return hv_set_mem_enc(addr, numpages, enc);
> + else if (!mem_encrypt_active())
>   return 0;

__set_memory_enc_dec() is turning into a real mess.  SEV, TDX and now
Hyper-V are messing around in here.

It doesn't help that these additions are totally uncommented.  Even
worse is that hv_set_mem_enc() was intentionally named "enc" when it
presumably has nothing to do with encryption.

This needs to be refactored.  The current __set_memory_enc_dec() can
become __set_memory_enc_pgtable().  It gets used for the hypervisors
that get informed about "encryption" status via page tables: SEV and TDX.

Then, rename hv_set_mem_enc() to hv_set_visible_hcall().  You'll end up
with:

int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
if (hv_is_isolation_supported())
return hv_set_visible_hcall(...);

if (mem_encrypt_active() || ...)
return __set_memory_enc_pgtable();

/* Nothing to do */
return 0;
}

That tells the story pretty effectively, in code.

> +int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
> +{
> + return hv_set_mem_host_visibility((void *)addr,
> + numpages * HV_HYP_PAGE_SIZE,
> + enc ? VMBUS_PAGE_NOT_VISIBLE
> + : VMBUS_PAGE_VISIBLE_READ_WRITE);
> +}

I know this is off in Hyper-V code, but this just makes my eyes bleed.
I'd much rather see something which is less compact but readable.
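
Something along these lines, for instance (a sketch):

	int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
	{
		u32 visibility;

		if (enc)
			visibility = VMBUS_PAGE_NOT_VISIBLE;
		else
			visibility = VMBUS_PAGE_VISIBLE_READ_WRITE;

		return hv_set_mem_host_visibility((void *)addr,
						  numpages * HV_HYP_PAGE_SIZE,
						  visibility);
	}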

> +/* Hyper-V GPA map flags */
> +#define  VMBUS_PAGE_NOT_VISIBLE  0
> +#define  VMBUS_PAGE_VISIBLE_READ_ONLY1
> +#define  VMBUS_PAGE_VISIBLE_READ_WRITE   3

That looks suspiciously like an enum.
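
i.e., the same values sketched as an enum:

	enum vmbus_page_visibility {
		VMBUS_PAGE_NOT_VISIBLE		= 0,
		VMBUS_PAGE_VISIBLE_READ_ONLY	= 1,
		VMBUS_PAGE_VISIBLE_READ_WRITE	= 3,
	};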


Re: [Resend RFC PATCH V4 13/13] x86/HV: Not set memory decrypted/encrypted during kexec alloc/free page in IVM

2021-07-07 Thread Dave Hansen
On 7/7/21 8:46 AM, Tianyu Lan wrote:
> @@ -598,7 +599,7 @@ void arch_kexec_unprotect_crashkres(void)
>   */
>  int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
>  {
> - if (sev_active())
> + if (sev_active() || hv_is_isolation_supported())
>   return 0;
>  
>   /*
> @@ -611,7 +612,7 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
>  
>  void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>  {
> - if (sev_active())
> + if (sev_active() || hv_is_isolation_supported())
>   return;

You might want to take a look through the "protected guest" patches.  I
think this series is touching a few of the same locations that TDX and
recent SEV work touch.

https://lore.kernel.org/lkml/20210618225755.662725-5-sathyanarayanan.kuppusw...@linux.intel.com/



Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID

2020-08-03 Thread Dave Hansen
On 8/3/20 10:16 AM, Andy Lutomirski wrote:
> - TILE: genuinely per-thread, but it's expensive so it's
> lazy-loadable.  But the lazy-load mechanism reuses #NM, and it's not
> fully disambiguated from the other use of #NM.  So it sort of works,
> but it's gross.

For those playing along at home, there's a new whitepaper out from Intel
about some new CPU features which are going to be fun:

> https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

Which part were you worried about?  I thought it was fully disambiguated
from this:

> When XFD causes an instruction to generate #NM, the processor loads
> the IA32_XFD_ERR MSR to identify the disabled state component(s).
> Specifically, the MSR is loaded with the logical AND of the IA32_XFD
> MSR and the bitmap corresponding to the state components required by
> the faulting instruction.
> 
> Device-not-available exceptions that are not due to XFD — those 
> resulting from setting CR0.TS to 1 — do not modify the IA32_XFD_ERR
> MSR.

So if you always make sure to *clear* IA32_XFD_ERR after handling an XFD
exception, any #NM's with a clear IA32_XFD_ERR are from "legacy"
CR0.TS=1.  Any bits set in IA32_XFD_ERR mean a new-style XFD exception.
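
In #NM-handler terms, that disambiguation would look something like
this (a sketch only; the MSR constant and the two handler names are
assumptions, not existing kernel code):

	u64 xfd_err;

	rdmsrl(MSR_IA32_XFD_ERR, xfd_err);
	if (xfd_err) {
		/* XFD-induced #NM: xfd_err names the component(s) */
		wrmsrl(MSR_IA32_XFD_ERR, 0);	/* clear for the next #NM */
		handle_xfd_fault(xfd_err);
	} else {
		/* legacy CR0.TS=1 #NM */
		handle_ts_fault();
	}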

Am I missing something?

Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID

2020-08-03 Thread Dave Hansen
On 8/3/20 8:12 AM, Andy Lutomirski wrote:
> I could easily be convinced that the PASID fixup is so trivial and so
> obviously free of misfiring in a way that causes an infinite loop that
> this code is fine.  But I think we first need to answer the bigger
> question of why we're doing a lazy fixup in the first place.

There was an (internal to Intel) implementation of this about a year ago
that used smp_call_function_many() to force the MSR state into all
threads of a process.  I took one look at it, decided there was a 0%
chance of it actually functioning and recommended we find another way.
While I'm sure it could be done more efficiently, the implementation I
looked at took ~200 lines of code and comments.  It started to look too
much like another instance of mm->cpumask for my comfort.

The only other option I could think of would be an ABI where threads
were required to call into the kernel at least once after creation
before calling ENQCMD.  All ENQCMDs would be required to be "wrapped" by
code doing this syscall.  Something like this:

	if (!thread_local(did_pasid_init)) {
		sys_pasid_init(); // new syscall or prctl
		thread_local(did_pasid_init) = 1;
	}

	// ENQCMD here


Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID

2020-08-03 Thread Dave Hansen
On 7/31/20 4:34 PM, Andy Lutomirski wrote:
>> Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
>> without a valid PASID value programmed. #GP error codes are 16 bits and all
>> 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
>> choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
> when loading from the source operand, so it's not fully comprehending all
>> the reasons. Rather than special case the ENQCMD, in future Intel may
>> choose a different fault mechanism for such cases if recovery is needed on
>> #GP.
> Decoding the user instruction is ugly and sets a bad architecture
> precedent, but we already do it in #GP for UMIP.  So I'm unconvinced.

I'll try to do one more bit of convincing. :)

In the end, we need a way to figure out if the #GP was from a known "OK"
source that we can fix up.  You're right that we could fire up the
instruction decoder to help answer that question.  But, it (also)
doesn't easily yield a perfect answer as to the source of the #GP, it
always involves a user copy, and it's a larger code impact than what
we've got.

I think I went and looked at fixup_umip_exception(), and compared it to
the alternative which is essentially just these three lines of code:

> + /*
> +  * If the current task already has a valid PASID in the MSR,
> +  * the #GP must be for some other reason.
> +  */
> + if (current->has_valid_pasid)
> + return false;
...
> +	/* Now the current task has a valid PASID in the MSR. */
> + current->has_valid_pasid = 1;

and *I* was convinced that instruction decoding wasn't worth it.

There's a lot of stuff that fixup_umip_exception() does which we don't
have to duplicate, but it's going to be really hard to get it anywhere
near as compact as what we've got.

I guess Fenghua could go give instruction decoding a shot and see how it
looks, though.


Re: [PATCH v4 12/12] x86/traps: Fix up invalid PASID

2020-06-26 Thread Dave Hansen
On 6/26/20 11:10 AM, Luck, Tony wrote:
> On Fri, Jun 26, 2020 at 11:44:50AM +0200, Peter Zijlstra wrote:
>> On Thu, Jun 25, 2020 at 01:17:22PM -0700, Fenghua Yu wrote:
>>
>>> +static bool fixup_pasid_exception(void)
>>> +{
>>> +   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
>>> +   return false;
>>> +   if (!static_cpu_has(X86_FEATURE_ENQCMD))
>>> +   return false;
>>
>> elsewhere you had another variation:
>>
>> +   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
>> +   return;
>> +
>> +   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
>> +   return;
>>
>> Which is it, and why do we need the CONFIG thing when combined with the
>> enabled thing?
> 
> Do we have a standard way of coding for a feature that depends on multiple
> other features?  For this case the system needs to both support the ENQCMD
> instruction, and also have kernel code that programs the IOMMU.

Not really a standard one.

We could setup_clear_cpu_cap(X86_FEATURE_ENQCMD) during boot if we see
that CONFIG_INTEL_IOMMU_SVM=n or if we don't have a detected IOMMU.
That would at least get static value patching which would make some of
the feature checks very cheap.

That means we can't use ENQCMD at all in the kernel, though.  Is that an
issue?  Is the CPU feature truly dependent on IOMMU_SVM?
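
A minimal sketch of that boot-time clearing, assuming it runs early enough
that alternatives patching hasn't happened yet:

	static void __init maybe_disable_enqcmd(void)
	{
		/* Nobody can program a PASID without the IOMMU SVM code. */
		if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
			setup_clear_cpu_cap(X86_FEATURE_ENQCMD);
	}

The "no detected IOMMU" half is the awkward part, since that would need a
hook after IOMMU probing rather than a simple early check like this.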

> And/or guidance on when to use each of the somewhat similar looking
>   boot_cpu_has()
>   static_cpu_has()
>   IS_ENABLED()
>   cpu_feature_enabled(X86_FEATURE_ENQCMD)
> options?

cpu_feature_enabled() is my go-to for checking X86_FEATUREs.  It has
compile-time checking along with runtime static patching.

If you use cpu_feature_enabled(), and we added:

#ifdef CONFIG_INTEL_IOMMU_SVM
# define DISABLE_ENQCMD 0
#else
# define DISABLE_ENQCMD	(1 << (X86_FEATURE_ENQCMD & 31))
#endif

to arch/x86/include/asm/disabled-features.h, then we could check only
X86_FEATURE_ENQCMD, and we'd get that IS_ENABLED() check for "free".
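
Call sites then collapse to a single check.  For illustration (hypothetical
helper):

	static bool enqcmd_usable(void)
	{
		/*
		 * Compile-time false when CONFIG_INTEL_IOMMU_SVM=n thanks
		 * to DISABLE_ENQCMD; a statically-patched branch otherwise.
		 */
		return cpu_feature_enabled(X86_FEATURE_ENQCMD);
	}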


Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"

2020-03-17 Thread Dave Hansen
On 3/17/20 2:06 PM, Borislav Petkov wrote:
> On Tue, Mar 17, 2020 at 01:35:12PM -0700, Dave Hansen wrote:
>> On 3/17/20 4:18 AM, Borislav Petkov wrote:
>>> Back then when the whole SME machinery started getting mainlined, it
>>> was agreed that for simplicity, clarity and sanity's sake, the terms
>>> denoting encrypted and not-encrypted memory should be "encrypted" and
>>> "decrypted". And the majority of the code sticks to that convention
>>> except those two. So rename them.
>> Don't "unencrypted" and "decrypted" mean different things?
>>
>> Unencrypted to me means "encryption was never used for this data".
>>
>> Decrypted means "this was/is encrypted but here is a plaintext copy".
> Maybe, but linguistic semantics is not the point here.
> 
> The idea is to represent a "binary" concept of memory being encrypted
> or memory being not encrypted. And at the time we decided to use
> "encrypted" and "decrypted" for those two things.

Yeah, agreed.  We're basically trying to name "!encrypted".

> Do you see the need to differentiate a third "state", so to speak, of
> memory which was never encrypted?

No, there are just two states.  I just think the "!encrypted" case
should not be called "decrypted".


Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"

2020-03-17 Thread Dave Hansen
On 3/17/20 4:18 AM, Borislav Petkov wrote:
> Back then when the whole SME machinery started getting mainlined, it
> was agreed that for simplicity, clarity and sanity's sake, the terms
> denoting encrypted and not-encrypted memory should be "encrypted" and
> "decrypted". And the majority of the code sticks to that convention
> except those two. So rename them.

Don't "unencrypted" and "decrypted" mean different things?

Unencrypted to me means "encryption was never used for this data".

Decrypted means "this was/is encrypted but here is a plaintext copy".

This, for instance:

> +++ b/kernel/dma/direct.c
> @@ -26,7 +26,7 @@ unsigned int zone_dma_bits __ro_after_init = 24;
>  static inline dma_addr_t phys_to_dma_direct(struct device *dev,
> 		phys_addr_t phys)
>  {
> -	if (force_dma_unencrypted(dev))
> +	if (force_dma_decrypted(dev))
>  		return __phys_to_dma(dev, phys);

is referring to DMA that is not and never was encrypted.  It's skipping
the encryption altogether.  There's no act of "decryption" anywhere.

This, on the other hand, seems named wrong to me:

> /*
>  * Macros to add or remove encryption attribute
>  */
> #define pgprot_encrypted(prot)  __pgprot(__sme_set(pgprot_val(prot)))
> #define pgprot_decrypted(prot)  __pgprot(__sme_clr(pgprot_val(prot)))

This seems like it would be better named pgprot_unencrypted().
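
For reference, the helpers behind those macros (as I understand them) just
set or clear the SME mask bit in a value:

	#define __sme_set(x)	((x) | sme_me_mask)
	#define __sme_clr(x)	((x) & ~sme_me_mask)

So pgprot_decrypted() never "decrypts" anything; it produces a mapping that
bypasses encryption, which is exactly the "unencrypted" case.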


Re: [RFC PATCH v9 12/13] xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only)

2019-04-05 Thread Dave Hansen
On 4/5/19 8:24 AM, Andy Lutomirski wrote:
>> Sounds like we need a mechanism that will do the deferred XPFO TLB 
>> flushes whenever the kernel is entered, and not _just_ at context
>> switch time.  This permits an app to run in userspace with stale
>> kernel TLB entries as long as it wants... that's harmless.
...
> I suppose we could do the flush at context switch *and*
> entry.  I bet that performance still utterly sucks, though — on many
> workloads, this turns every entry into a full flush, and we already
> know exactly how much that sucks — it’s identical to KPTI without
> PCID.  (And yes, if we go this route, we need to merge this logic
> together — we shouldn’t write CR3 twice on entry).

Yeah, probably true.

Just eyeballing this, it would mean mapping the "cpu needs deferred
flush" variable into the cpu_entry_area, which doesn't seem too awful.

I think the basic overall concern is that the deferred flush leaves too
many holes and by the time we close them sufficiently, performance will
suck again.  Seems like a totally valid concern, but my crystal ball is
hazy on whether it will be worth it in the end to many folks.

...
> In other words, I think that ret2dir is an insufficient justification
> for XPFO.

Yeah, other things that it is good for have kinda been lost in the
noise.  I think I first started looking at this long before Meltdown and
L1TF were public.

There are hypervisors out there that simply don't (persistently) map
user data.  They can't leak user data because they don't even have
access to it in their virtual address space.  Those hypervisors had a
much easier time with L1TF mitigation than we did.  Basically, they
could flush the L1 after user data was accessible instead of before
untrusted guest code runs.

My hope is that XPFO could provide us similar protection.  But,
somebody's got to poke at it for a while to see how far they can push it.

IMNSHO, XPFO is *always* going to be painful for kernel compiles.  But,
cloud providers aren't doing a lot of kernel compiles on their KVM
hosts, and they deeply care about leaking their users' data.

Re: [RFC PATCH v9 12/13] xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only)

2019-04-05 Thread Dave Hansen
On 4/5/19 12:17 AM, Thomas Gleixner wrote:
>> process. Is that an acceptable trade-off?
> You are not seriously asking whether creating a user controllable ret2dir
> attack window is an acceptable trade-off? April 1st was a few days ago.

Well, let's not forget that this set at least takes us from "always
vulnerable to ret2dir" to a choice between:

1. fast-ish and "vulnerable to ret2dir for a user-controllable window"
2. slow and "mitigated against ret2dir"

Sounds like we need a mechanism that will do the deferred XPFO TLB
flushes whenever the kernel is entered, and not _just_ at context switch
time.  This permits an app to run in userspace with stale kernel TLB
entries as long as it wants... that's harmless.

But, if it enters the kernel, we could process the deferred flush there.
The concern would be catching all the kernel entry paths (PTI does this
today obviously, but I don't think we want to muck with the PTI code for
this).  The other concern would be that the code between kernel entry
and the flush would be vulnerable.  But, that seems like a reasonable
trade-off to me.


Re: [PATCH 1/2] mm: Add kernel MMU notifier to manage IOTLB/DEVTLB

2017-12-13 Thread Dave Hansen
On 12/13/2017 07:38 PM, Lu Baolu wrote:
> 2. When vmalloc/vfree interfaces are called, the page mappings
> for kernel memory might get changed. And current code calls
> flush_tlb_kernel_range() to flush CPU TLBs only. The IOTLB or
> DevTLB will be stale compared to that on the cpu for kernel
> mappings.

What's the plan to deal with all of the ways other than vmalloc() that
the kernel changes the page tables?
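
To make the hazard being described concrete (illustration only, not the
patch's code):

	static void iotlb_staleness_example(void)
	{
		void *buf = vmalloc(PAGE_SIZE);

		if (!buf)
			return;
		/* hand buf to an SVA-capable device here (not shown) */
		vfree(buf);	/* CPU TLB flushed; device IOTLB/DevTLB stale */
	}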


Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption

2017-04-27 Thread Dave Hansen
On 04/27/2017 12:25 AM, Dave Young wrote:
> On 04/21/17 at 02:55pm, Dave Hansen wrote:
>> On 04/18/2017 02:22 PM, Tom Lendacky wrote:
>>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
>>> determine if SME is active.
>>>
>>> A new directory will be created:
>>>   /sys/kernel/mm/sme/
>>>
>>> And two entries within the new directory:
>>>   /sys/kernel/mm/sme/active
>>>   /sys/kernel/mm/sme/encryption_mask
>>
>> Why do they care, and what will they be doing with this information?
> 
> Since kdump will copy old memory, it needs this to know if the old memory
> was encrypted or not. With this sysfs file we can know the previous SME
> status and pass it to the kdump kernel as a kernel param.
> 
> Tom, have you had a chance to try it and see if it works?

What will the kdump kernel do with it though?  We kexec() into that
kernel so the SME keys will all be the same, right?  So, will the kdump
kernel be just setting the encryption bit in the PTE so it can copy the
old plaintext out?

Why do we need both 'active' and 'encryption_mask'?  How could it be
that the hardware-enumerated 'encryption_mask' changes across a kexec()?


Re: [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption

2017-04-24 Thread Dave Hansen
On 04/24/2017 08:53 AM, Tom Lendacky wrote:
> On 4/21/2017 4:52 PM, Dave Hansen wrote:
>> On 04/18/2017 02:17 PM, Tom Lendacky wrote:
>>> @@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
>>>  __phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>>>
>>>  #ifndef __va
>>> -#define __va(x)	((void *)((unsigned long)(x)+PAGE_OFFSET))
>>> +#define __va(x)	((void *)(__sme_clr(x) + PAGE_OFFSET))
>>>  #endif
>>
>> It seems wrong to be modifying __va().  It currently takes a physical
>> address, and this modifies it to take a physical address plus the SME
>> bits.
> 
> This actually modifies it to be sure the encryption bit is not part of
> the physical address.

If SME bits make it this far, we have a bug elsewhere.  Right?  Probably
best not to paper over it.


Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption

2017-04-21 Thread Dave Hansen
On 04/18/2017 02:22 PM, Tom Lendacky wrote:
> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
> determine if SME is active.
> 
> A new directory will be created:
>   /sys/kernel/mm/sme/
> 
> And two entries within the new directory:
>   /sys/kernel/mm/sme/active
>   /sys/kernel/mm/sme/encryption_mask

Why do they care, and what will they be doing with this information?


Re: [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption

2017-04-21 Thread Dave Hansen
On 04/18/2017 02:17 PM, Tom Lendacky wrote:
> @@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
>  	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>  
>  #ifndef __va
> -#define __va(x)	((void *)((unsigned long)(x)+PAGE_OFFSET))
> +#define __va(x)	((void *)(__sme_clr(x) + PAGE_OFFSET))
>  #endif

It seems wrong to be modifying __va().  It currently takes a physical
address, and this modifies it to take a physical address plus the SME bits.

How does that end up ever happening?  If we are pulling physical
addresses out of the page tables, we use p??_phys().  I'd expect *those*
to be masking off the SME bits.

Is it these cases?

pgd_t *base = __va(read_cr3());

For those, it seems like we really want to create two modes of reading
cr3.  One that truly reads CR3 and another that reads the pgd's physical
address out of CR3.  Then you only do the SME masking on the one
fetching a physical address, and the SME bits never leak into __va().
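
Something like this, as a sketch (names illustrative; CR3_ADDR_MASK is
assumed to strip the low flag bits from CR3):

	static inline unsigned long __read_cr3_raw(void)
	{
		unsigned long cr3;

		asm volatile("mov %%cr3, %0" : "=r" (cr3));
		return cr3;
	}

	/* For callers that want the pgd's physical address, not raw CR3. */
	static inline unsigned long read_cr3_pa(void)
	{
		return __read_cr3_raw() & CR3_ADDR_MASK & ~sme_me_mask;
	}

The page table walkers would then do "pgd_t *base = __va(read_cr3_pa());"
and __va() never sees the SME bits.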


Re: [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption

2017-02-22 Thread Dave Hansen
On 02/16/2017 07:43 AM, Tom Lendacky wrote:
> @@ -673,7 +683,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
>   * Currently stuck as a macro due to indirect forward reference to
>   * linux/mmzone.h's __section_mem_map_addr() definition:
>   */
> -#define pgd_page(pgd)	pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
> +#define pgd_page(pgd)	pfn_to_page(pgd_pfn(pgd))

FWIW, these seem like good cleanups that can go in separately from the
rest of your series.


Re: [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption

2017-02-22 Thread Dave Hansen
On 02/16/2017 07:43 AM, Tom Lendacky wrote:
>  static inline unsigned long pte_pfn(pte_t pte)
>  {
> -	return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
> +	return (pte_val(pte) & ~sme_me_mask & PTE_PFN_MASK) >> PAGE_SHIFT;
>  }
>  
>  static inline unsigned long pmd_pfn(pmd_t pmd)
>  {
> -	return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
> +	return (pmd_val(pmd) & ~sme_me_mask & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
>  }

Could you talk a bit about why you chose to do the "~sme_me_mask" bit in
here instead of making it a part of PTE_PFN_MASK / pmd_pfn_mask(pmd)?

It might not matter, but I'd be worried that this ends up breaking
direct users of PTE_PFN_MASK / pmd_pfn_mask(pmd) since they now no
longer mask the PFN out of a PTE.
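
The folded version would look something like this (a sketch):

	#define PTE_PFN_MASK	(((pteval_t)PHYSICAL_PAGE_MASK) & ~sme_me_mask)

	static inline unsigned long pte_pfn(pte_t pte)
	{
		/* no per-accessor masking; direct mask users keep working */
		return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
	}

One obvious cost: sme_me_mask is a runtime variable, so PTE_PFN_MASK would
no longer be a compile-time constant.  Maybe that's the answer.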