Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)

2021-04-14 Thread Xingang Wang

Hi Shameer,

On 2021/4/14 14:56, Shameerali Kolothum Thodi wrote:




-Original Message-
From: wangxingang
Sent: 14 April 2021 03:36
To: Eric Auger ; eric.auger@gmail.com;
jean-phili...@linaro.org; iommu@lists.linux-foundation.org;
linux-ker...@vger.kernel.org; k...@vger.kernel.org;
kvm...@lists.cs.columbia.edu; w...@kernel.org; m...@kernel.org;
robin.mur...@arm.com; j...@8bytes.org; alex.william...@redhat.com;
t...@semihalf.com; zhukeqian 
Cc: jacob.jun@linux.intel.com; yi.l@intel.com; zhangfei@linaro.org;
zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
Thodi ; yuzenghui
; nicoleots...@gmail.com; lushenming
; vse...@nvidia.com; chenxiang (M)
; vdu...@nvidia.com; jiangkunkun

Subject: Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)

Hi Eric, Jean-Philippe

On 2021/4/11 19:12, Eric Auger wrote:

SMMUv3 Nested Stage Setup (IOMMU part)

This series brings the IOMMU part of HW nested paging support
in the SMMUv3. The VFIO part is submitted separately.

This is based on Jean-Philippe's
[PATCH v14 00/10] iommu: I/O page faults for SMMUv3
https://www.spinics.net/lists/arm-kernel/msg886518.html
(including the patches that were not pulled for 5.13)

The IOMMU API is extended to support 2 new API functionalities:
1) pass the guest stage 1 configuration
2) pass stage 1 MSI bindings

Then those capabilities get implemented in the SMMUv3 driver.

The virtualizer passes information through the VFIO user API,
which cascades it to the iommu subsystem. This allows the guest
to own stage 1 tables and context descriptors (so-called PASID
table) while the host owns stage 2 tables and main configuration
structures (STE).

Best Regards

Eric

This series can be found at:
v5.12-rc6-jean-iopf-14-2stage-v15
(including the VFIO part in its last version: v13)



I am testing the performance of an accelerator with and without SVA/vSVA,
and found there may be a potential performance loss for SVA/vSVA.

I use a network and computing encryption device (SEC) and send 1MB
requests repeatedly.

I trigger the mm faults before sending the requests, so there should be no IOPF.

Here's what I got:

physical scenario:
performance:SVA:9MB/s   NOSVA:9MB/s
tlb_miss:   SVA:302,651 NOSVA:1,223
trans_table_walk_access:SVA:302,276 NOSVA:1,237

VM scenario:
performance:vSVA:9MB/s  NOvSVA:6MB/s  about 30~40% loss
tlb_miss:   vSVA:4,423,897  NOvSVA:1,907
trans_table_walk_access:vSVA:61,928,430 NOvSVA:21,948

In the physical scenario there is almost no performance loss, but the
stage 1 tlb_miss and trans_table_walk_access counts for SVA are quite
high compared to NOSVA.

In the VM scenario there is about a 30~40% performance loss, because the
two-stage tlb_miss and trans_table_walk_access counts are even higher,
which impacts the performance.

I compared the page table building procedures of SVA and NOSVA, and
found that NOSVA uses 2MB mappings as far as possible, while SVA uses
only 4KB mappings.

I retested with huge pages, and huge pages solve this problem: the
performance of SVA/vSVA is almost the same as NOSVA.
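
For reference, a minimal userspace sketch of the huge-page retest described
above; the helper name and sizes are illustrative assumptions, and the device
queue setup and SVA bind are omitted:

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

#define HUGE_2M  (2UL * 1024 * 1024)
#define REQ_SIZE (1UL * 1024 * 1024)	/* 1MB request payload */

/*
 * Allocate the request buffer from a 2MB huge page so the SVA stage 1
 * page table can use a block mapping instead of 512 x 4KB entries.
 */
static void *alloc_req_buf(void)
{
	void *buf = mmap(NULL, HUGE_2M, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (buf == MAP_FAILED)
		return NULL;

	/* Touch the buffer so mm faults happen before the request is sent. */
	memset(buf, 0, REQ_SIZE);
	return buf;
}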

I am wondering whether you have any other solution for the performance
loss of vSVA, or any other method to reduce the tlb_miss/trans_table_walk counts.


Hi Xingang,

Just curious, do you have DVM enabled on this board or does it use explicit
SMMU TLB invalidations?

Thanks,
Shameer



For now, DVM is enabled and explicit TLBI is not used.

And by the way, the correct performance data above is:
performance:vSVA:9GB/s (not 9MB/s)  NOvSVA:6GB/s (not 6MB/s)

Thanks

Xingang



Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework

2021-04-14 Thread Lu Baolu

Hi Keqian,

On 4/13/21 4:54 PM, Keqian Zhu wrote:

Some types of IOMMU are capable of tracking DMA dirty log, such as
ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
dirty log tracking framework in the IOMMU base layer.

Three new essential interfaces are added, and we maintain the status
of dirty log tracking in iommu_domain.
1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap

A new dev feature is added to indicate whether a specific type of
IOMMU hardware supports dirty log tracking and whether its driver implements it.
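
Illustration only, not part of the patch: a minimal sketch of how a consumer
such as VFIO might drive these interfaces during live migration, assuming the
signatures added below and a caller-provided bitmap:

static int example_track_dirty(struct iommu_domain *domain,
			       unsigned long iova, size_t size,
			       unsigned long *bitmap, unsigned long pgshift)
{
	int ret;

	/* Start tracking; the driver may split block mappings here. */
	ret = iommu_switch_dirty_log(domain, true, iova, size,
				     IOMMU_READ | IOMMU_WRITE);
	if (ret)
		return ret;

	/* Each pre-copy round: collect the dirty log, then clear it. */
	ret = iommu_sync_dirty_log(domain, iova, size, bitmap, iova, pgshift);
	if (!ret)
		ret = iommu_clear_dirty_log(domain, iova, size, bitmap,
					    iova, pgshift);

	/* Stop tracking once the device has been stopped. */
	iommu_switch_dirty_log(domain, false, iova, size, 0);
	return ret;
}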

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
  drivers/iommu/iommu.c | 150 ++
  include/linux/iommu.h |  53 +++
  2 files changed, 203 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0b0a15dba84..667b2d6d2fc0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1922,6 +1922,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
domain->type = type;
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+   mutex_init(&domain->switch_log_lock);
  
  	return domain;

  }
@@ -2720,6 +2721,155 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
  }
  EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
  
+int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,

+  unsigned long iova, size_t size, int prot)
+{
+   const struct iommu_ops *ops = domain->ops;
+   int ret;
+
+   if (unlikely(!ops || !ops->switch_dirty_log))
+   return -ENODEV;
+
+   mutex_lock(&domain->switch_log_lock);
+   if (enable && domain->dirty_log_tracking) {
+   ret = -EBUSY;
+   goto out;
+   } else if (!enable && !domain->dirty_log_tracking) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   ret = ops->switch_dirty_log(domain, enable, iova, size, prot);
+   if (ret)
+   goto out;
+
+   domain->dirty_log_tracking = enable;
+out:
+   mutex_unlock(&domain->switch_log_lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);


Since you also added IOMMU_DEV_FEAT_HWDBM, I am wondering what's the
difference between

iommu_switch_dirty_log(on) vs. iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM)

iommu_switch_dirty_log(off) vs. iommu_dev_disable_feature(IOMMU_DEV_FEAT_HWDBM)



+
+int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
+size_t size, unsigned long *bitmap,
+unsigned long base_iova, unsigned long bitmap_pgshift)
+{
+   const struct iommu_ops *ops = domain->ops;
+   unsigned int min_pagesz;
+   size_t pgsize;
+   int ret = 0;
+
+   if (unlikely(!ops || !ops->sync_dirty_log))
+   return -ENODEV;
+
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+   if (!IS_ALIGNED(iova | size, min_pagesz)) {
+   pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+  iova, size, min_pagesz);
+   return -EINVAL;
+   }
+
+   mutex_lock(&domain->switch_log_lock);
+   if (!domain->dirty_log_tracking) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   while (size) {
+   pgsize = iommu_pgsize(domain, iova, size);
+
+   ret = ops->sync_dirty_log(domain, iova, pgsize,
+ bitmap, base_iova, bitmap_pgshift);


Any reason why you want to do this in a per-4K-page manner? This can
lead to a lot of indirect calls and bad performance.

How about a sync_dirty_pages()?

The same comment applies to other places in this patch series.
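
For illustration, a rough sketch of the range-based callback suggested above;
the op name and signature are hypothetical:

/* Hypothetical range-based op (name and signature are illustrative):
 * the driver walks its page table once for the whole range and fills
 * the caller's bitmap, instead of being called once per page-size chunk.
 */
struct iommu_dirty_ops_example {
	int (*sync_dirty_pages)(struct iommu_domain *domain,
				unsigned long iova, size_t size,
				unsigned long *bitmap,
				unsigned long base_iova,
				unsigned long bitmap_pgshift);
};

/* iommu_sync_dirty_log() would then make one indirect call per range:
 *
 *	ret = ops->sync_dirty_pages(domain, iova, size, bitmap,
 *				    base_iova, bitmap_pgshift);
 */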


+   if (ret)
+   break;
+
+   pr_debug("dirty_log_sync handle: iova 0x%lx pagesz 0x%zx\n",
+iova, pgsize);
+
+   iova += pgsize;
+   size -= pgsize;
+   }
+out:
+   mutex_unlock(&domain->switch_log_lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sync_dirty_log);
+
+static int __iommu_clear_dirty_log(struct iommu_domain *domain,
+  unsigned long iova, size_t size,
+  unsigned long *bitmap,
+  unsigned long base_iova,
+  unsigned long bitmap_pgshift)
+{
+   const struct iommu_ops *ops = domain->ops;
+   size_t pgsize;
+   int ret = 0;
+
+   if (unlikely(!ops || !ops->clear_dirty_log))
+   return -ENODEV;
+
+   while (size) {
+   pgsize = iommu_pgsize(domain, iova, size);
+
+   ret = ops->clear_dirty_log(domain, iova, pgsize, 

Re: [PATCH v3 02/12] iommu: Add iommu_split_block interface

2021-04-14 Thread Lu Baolu

On 4/13/21 4:54 PM, Keqian Zhu wrote:

Block (large page) mapping is not a proper granule for dirty log tracking.
Take an extreme example: if DMA writes one byte, under a 1G mapping the
dirty amount reported is 1G, but under a 4K mapping the dirty amount is
just 4K.

This adds a new interface named iommu_split_block in the IOMMU base layer.
A specific IOMMU driver can invoke it when starting dirty log tracking. If
so, the driver also needs to implement the split_block iommu ops.

We flush all iotlbs after the whole procedure is completed to ease the
pressure on the IOMMU, as we will handle a huge range of mappings in general.
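
Illustration only: a sketch of the usage the commit message describes, where
an IOMMU driver splits block mappings over the tracked range when dirty log
tracking starts. The driver function is hypothetical; iommu_split_block() is
the interface added below:

static int example_switch_dirty_log(struct iommu_domain *domain, bool enable,
				    unsigned long iova, size_t size, int prot)
{
	int ret;

	if (!enable)
		return 0;	/* merging back is handled elsewhere */

	/* Split 2M/1G block mappings so dirtiness is tracked at 4K. */
	ret = iommu_split_block(domain, iova, size);
	if (ret)
		return ret;

	/* ...then enable hardware dirty bit management (e.g. HTTU)... */
	return 0;
}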

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
  drivers/iommu/iommu.c | 41 +
  include/linux/iommu.h | 11 +++
  2 files changed, 52 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 667b2d6d2fc0..bb413a927870 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2721,6 +2721,47 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
  }
  EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
  
+int iommu_split_block(struct iommu_domain *domain, unsigned long iova,

+ size_t size)
+{
+   const struct iommu_ops *ops = domain->ops;
+   unsigned int min_pagesz;
+   size_t pgsize;
+   bool flush = false;
+   int ret = 0;
+
+   if (unlikely(!ops || !ops->split_block))
+   return -ENODEV;
+
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+   if (!IS_ALIGNED(iova | size, min_pagesz)) {
+   pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+  iova, size, min_pagesz);
+   return -EINVAL;
+   }
+
+   while (size) {
+   flush = true;
+
+   pgsize = iommu_pgsize(domain, iova, size);
+
+   ret = ops->split_block(domain, iova, pgsize);
+   if (ret)
+   break;
+
+   pr_debug("split handled: iova 0x%lx size 0x%zx\n", iova, 
pgsize);
+
+   iova += pgsize;
+   size -= pgsize;
+   }
+
+   if (flush)
+   iommu_flush_iotlb_all(domain);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_split_block);


Do you really have any consumers of this interface other than the dirty
bit tracking? If not, I don't suggest making this a generic IOMMU
interface.

There is an implicit requirement for such interfaces:
iommu_map/unmap(iova, size) shouldn't be called at the same time.
Currently there's no such sanity check in the iommu core. A poorly
written driver could mess up the kernel by misusing this interface.

This also applies to iommu_merge_page().

Best regards,
baolu


[PATCH] x86/dma: Tear down DMA ops on driver unbind

2021-04-14 Thread Jean-Philippe Brucker
Since commit 08a27c1c3ecf ("iommu: Add support to change default domain
of an iommu group") a user can switch a device between IOMMU and direct
DMA through sysfs. This doesn't work for AMD IOMMU at the moment because
dev->dma_ops is not cleared when switching from a DMA to an identity
IOMMU domain. The DMA layer thus attempts to use the dma-iommu ops on an
identity domain, causing an oops:

  # echo 0000:00:05.0 > /sys/bus/pci/drivers/e1000e/unbind
  # echo identity > /sys/bus/pci/devices/0000:00:05.0/iommu_group/type
  # echo 0000:00:05.0 > /sys/bus/pci/drivers/e1000e/bind
   ...
  [  190.017587] BUG: kernel NULL pointer dereference, address: 0000000000000028
   ...
  [  190.027375] Call Trace:
  [  190.027561]  iommu_dma_alloc+0xd0/0x100
  [  190.027840]  e1000e_setup_tx_resources+0x56/0x90
  [  190.028173]  e1000e_open+0x75/0x5b0

Implement arch_teardown_dma_ops() on x86 to clear the device's dma_ops
pointer during driver unbind.

Fixes: 08a27c1c3ecf ("iommu: Add support to change default domain of an iommu 
group")
Signed-off-by: Jean-Philippe Brucker 
---
 arch/x86/Kconfig  | 1 +
 arch/x86/kernel/pci-dma.c | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879d398e..2c90f8de3e20 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -85,6 +85,7 @@ config X86
select ARCH_HAS_STRICT_MODULE_RWX
select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
select ARCH_HAS_SYSCALL_WRAPPER
+   select ARCH_HAS_TEARDOWN_DMA_OPSif IOMMU_DMA
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_HAS_DEBUG_WX
select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index de234e7a8962..60a4ec22d849 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -154,3 +154,10 @@ static void via_no_dac(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID,
PCI_CLASS_BRIDGE_PCI, 8, via_no_dac);
 #endif
+
+#ifdef CONFIG_ARCH_HAS_TEARDOWN_DMA_OPS
+void arch_teardown_dma_ops(struct device *dev)
+{
+   set_dma_ops(dev, NULL);
+}
+#endif
-- 
2.31.1



Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-14 Thread Jason Gunthorpe
On Wed, Apr 14, 2021 at 02:22:09PM +0800, Lu Baolu wrote:

> I still worry about supervisor pasid allocation.
> 
> If we use iommu_sva_alloc_pasid() to allocate a supervisor pasid, which
> mm should the pasid be set to? I have thought about passing &init_mm to
> iommu_sva_alloc_pasid(). But if you add "mm != current->mm", this seems
> not to work. Or do you prefer a separate interface for supervisor pasid
> allocation/free?

Without a mm_struct it is not SVA, so don't use SVA APIs for whatever
a 'supervisor pasid' is

Jason


Re: [PATCH] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-04-14 Thread Robin Murphy

On 2021-04-14 07:38, chenxiang wrote:

From: Xiang Chen 

It is not necessary to call free_iova_mem() inside the iova_rbtree_lock
spinlock critical section, which only leads to more contention on the
spinlock. There is a small performance improvement after the change.


This seems not entirely unreasonable, but private_free_iova() really 
needs to be renamed (maybe something like remove_iova()?) if it's no 
longer actually freeing anything - otherwise it's just unnecessarily 
misleading.


Robin.


Signed-off-by: Xiang Chen 
---
  drivers/iommu/iova.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index c669526f..292ed4a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -339,7 +339,6 @@ static void private_free_iova(struct iova_domain *iovad, 
struct iova *iova)
assert_spin_locked(&iovad->iova_rbtree_lock);
__cached_rbnode_delete_update(iovad, iova);
rb_erase(&iova->node, &iovad->rbroot);
-   free_iova_mem(iova);
  }
  
  /**

@@ -376,6 +375,7 @@ __free_iova(struct iova_domain *iovad, struct iova *iova)
spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
private_free_iova(iovad, iova);
spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   free_iova_mem(iova);
  }
  EXPORT_SYMBOL_GPL(__free_iova);
  
@@ -397,7 +397,7 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)

if (iova)
private_free_iova(iovad, iova);
spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-
+   free_iova_mem(iova);
  }
  EXPORT_SYMBOL_GPL(free_iova);
  
@@ -746,6 +746,7 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)

continue;
  
  		private_free_iova(iovad, iova);

+   free_iova_mem(iova);
}
  
  	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);





Re: [RFC V2 PATCH 9/12] swiotlb: Add bounce buffer remap address setting function

2021-04-14 Thread Tianyu Lan

On 4/14/2021 2:43 PM, Christoph Hellwig wrote:

On Tue, Apr 13, 2021 at 11:22:14AM -0400, Tianyu Lan wrote:

From: Tianyu Lan 

For a Hyper-V isolation VM with AMD SEV-SNP, the bounce buffer (shared memory)
needs to be accessed via an extra address space (e.g. addresses above bit 39).
Hyper-V code may remap the extra address space outside of swiotlb, so
swiotlb_bounce() needs to use the remapped virtual address to copy data from/to
the bounce buffer. Add a new interface, swiotlb_set_bounce_remap(), to do that.


I have no way to review what this is actually doing when you only Cc me
on a single patch.  Please make sure everyone is Cced on the whole
series to enable proper review.



Sure. I will resend all the patches. Thanks for the reminder.


[Resend RFC PATCH V2 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

"Resend all patches because someone in CC list didn't receive all
patchset. Sorry for nosy."

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host visibility hvcall, and
the guest needs to call this hvcall to mark memory visible to the host
before sharing memory with the host. For security, network/storage
stack memory should not be shared with the host, so bounce buffers
are required.

The VMbus channel ring buffer already plays the bounce buffer role,
because all data to/from the host is copied between the ring buffer
and IO stack memory. So mark the VMbus channel ring buffer visible.

There are two exceptions - packets sent by vmbus_sendpacket_
pagebuffer() and vmbus_sendpacket_mpb_desc(). These packets
contain IO stack memory addresses that the host will access.
So add bounce buffer allocation support in vmbus for these packets.

For an SNP Isolation VM, the guest needs to access the shared memory via
an extra address space, which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the
shared memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.
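
As a concrete sketch of the address computation described above;
shared_gpa_boundary comes from this series, while the helper name and the
example numbers are illustrative assumptions:

/* The host-visible alias of a shared page is its GPA plus the
 * shared_gpa_boundary reported by HYPERV_CPUID_ISOLATION_CONFIG. */
static inline u64 example_shared_gpa(phys_addr_t gpa)
{
	return (u64)gpa + ms_hyperv.shared_gpa_boundary;
}

/* e.g. shared_gpa_boundary_bits = 39 gives a boundary of 1ULL << 39,
 * so a bounce page at GPA 0x1000 is accessed at GPA 0x8000001000. */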

Tianyu Lan (12):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in Isolation VM
  x86/Hyper-V: Add new hvcall guest address host visibility support
  HV: Add Write/Read MSR registers via ghcb
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  UIO/Hyper-V: Not load UIO HV driver in the isolation VM.
  swiotlb: Add bounce buffer remap address setting function
  HV/IOMMU: Add Hyper-V dma ops support
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |  70 +--
 arch/x86/hyperv/ivm.c  | 289 +
 arch/x86/include/asm/hyperv-tlfs.h |  22 +++
 arch/x86/include/asm/mshyperv.h|  90 +++--
 arch/x86/kernel/cpu/mshyperv.c |   5 +
 arch/x86/kernel/pci-swiotlb.c  |   3 +-
 drivers/hv/channel.c   |  44 -
 drivers/hv/connection.c|  68 ++-
 drivers/hv/hv.c|  73 ++--
 drivers/hv/hyperv_vmbus.h  |   3 +
 drivers/hv/ring_buffer.c   |  83 ++---
 drivers/hv/vmbus_drv.c |   3 +
 drivers/iommu/hyperv-iommu.c   | 127 +
 drivers/net/hyperv/hyperv_net.h|  11 ++
 drivers/net/hyperv/netvsc.c| 137 +-
 drivers/net/hyperv/rndis_filter.c  |   3 +
 drivers/scsi/storvsc_drv.c |  67 ++-
 drivers/uio/uio_hv_generic.c   |   5 +
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h |  18 +-
 include/linux/hyperv.h |  12 +-
 include/linux/swiotlb.h|   5 +
 kernel/dma/swiotlb.c   |  13 +-
 mm/ioremap.c   |   1 +
 mm/vmalloc.c   |   1 +
 26 files changed, 1068 insertions(+), 88 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

-- 
2.25.1



[Resend RFC PATCH V2 02/12] x86/HV: Initialize shared memory boundary in Isolation VM

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the shared memory boundary via the CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG and stores it in the shared_gpa_boundary
field of the ms_hyperv struct. This prepares for sharing memory
with the host in an AMD SEV-SNP guest.

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 13 -
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e88bc296afca..aeafd4017c89 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -328,6 +328,8 @@ static void __init ms_hyperv_init_platform(void)
if (ms_hyperv.features_b & HV_ISOLATION) {
ms_hyperv.isolation_config_a = 
cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
ms_hyperv.isolation_config_b = 
cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+   ms_hyperv.shared_gpa_boundary =
+   (u64)1 << ms_hyperv.shared_gpa_boundary_bits;
 
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
0x%x\n",
ms_hyperv.isolation_config_a, 
ms_hyperv.isolation_config_b);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c6f4c5c20fb8..b73e201abc70 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -34,8 +34,19 @@ struct ms_hyperv_info {
u32 max_vp_index;
u32 max_lp_index;
u32 isolation_config_a;
-   u32 isolation_config_b;
+   union
+   {
+   u32 isolation_config_b;
+   struct {
+   u32 cvm_type : 4;
+   u32 Reserved11 : 1;
+   u32 shared_gpa_boundary_active : 1;
+   u32 shared_gpa_boundary_bits : 6;
+   u32 Reserved12 : 20;
+   };
+   };
void  __percpu **ghcb_base;
+   u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1



[Resend RFC PATCH V2 01/12] x86/HV: Initialize GHCB page in Isolation VM

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes a GHCB page via the SEV-ES GHCB MSR for an SNP guest
to communicate with the hypervisor. Map the GHCB page for all
CPUs to read/write MSR registers and submit hvcall requests
via the GHCB.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c  | 52 +++---
 include/asm-generic/mshyperv.h |  1 +
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0db5137d5b81..90e65fbf4c58 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -82,6 +82,9 @@ static int hv_cpu_init(unsigned int cpu)
struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
void **input_arg;
struct page *pg;
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
 
/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, 
hv_root_partition ? 1 : 0);
@@ -128,6 +131,17 @@ static int hv_cpu_init(unsigned int cpu)
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
}
 
+   if (ms_hyperv.ghcb_base) {
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+
+   ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
+   if (!ghcb_va)
+   return -ENOMEM;
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
+   }
+
return 0;
 }
 
@@ -223,6 +237,7 @@ static int hv_cpu_die(unsigned int cpu)
unsigned long flags;
void **input_arg;
void *pg;
+   void **ghcb_va = NULL;
 
local_irq_save(flags);
input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
@@ -236,6 +251,13 @@ static int hv_cpu_die(unsigned int cpu)
*output_arg = NULL;
}
 
+   if (ms_hyperv.ghcb_base) {
+   ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   if (*ghcb_va)
+   iounmap(*ghcb_va);
+   *ghcb_va = NULL;
+   }
+
local_irq_restore(flags);
 
free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
@@ -372,6 +394,9 @@ void __init hyperv_init(void)
u64 guest_id, required_msrs;
union hv_x64_msr_hypercall_contents hypercall_msr;
int cpuhp, i;
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
 
if (x86_hyper_type != X86_HYPER_MS_HYPERV)
return;
@@ -432,9 +457,24 @@ void __init hyperv_init(void)
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
__builtin_return_address(0));
-   if (hv_hypercall_pg == NULL) {
-   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-   goto remove_cpuhp_state;
+   if (hv_hypercall_pg == NULL)
+   goto clean_guest_os_id;
+
+   if (hv_isolation_type_snp()) {
+   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   if (!ms_hyperv.ghcb_base)
+   goto clean_guest_os_id;
+
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+   ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
+   if (!ghcb_va) {
+   free_percpu(ms_hyperv.ghcb_base);
+   ms_hyperv.ghcb_base = NULL;
+   goto clean_guest_os_id;
+   }
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -499,7 +539,8 @@ void __init hyperv_init(void)
 
return;
 
-remove_cpuhp_state:
+clean_guest_os_id:
+   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
cpuhp_remove_state(cpuhp);
 free_vp_assist_page:
kfree(hv_vp_assist_page);
@@ -528,6 +569,9 @@ void hyperv_cleanup(void)
 */
hv_hypercall_pg = NULL;
 
+   if (ms_hyperv.ghcb_base)
+   free_percpu(ms_hyperv.ghcb_base);
+
/* Reset the hypercall page */
hypercall_msr.as_uint64 = 0;
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index dff58a3db5d5..c6f4c5c20fb8 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -35,6 +35,7 @@ struct ms_hyperv_info {
u32 max_lp_index;
u32 isolation_config_a;
u32 isolation_config_b;
+   void  __percpu **ghcb_base;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1



[Resend RFC PATCH V2 03/12] x86/Hyper-V: Add new hvcall guest address host visibility support

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

Add support for the new guest address host visibility hvcall. Mark the
vmbus ring buffer visible to the host when creating the gpadl buffer and
mark it not visible again when tearing down the gpadl buffer.
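
Illustration only: roughly how a gpadl setup/teardown path is expected to use
the new hvcall, based on the helper and flags added below; the wrapper
function is hypothetical:

/* Mark a ring buffer visible to the host when setting up its gpadl,
 * and hide it again on teardown. */
static int example_gpadl_set_visibility(void *ring_buffer, u32 ring_size,
					bool setup)
{
	u32 vis = setup ? VMBUS_PAGE_VISIBLE_READ_WRITE
			: VMBUS_PAGE_NOT_VISIBLE;

	return hv_set_mem_host_visibility(ring_buffer, ring_size, vis);
}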

Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/Makefile   |  2 +-
 arch/x86/hyperv/ivm.c  | 90 ++
 arch/x86/include/asm/hyperv-tlfs.h | 22 
 arch/x86/include/asm/mshyperv.h|  2 +
 drivers/hv/channel.c   | 34 ++-
 include/asm-generic/hyperv-tlfs.h  |  1 +
 include/linux/hyperv.h | 12 +++-
 7 files changed, 159 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y  := hv_init.o mmu.o nested.o irqdomain.o
+obj-y  := hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)   += hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index ..a5950b7a9214
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * hv_set_mem_host_visibility - Set host visibility for specified memory.
+ */
+int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)
+{
+   int i, pfn;
+   int pagecount = size >> HV_HYP_PAGE_SHIFT;
+   u64 *pfn_array;
+   int ret = 0;
+
+   pfn_array = vzalloc(HV_HYP_PAGE_SIZE);
+   if (!pfn_array)
+   return -ENOMEM;
+
+   for (i = 0, pfn = 0; i < pagecount; i++) {
+   pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
+   pfn++;
+
+   if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
+   ret = hv_mark_gpa_visibility(pfn, pfn_array, visibility);
+   pfn = 0;
+
+   if (ret)
+   goto err_free_pfn_array;
+   }
+   }
+
+ err_free_pfn_array:
+   vfree(pfn_array);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(hv_set_mem_host_visibility);
+
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
+{
+   struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
+   struct hv_input_modify_sparse_gpa_page_host_visibility *input;
+   u16 pages_processed;
+   u64 hv_status;
+   unsigned long flags;
+
+   /* no-op if partition isolation is not enabled */
+   if (!hv_is_isolation_supported())
+   return 0;
+
+   if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+   pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+   HV_MAX_MODIFY_GPA_REP_COUNT);
+   return -EINVAL;
+   }
+
+   local_irq_save(flags);
+   input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
+   this_cpu_ptr(hyperv_pcpu_input_arg);
+   input = *input_pcpu;
+   if (unlikely(!input)) {
+   local_irq_restore(flags);
+   return -1;
+   }
+
+   input->partition_id = HV_PARTITION_ID_SELF;
+   input->host_visibility = visibility;
+   input->reserved0 = 0;
+   input->reserved1 = 0;
+   memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+   hv_status = hv_do_rep_hypercall(
+   HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+   0, input, &pages_processed);
+   local_irq_restore(flags);
+
+   if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+   return 0;
+
+   return hv_status & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
b/arch/x86/include/asm/hyperv-tlfs.h
index e6cd3fee562b..1f1ce9afb6f1 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -236,6 +236,15 @@ enum hv_isolation_type {
 /* TSC invariant control */
 #define HV_X64_MSR_TSC_INVARIANT_CONTROL   0x4118
 
+/* Hyper-V GPA map flags */
+#define HV_MAP_GPA_PERMISSIONS_NONE0x0
+#define HV_MAP_GPA_READABLE0x1
+#define HV_MAP_GPA_WRITABLE0x2
+
+#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
+#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)
+#define VMBUS_PAGE_NOT_VISIBLE HV_MAP_GPA_PERMISSIONS_NONE
+
 /*
  * Declare the MSR used to setup pages used to communicate with the hypervisor.
  */
@@ -564,4 +573,17 @@ enum hv_interrupt_type {
 
 #include 
 
+/* All input parameters should be 

[Resend RFC PATCH V2 04/12] HV: Add Write/Read MSR registers via ghcb

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a GHCB protocol to write Synthetic Interrupt
Controller MSR registers, as these registers are emulated by the
hypervisor rather than the paravisor.

Hyper-V requires SINTx MSR registers to be written twice (once via
the GHCB and once via the wrmsr instruction, with the proxy bit 21 set).
The Guest OS ID MSR also needs to be set via the GHCB.
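
A sketch of the double write described above, using hv_ghcb_msr_write() from
this patch; treating the proxy bit as bit 21 of the written value is an
assumption based on the commit message, and the function name is hypothetical:

static void example_set_sint(int sint_index, u64 val)
{
	/* Once via the GHCB... */
	hv_ghcb_msr_write(HV_X64_MSR_SINT0 + sint_index, val);

	/* ...and once via the wrmsr instruction with the (assumed)
	 * proxy bit 21 set. */
	wrmsrl(HV_X64_MSR_SINT0 + sint_index, val | BIT_ULL(21));
}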

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c   |  18 +
 arch/x86/hyperv/ivm.c   | 130 
 arch/x86/include/asm/mshyperv.h |  87 +
 arch/x86/kernel/cpu/mshyperv.c  |   3 +
 drivers/hv/hv.c |  65 +++-
 include/asm-generic/mshyperv.h  |   4 +-
 6 files changed, 261 insertions(+), 46 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 90e65fbf4c58..87b1dd9c84d6 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -475,6 +475,9 @@ void __init hyperv_init(void)
 
ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
*ghcb_base = ghcb_va;
+
+   /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -561,6 +564,7 @@ void hyperv_cleanup(void)
 
/* Reset our OS id */
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
/*
 * Reset hypercall page reference before reset the page,
@@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void)
return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
 }
 EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-   if (!(ms_hyperv.features_b & HV_ISOLATION))
-   return HV_ISOLATION_TYPE_NONE;
-   return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index a5950b7a9214..2ec64b367aaf 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -6,12 +6,139 @@
  *  Tianyu Lan 
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
+union hv_ghcb {
+   struct ghcb ghcb;
+} __packed __aligned(PAGE_SIZE);
+
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 0;
+
+   ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+   ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+   ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 1);
+   ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
+
+   VMGEXIT();
+
+   if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0x) == 1)
+   pr_warn("Fail to write msr via ghcb.\n");
+
+   local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, PAGE_SIZE);
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 0;
+
+   ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 0);
+   ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
+
+   VMGEXIT();
+
+   if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0x) == 1)
+   pr_warn("Fail to read msr via ghcb.\n");
+   else
+   *value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
+   | ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
+   local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
+
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
+{
+   hv_ghcb_msr_read(msr, value);
+}
+EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
+
+void hv_sint_

[Resend RFC PATCH V2 06/12] HV/Vmbus: Add SNP support for VMbus channel initiate message

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

For SNP support, the physical addresses of the monitor pages in the
CHANNELMSG_INITIATE_CONTACT msg should be in the extra address space.
These pages should also be accessed via the extra address space inside
the Linux guest, so remap the extra addresses with ioremap().

Signed-off-by: Tianyu Lan 
---
 drivers/hv/connection.c   | 62 +++
 drivers/hv/hyperv_vmbus.h |  1 +
 2 files changed, 63 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 79bca653dce9..a0be9c11d737 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -101,6 +101,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
*msginfo, u32 version)
 
msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+   if (hv_isolation_type_snp()) {
+   msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+   msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+   }
+
msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
/*
@@ -145,6 +151,29 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
*msginfo, u32 version)
return -ECONNREFUSED;
}
 
+   if (hv_isolation_type_snp()) {
+   vmbus_connection.monitor_pages_va[0]
+   = vmbus_connection.monitor_pages[0];
+   vmbus_connection.monitor_pages[0]
+   = ioremap_cache(msg->monitor_page1, HV_HYP_PAGE_SIZE);
+   if (!vmbus_connection.monitor_pages[0])
+   return -ENOMEM;
+
+   vmbus_connection.monitor_pages_va[1]
+   = vmbus_connection.monitor_pages[1];
+   vmbus_connection.monitor_pages[1]
+   = ioremap_cache(msg->monitor_page2, HV_HYP_PAGE_SIZE);
+   if (!vmbus_connection.monitor_pages[1]) {
+   vunmap(vmbus_connection.monitor_pages[0]);
+   return -ENOMEM;
+   }
+
+   memset(vmbus_connection.monitor_pages[0], 0x00,
+  HV_HYP_PAGE_SIZE);
+   memset(vmbus_connection.monitor_pages[1], 0x00,
+  HV_HYP_PAGE_SIZE);
+   }
+
return ret;
 }
 
@@ -156,6 +185,7 @@ int vmbus_connect(void)
struct vmbus_channel_msginfo *msginfo = NULL;
int i, ret = 0;
__u32 version;
+   u64 pfn[2];
 
/* Initialize the vmbus connection */
vmbus_connection.conn_state = CONNECTING;
@@ -213,6 +243,16 @@ int vmbus_connect(void)
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   if (hv_mark_gpa_visibility(2, pfn,
+   VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+   ret = -EFAULT;
+   goto cleanup;
+   }
+   }
+
msginfo = kzalloc(sizeof(*msginfo) +
  sizeof(struct vmbus_channel_initiate_contact),
  GFP_KERNEL);
@@ -279,6 +319,8 @@ int vmbus_connect(void)
 
 void vmbus_disconnect(void)
 {
+   u64 pfn[2];
+
/*
 * First send the unload request to the host.
 */
@@ -298,6 +340,26 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}
 
+   if (hv_isolation_type_snp()) {
+   if (vmbus_connection.monitor_pages_va[0]) {
+   vunmap(vmbus_connection.monitor_pages[0]);
+   vmbus_connection.monitor_pages[0]
+   = vmbus_connection.monitor_pages_va[0];
+   vmbus_connection.monitor_pages_va[0] = NULL;
+   }
+
+   if (vmbus_connection.monitor_pages_va[1]) {
+   vunmap(vmbus_connection.monitor_pages[1]);
+   vmbus_connection.monitor_pages[1]
+   = vmbus_connection.monitor_pages_va[1];
+   vmbus_connection.monitor_pages_va[1] = NULL;
+   }
+
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+   }
+
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
vmbus_connection.monitor_pages[0] = NULL;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 9416e09ebd58..0778add21a9c 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -240,6 +240,7 @@ struct vmbus_connection {
 * is child->parent noti

[Resend RFC PATCH V2 07/12] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

The VMbus ring buffer is shared with the host and needs to
be accessed via the extra address space of an Isolation VM with
SNP support. This patch maps the ring buffer
address into the extra address space via ioremap(). The HV host
visibility hvcall smears data in the ring buffer, so
reset the ring buffer memory to zero after calling the
visibility hvcall.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/channel.c  | 10 +
 drivers/hv/hyperv_vmbus.h |  2 +
 drivers/hv/ring_buffer.c  | 83 +--
 mm/ioremap.c  |  1 +
 mm/vmalloc.c  |  1 +
 5 files changed, 76 insertions(+), 21 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 407b74d72f3f..4a9fb7ad4c72 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -634,6 +634,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
if (err)
goto error_clean_ring;
 
+   err = hv_ringbuffer_post_init(&newchannel->outbound,
+ page, send_pages);
+   if (err)
+   goto error_free_gpadl;
+
+   err = hv_ringbuffer_post_init(&newchannel->inbound,
+ &page[send_pages], recv_pages);
+   if (err)
+   goto error_free_gpadl;
+
/* Create and init the channel open message */
open_info = kzalloc(sizeof(*open_info) +
   sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 0778add21a9c..d78a04ad5490 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 /* Interface */
 
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+   struct page *pages, u32 page_cnt);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 pagecnt);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 35833d4d1a1d..c8b0f7b45158 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "hyperv_vmbus.h"
 
@@ -188,6 +190,44 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
mutex_init(&channel->outbound.ring_buffer_mutex);
 }
 
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+  struct page *pages, u32 page_cnt)
+{
+   struct vm_struct *area;
+   u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
+   unsigned long vaddr;
+   int err = 0;
+
+   if (!hv_isolation_type_snp())
+   return 0;
+
+   physic_addr += ms_hyperv.shared_gpa_boundary;
+   area = get_vm_area((2 * page_cnt - 1) * PAGE_SIZE, VM_IOREMAP);
+   if (!area || !area->addr)
+   return -EFAULT;
+
+   vaddr = (unsigned long)area->addr;
+   err = ioremap_page_range(vaddr, vaddr + page_cnt * PAGE_SIZE,
+  physic_addr, PAGE_KERNEL_IO);
+   err |= ioremap_page_range(vaddr + page_cnt * PAGE_SIZE,
+ vaddr + (2 * page_cnt - 1) * PAGE_SIZE,
+ physic_addr + PAGE_SIZE, PAGE_KERNEL_IO);
+   if (err) {
+   vunmap((void *)vaddr);
+   return -EFAULT;
+   }
+
+   /* Clean memory after setting host visibility. */
+   memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
+
+   ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
+   ring_info->ring_buffer->read_index = 0;
+   ring_info->ring_buffer->write_index = 0;
+   ring_info->ring_buffer->feature_bits.value = 1;
+
+   return 0;
+}
+
 /* Initialize the ring buffer. */
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 page_cnt)
@@ -197,33 +237,34 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info 
*ring_info,
 
BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
 
-   /*
-* First page holds struct hv_ring_buffer, do wraparound mapping for
-* the rest.
-*/
-   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
-  GFP_KERNEL);
-   if (!pages_wraparound)
-   return -ENOMEM;
-
-   pages_wraparound[0] = pages;
-   for (i = 0; i < 2 * (page_cnt - 1); i++)
-   pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
+   if (!hv_isolation_type_snp()) {
+   /*
+* First page holds struct hv_ring_buffer, do wraparound mapping for
+* the rest.
+*/
+   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+  GFP_KERNEL);
+   if (!pages_wraparound)
+

[Resend RFC PATCH V2 05/12] HV: Add ghcb hvcall support for SNP VM

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a GHCB hvcall to handle the VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
messages in an SNP Isolation VM. Add such support.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 69 +
 arch/x86/include/asm/mshyperv.h |  1 +
 drivers/hv/connection.c |  6 ++-
 drivers/hv/hv.c |  8 +++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 2ec64b367aaf..0ad73ea60c8f 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -18,8 +18,77 @@
 
 union hv_ghcb {
struct ghcb ghcb;
+   struct {
+   u64 hypercalldata[509];
+   u64 outputgpa;
+   union {
+   union {
+   struct {
+   u32 callcode: 16;
+   u32 isfast  : 1;
+   u32 reserved1   : 14;
+   u32 isnested: 1;
+   u32 countofelements : 12;
+   u32 reserved2   : 4;
+   u32 repstartindex   : 12;
+   u32 reserved3   : 4;
+   };
+   u64 asuint64;
+   } hypercallinput;
+   union {
+   struct {
+   u16 callstatus;
+   u16 reserved1;
+   u32 elementsprocessed : 12;
+   u32 reserved2 : 20;
+   };
+   u64 asunit64;
+   } hypercalloutput;
+   };
+   u64 reserved2;
+   } hypercall;
 } __packed __aligned(PAGE_SIZE);
 
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EFAULT;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return -EFAULT;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 1;
+
+   hv_ghcb->hypercall.outputgpa = (u64)output;
+   hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+   hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+   if (input_size)
+   memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+   VMGEXIT();
+
+   hv_ghcb->ghcb.ghcb_usage = 0x;
+   memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+  sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+   local_irq_restore(flags);
+
+   return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
 void hv_ghcb_msr_write(u64 msr, u64 value)
 {
union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 73501dbbc240..929504fe8654 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -318,6 +318,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
 void hv_ghcb_msr_write(u64 msr, u64 value);
 void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 
 #define hv_get_synint_state_ghcb(int_num, val) \
hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index c83612cddb99..79bca653dce9 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -442,6 +442,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
++channel->sig_events;
 
-   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+   if (hv_isolation_type_snp())
+   hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+   NULL, sizeof(u64));
+   else
+   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 069530eeb7c6..bff7c9049ffb 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -60,7 +60,13 @@ int hv_post_message(union hv_connection_id connection_id,
aligned_msg->payload_size = payload_size;
memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-   status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
+   if (hv_isolation_type_

[Resend RFC PATCH V2 08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

The UIO HV driver should not be loaded in an Isolation VM for security
reasons. Return -ENOTSUPP from hv_uio_probe() in an Isolation VM.

Signed-off-by: Tianyu Lan 
---
 drivers/uio/uio_hv_generic.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 0330ba99730e..678b021d66f8 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../hv/hyperv_vmbus.h"
 
@@ -241,6 +242,10 @@ hv_uio_probe(struct hv_device *dev,
void *ring_buffer;
int ret;
 
+   /* UIO driver should not be loaded in the isolation VM.*/
+   if (hv_is_isolation_supported())
+   return -ENOTSUPP;
+   
/* Communicating with host has to be via shared memory not hypercall */
if (!channel->offermsg.monitor_allocated) {
dev_err(&dev->device, "vmbus channel requires hypercall\n");
-- 
2.25.1



[Resend RFC PATCH V2 09/12] swiotlb: Add bounce buffer remap address setting function

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

For a Hyper-V isolation VM with AMD SEV-SNP, the bounce buffer (shared memory)
needs to be accessed via an extra address space (e.g. addresses above bit 39).
Hyper-V code may remap the extra address space outside of swiotlb, so
swiotlb_bounce() needs to use the remapped virtual address to copy data from/to
the bounce buffer. Add a new interface, swiotlb_set_bounce_remap(), to do that.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  5 +
 kernel/dma/swiotlb.c| 13 -
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d9c9fc9ca5d2..3ccd08116683 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -82,8 +82,13 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long new_size);
+void swiotlb_set_bounce_remap(unsigned char *vaddr);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
+static inline void swiotlb_set_bounce_remap(unsigned char *vaddr)
+{
+}
+
 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
return false;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e6100..5fd2db6aa149 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -94,6 +94,7 @@ static unsigned int io_tlb_index;
  * not be bounced (unless SWIOTLB_FORCE is set).
  */
 static unsigned int max_segment;
+static unsigned char *swiotlb_bounce_remap_addr;
 
 /*
  * We need to save away the original address corresponding to a mapped entry
@@ -421,6 +422,11 @@ void __init swiotlb_exit(void)
swiotlb_cleanup();
 }
 
+void swiotlb_set_bounce_remap(unsigned char *vaddr)
+{
+   swiotlb_bounce_remap_addr = vaddr;
+}
+
 /*
  * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
@@ -428,7 +434,12 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
phys_addr_t tlb_addr,
   size_t size, enum dma_data_direction dir)
 {
unsigned long pfn = PFN_DOWN(orig_addr);
-   unsigned char *vaddr = phys_to_virt(tlb_addr);
+   unsigned char *vaddr;
+
+   if (swiotlb_bounce_remap_addr)
+   vaddr = swiotlb_bounce_remap_addr + tlb_addr - io_tlb_start;
+   else
+   vaddr = phys_to_virt(tlb_addr);
 
if (PageHighMem(pfn_to_page(pfn))) {
/* The buffer does not have a mapping.  Map it in and copy */
-- 
2.25.1



[Resend RFC PATCH V2 10/12] HV/IOMMU: Add Hyper-V dma ops support

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

A Hyper-V Isolation VM requires bounce buffer support. To use the swiotlb
bounce buffer, add Hyper-V dma ops and use the swiotlb functions in the
map and unmap callbacks.

Allocate the bounce buffer in the Hyper-V code because the bounce buffer
needs to be accessed via the extra address space (e.g. addresses above
bit 39) in an AMD SEV-SNP based Isolation VM.

ioremap_cache() can't be used in hyperv_iommu_swiotlb_init(), which runs
too early, so the bounce buffer is remapped in
hyperv_iommu_swiotlb_later_init().
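
Sketch only (the actual hunk is truncated in this archive): the late remap
described above could look roughly like this, combining ioremap_cache(),
shared_gpa_boundary and swiotlb_set_bounce_remap() from patches 02 and 09;
the function body is an assumption:

static int __init example_swiotlb_later_init(void)
{
	void *vaddr;

	if (!hv_is_isolation_supported())
		return 0;

	/* Remap the bounce buffer above shared_gpa_boundary so copies go
	 * through the host-visible alias, then tell swiotlb to use it. */
	vaddr = ioremap_cache(hyperv_io_tlb_start +
			      ms_hyperv.shared_gpa_boundary,
			      hyperv_io_tlb_size);
	if (!vaddr)
		return -ENOMEM;

	memset(vaddr, 0x00, hyperv_io_tlb_size);
	swiotlb_set_bounce_remap(vaddr);
	return 0;
}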

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/pci-swiotlb.c |   3 +-
 drivers/hv/vmbus_drv.c|   3 +
 drivers/iommu/hyperv-iommu.c  | 127 ++
 3 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index c2cfa5e7c152..caaf68c06f24 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -15,6 +15,7 @@
 #include 
 
 int swiotlb __read_mostly;
+extern int hyperv_swiotlb;
 
 /*
  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
@@ -68,7 +69,7 @@ void __init pci_swiotlb_init(void)
 void __init pci_swiotlb_late_init(void)
 {
/* An IOMMU turned us off. */
-   if (!swiotlb)
+   if (!swiotlb && !hyperv_swiotlb)
swiotlb_exit();
else {
printk(KERN_INFO "PCI-DMA: "
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 10dce9f91216..0ee6ec3a5de6 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -2030,6 +2031,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2070,6 +2072,7 @@ int vmbus_device_register(struct hv_device 
*child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..588ba847f0cc 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,19 +13,28 @@
 #include 
 #include 
 #include 
+#include 
 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "irq_remapping.h"
 
 #ifdef CONFIG_IRQ_REMAP
 
+int hyperv_swiotlb __read_mostly;
+
 /*
  * According 82093AA IO-APIC spec , IO APIC has a 24-entry Interrupt
  * Redirection Table. Hyper-V exposes one single IO-APIC and so define
@@ -36,6 +45,10 @@
 static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
 static struct irq_domain *ioapic_ir_domain;
 
+static unsigned long hyperv_io_tlb_start, *hyperv_io_tlb_end; 
+static unsigned long hyperv_io_tlb_nslabs, hyperv_io_tlb_size;
+static void *hyperv_io_tlb_remap;
+
 static int hyperv_ir_set_affinity(struct irq_data *data,
const struct cpumask *mask, bool force)
 {
@@ -337,4 +350,118 @@ static const struct irq_domain_ops 
hyperv_root_ir_domain_ops = {
.free = hyperv_root_irq_remapping_free,
 };
 
+static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
+
+   if (!hv_is_isolation_supported())
+   return phys;
+
+   map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
+attrs);
+   if (map == (phys_addr_t)DMA_MAPPING_ERROR)
+   return DMA_MAPPING_ERROR;
+
+   return map;
+}
+
+static void hyperv_unmap_page(struct device *dev, dma_addr_t dev_addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+   if (!hv_is_isolation_supported())
+   return;
+
+   swiotlb_tbl_unmap_single(dev, dev_addr, size, HV_HYP_PAGE_SIZE, dir,
+   attrs);
+}
+
+int __init hyperv_swiotlb_init(void)
+{
+   unsigned long bytes;
+   void *vstart = 0;
+
+   bytes = 200 * 1024 * 1024;
+   vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
+   hyperv_io_tlb_nslabs = bytes >> IO_TLB_SHIFT;
+   hyperv_io_tlb_size = bytes;
+
+   if (!vstart) {
+   pr_warn("Fail to allocate swiotlb.\n");
+   return -ENOMEM;
+   }
+
+   hyperv_io_tlb_start = virt_to_phys(vstart);
+   if (!hyperv_io_tlb_start)
+   panic("%s: Failed to allocate %lu bytes align=0x%lx.\n",
+ __func__, PAGE_ALIGN(bytes), PAGE_SIZE);
+
+   if (swiotlb_init_with_tbl(vstart, hyperv_io_tlb_nslabs, 1))
+ 

[Resend RFC PATCH V2 12/12] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already done
this for the rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA API to
map/unmap this memory when sending/receiving packets, and the Hyper-V DMA
ops callbacks will use swiotlb functions to allocate bounce buffers and
copy data from/to them.

Signed-off-by: Tianyu Lan 
---
 drivers/scsi/storvsc_drv.c | 67 +-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 2e4fa77445fd..d271578b1811 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -414,6 +416,11 @@ static void storvsc_on_channel_callback(void *context);
 #define STORVSC_IDE_MAX_TARGETS1
 #define STORVSC_IDE_MAX_CHANNELS   1
 
+struct dma_range {
+   dma_addr_t dma;
+   u32 mapping_size;
+};
+
 struct storvsc_cmd_request {
struct scsi_cmnd *cmd;
 
@@ -427,6 +434,8 @@ struct storvsc_cmd_request {
u32 payload_sz;
 
struct vstor_packet vstor_packet;
+   u32 hvpg_count;
+   struct dma_range *dma_range;
 };
 
 
@@ -1236,6 +1245,7 @@ static void storvsc_on_channel_callback(void *context)
const struct vmpacket_descriptor *desc;
struct hv_device *device;
struct storvsc_device *stor_device;
+   int i;
 
if (channel->primary_channel != NULL)
device = channel->primary_channel->device_obj;
@@ -1249,6 +1259,8 @@ static void storvsc_on_channel_callback(void *context)
foreach_vmbus_pkt(desc, channel) {
void *packet = hv_pkt_data(desc);
struct storvsc_cmd_request *request;
+   enum dma_data_direction dir;
+   u32 attrs;
u64 cmd_rqst;
 
cmd_rqst = vmbus_request_addr(&channel->requestor,
@@ -1261,6 +1273,22 @@ static void storvsc_on_channel_callback(void *context)
 
request = (struct storvsc_cmd_request *)(unsigned long)cmd_rqst;
 
+   if (request->vstor_packet.vm_srb.data_in == READ_TYPE)
+   dir = DMA_FROM_DEVICE;
+   else
+   dir = DMA_TO_DEVICE;
+
+   if (request->dma_range) {
+   for (i = 0; i < request->hvpg_count; i++)
+   dma_unmap_page_attrs(&device->device,
+   request->dma_range[i].dma,
+   request->dma_range[i].mapping_size,
+   request->vstor_packet.vm_srb.data_in
+   == READ_TYPE ?
+   DMA_FROM_DEVICE : DMA_TO_DEVICE, attrs);
+   kfree(request->dma_range);
+   }
+
if (request == &stor_device->init_request ||
request == &stor_device->reset_request) {
memcpy(&request->vstor_packet, packet,
@@ -1682,8 +1710,10 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
struct vmscsi_request *vm_srb;
struct scatterlist *cur_sgl;
struct vmbus_packet_mpb_array  *payload;
+   enum dma_data_direction dir;
u32 payload_sz;
u32 length;
+   u32 attrs;
 
if (vmstor_proto_version <= VMSTOR_PROTO_VERSION_WIN8) {
/*
@@ -1722,14 +1752,17 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
case DMA_TO_DEVICE:
vm_srb->data_in = WRITE_TYPE;
vm_srb->win8_extension.srb_flags |= SRB_FLAGS_DATA_OUT;
+   dir = DMA_TO_DEVICE;
break;
case DMA_FROM_DEVICE:
vm_srb->data_in = READ_TYPE;
vm_srb->win8_extension.srb_flags |= SRB_FLAGS_DATA_IN;
+   dir = DMA_FROM_DEVICE;
break;
case DMA_NONE:
vm_srb->data_in = UNKNOWN_TYPE;
vm_srb->win8_extension.srb_flags |= SRB_FLAGS_NO_DATA_TRANSFER;
+   dir = DMA_NONE;
break;
default:
/*
@@ -1786,6 +1819,12 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
hvpgoff = sgl->offset >> HV_HYP_PAGE_SHIFT;
 
cur_sgl = sgl;
+
+   cmd_request->dma_range = kzalloc(sizeof(struct dma_range) * hvpg_count,
+ GFP_ATOMIC);
+   if (!cmd_request->dma_range)
+   return -ENOMEM;
+
for (i = 0; i < hvpg_count; i++) {
/*
   

[Resend RFC PATCH V2 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all shared memory with the host needs to be marked
visible to the host via a hvcall. vmbus_establish_gpadl() has already
done this for the netvsc rx/tx ring buffer. The page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA API
to map/unmap this memory when sending/receiving packets; the Hyper-V
DMA ops callbacks will use the swiotlb functions to allocate bounce
buffers and copy data from/to them.

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/hyperv_net.h   |  11 +++
 drivers/net/hyperv/netvsc.c   | 137 --
 drivers/net/hyperv/rndis_filter.c |   3 +
 3 files changed, 144 insertions(+), 7 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 2a87cfa27ac0..d85f811238c7 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -130,6 +130,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1026,6 +1027,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
u32 recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1034,6 +1036,8 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
+   u32 send_buf_size;
u32 send_buf_gpadl_handle;
u32 send_section_cnt;
u32 send_section_size;
@@ -1715,4 +1719,11 @@ struct rndis_message {
 #define TRANSPORT_INFO_IPV6_TCP 0x10
 #define TRANSPORT_INFO_IPV6_UDP 0x20
 
+struct dma_range {
+   dma_addr_t dma;
+   u32 mapping_size;
+};
+
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 2353623259f3..1a5f5be4eeea 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -26,6 +26,7 @@
 
 #include "hyperv_net.h"
 #include "netvsc_trace.h"
+#include "../../hv/hyperv_vmbus.h"
 
 /*
  * Switch the data path from the synthetic interface to the VF
@@ -119,8 +120,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   iounmap(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   iounmap(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -302,9 +316,12 @@ static int netvsc_init_buf(struct hv_device *device,
struct nvsp_1_message_send_receive_buffer_complete *resp;
struct net_device *ndev = hv_get_drvdata(device);
struct nvsp_message *init_packet;
+   struct vm_struct *area;
+   u64 extra_phys;
unsigned int buf_size;
+   unsigned long vaddr;
size_t map_words;
-   int ret = 0;
+   int ret = 0, i;
 
/* Get receive buffer area. */
buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -340,6 +357,27 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   area = get_vm_area(buf_size, VM_IOREMAP);
+   if (!area)
+   goto cleanup;
+
+   vaddr = (unsigned long)area->addr;
+   for (i = 0; i < buf_size / HV_HYP_PAGE_SIZE; i++) {
+   extra_phys = (virt_to_hvpfn(net_device->recv_buf + i * HV_HYP_PAGE_SIZE)
+   << HV_HYP_PAGE_SHIFT) + ms_hyperv.shared_gpa_boundary;
+   ret |= ioremap_page_range(vaddr + i * HV_HYP_PAGE_SIZE,
+  vaddr + (i + 1) * HV_HYP_PAGE_SIZE,
+  extra_phys, PAGE_KERNEL_IO);
+   }
+
+   if (ret)
+   goto cleanup;
+
+   net_device->recv_original_buf = net_device->recv_buf;
+   net_device->recv_buf = (void *)vaddr;
+   }
+
/* Notify the NetVsp of the gpadl handle */
init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -432,6 +470,28 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   area = get_vm_area(buf_size, VM_IOREMAP);
+   if (!

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-14 Thread David Coe

Hi Suravee!

I've re-run your revert+update patch on Ubuntu's latest kernel 5.11.0-14 
partly to check my mailer's 'mangling' hadn't also reached the code!


There are 3 sets of results in the attachment, all for the Ryzen 2400G. 
The as-distributed kernel already incorporates your IOMMU RFCv3 patch.


A. As-distributed kernel (cold boot)
   >5 retries, so no IOMMU read/write capability, no amd_iommu events.

B. As-distributed kernel (warm boot)
   <5 retries, amd_iommu running stats show large numbers as before.

C. Revert+Update kernel
   amd_iommu events listed and also show large hit/miss numbers.

In due course, I'll load the new (revert+update) kernel on the 4700G but 
won't overload your mail-box unless something unusual turns up.


Best regards,

--
David
A. As Supplied - Cold Boot
**

$ sudo dmesg | grep IOMMU
[0.710610] pci :00:00.2: AMD-Vi: Unable to read/write to IOMMU perf 
counter.
[0.714365] pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
[0.984616] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel 

$ sudo perf list | grep iommu
  intel_iommu:bounce_map_sg  [Tracepoint event]
  intel_iommu:bounce_map_single  [Tracepoint event]
  intel_iommu:bounce_unmap_single[Tracepoint event]
  intel_iommu:map_sg [Tracepoint event]
  intel_iommu:map_single [Tracepoint event]
  intel_iommu:unmap_sg   [Tracepoint event]
  intel_iommu:unmap_single   [Tracepoint event]
  iommu:add_device_to_group  [Tracepoint event]
  iommu:attach_device_to_domain  [Tracepoint event]
  iommu:detach_device_from_domain[Tracepoint event]
  iommu:io_page_fault[Tracepoint event]
  iommu:map  [Tracepoint event]
  iommu:remove_device_from_group [Tracepoint event]
  iommu:unmap[Tracepoint event]

No amd_iommu events listed.


B. As Supplied - Warm Boot
**

$ sudo dmesg | grep IOMMU
[0.515523] pci :00:00.2: AMD-Vi: IOMMU performance counters supported
[0.519236] pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
[0.520549] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[0.795781] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel 

$ sudo perf stat -e 'amd_iommu_0/cmd_processed/, 
amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, 
amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, amd_iommu_0/mem_dte_hit/, 
amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, 
amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/, 
amd_iommu_0/mem_iommu_tlb_pte_mis/, amd_iommu_0/mem_pass_excl/, 
amd_iommu_0/mem_pass_pretrans/, amd_iommu_0/mem_pass_untrans/, 
amd_iommu_0/mem_target_abort/, amd_iommu_0/mem_trans_total/, 
amd_iommu_0/page_tbl_read_gst/, amd_iommu_0/page_tbl_read_nst/, 
amd_iommu_0/page_tbl_read_tot/, amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, 
amd_iommu_0/tlb_inv/, amd_iommu_0/vapic_int_guest/, 
amd_iommu_0/vapic_int_non_guest/' sleep 10

Performance counter stats for 'system wide':

  0   amd_iommu_0/cmd_processed/   (33.39%)
  0   amd_iommu_0/cmd_processed_inv/   (33.06%)
  0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/(33.58%)
842,245,888,947,849   amd_iommu_0/int_dte_hit/ (33.42%)
848,982,159,636,118   amd_iommu_0/int_dte_mis/ (33.15%)
835,698,854,752,581   amd_iommu_0/mem_dte_hit/ (33.68%)
839,060,819,932,270   amd_iommu_0/mem_dte_mis/ (33.55%)
  0   amd_iommu_0/mem_iommu_tlb_pde_hit/   (33.44%)
837,231,240,047,576   amd_iommu_0/mem_iommu_tlb_pde_mis/   (33.62%)
842,688,371,629,123   amd_iommu_0/mem_iommu_tlb_pte_hit/   (33.40%)
851,647,568,857,291   amd_iommu_0/mem_iommu_tlb_pte_mis/   (33.05%)
  0   amd_iommu_0/mem_pass_excl/   (33.30%)
  0   amd_iommu_0/mem_pass_pretrans/   (33.36%)
852,801,037,224,491   amd_iommu_0/mem_pass_untrans/(33.01%)
  0   amd_iommu_0/mem_target_abort/(33.50%)
 46,371   amd_iommu_0/mem_trans_total/ (33.28%)
  0   amd_iommu_0/page_tbl_read_gst/   (33.00%)
  1,663   amd_iommu_0/page_tbl_read_nst/  (33.55%)
 17   amd_iommu_0/page_tbl_read_tot/  (33.38%)
  0   amd_iommu_0/smi_blk/ (33.28%)
  0   amd_iommu_0/smi_recv/(33.49%)
  0   amd_iommu_0/tlb_inv/ (33.32%)
  0   amd_iommu_0/vapic_int_guest/ (32.96%)
318   amd_iommu_0/

Re: [Resend RFC PATCH V2 03/12] x86/Hyper-V: Add new hvcall guest address host visibility support

2021-04-14 Thread Christoph Hellwig
> +/*
> + * hv_set_mem_host_visibility - Set host visibility for specified memory.
> + */

I don't think this comment really clarifies anything over the function
name.  What is 'host visibility'?

> +int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)

Should size be a size_t?
Should visibility be an enum of some kind?
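
For illustration only, the prototype could end up looking roughly like this
if both suggestions were applied (the enum name and the NOT_VISIBLE value
below are assumptions, not taken from the patch):

enum hv_mem_host_visibility {
	VMBUS_PAGE_NOT_VISIBLE		= 0,
	VMBUS_PAGE_VISIBLE_READ_ONLY	= HV_MAP_GPA_READABLE,
	VMBUS_PAGE_VISIBLE_READ_WRITE	= HV_MAP_GPA_READABLE |
					  HV_MAP_GPA_WRITABLE,
};

/* size_t for the length, an enum for the mode. */
int hv_set_mem_host_visibility(void *kbuffer, size_t size,
			       enum hv_mem_host_visibility visibility);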

> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)

Not sure what this does either.

> + local_irq_save(flags);
> + input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)

Is there a chance we could find a shorter but still descriptive
name for this variable?  Why do we need the cast?

> +#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
> +#define VMBUS_PAGE_VISIBLE_READ_WRITE 
> (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)

pointlessly overlong line.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 04/12] HV: Add Write/Read MSR registers via ghcb

2021-04-14 Thread Christoph Hellwig
> +EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);

Just curious, who is going to use all these exports?  These seems like
extremely low-level functionality.  Isn't there a way to build a more
useful higher level API?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.

2021-04-14 Thread Christoph Hellwig
On Wed, Apr 14, 2021 at 10:49:41AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> UIO HV driver should not be loaded in the isolation VM for security reasons.
> Return ENOTSUPP from hv_uio_probe() in the isolation VM.

What is the "security reason"?  After all a user can simply patch the
code and just load it anyway..
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.

2021-04-14 Thread Greg KH
On Wed, Apr 14, 2021 at 10:49:41AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> UIO HV driver should not be loaded in the isolation VM for security reasons.
> Return ENOTSUPP from hv_uio_probe() in the isolation VM.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/uio/uio_hv_generic.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
> index 0330ba99730e..678b021d66f8 100644
> --- a/drivers/uio/uio_hv_generic.c
> +++ b/drivers/uio/uio_hv_generic.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "../hv/hyperv_vmbus.h"
>  
> @@ -241,6 +242,10 @@ hv_uio_probe(struct hv_device *dev,
>   void *ring_buffer;
>   int ret;
>  
> + /* UIO driver should not be loaded in the isolation VM.*/
> + if (hv_is_isolation_supported())
> + return -ENOTSUPP;
> + 
>   /* Communicating with host has to be via shared memory not hypercall */
>   if (!channel->offermsg.monitor_allocated) {
>   dev_err(&dev->device, "vmbus channel requires hypercall\n");
> -- 
> 2.25.1
> 

Again you send out known-wrong patches?

:(
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 10/12] HV/IOMMU: Add Hyper-V dma ops support

2021-04-14 Thread Christoph Hellwig
> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
> +   unsigned long offset, size_t size,
> +   enum dma_data_direction dir,
> +   unsigned long attrs)
> +{
> + phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
> +
> + if (!hv_is_isolation_supported())
> + return phys;
> +
> + map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
> +  attrs);
> + if (map == (phys_addr_t)DMA_MAPPING_ERROR)
> + return DMA_MAPPING_ERROR;
> +
> + return map;
> +}

This largely duplicates what dma-direct + swiotlb does.  Please use
force_dma_unencrypted to force bounce buffering and just use the generic
code.
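
For reference, a minimal sketch of that direction (this is an assumption
about the shape of the fix, not code from this series; it reuses the
existing force_dma_unencrypted() arch hook and the series'
hv_is_isolation_supported() helper):

/*
 * Sketch only: report DMA as needing host-visible (unencrypted) memory
 * in a Hyper-V isolation VM, so the generic dma-direct code bounces it
 * through swiotlb instead of installing Hyper-V specific dma_map_ops.
 */
bool force_dma_unencrypted(struct device *dev)
{
	if (hv_is_isolation_supported())
		return true;

	return false;	/* the real hook also handles the SEV/SME cases */
}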

> + if (hv_isolation_type_snp()) {
> + ret = hv_set_mem_host_visibility(
> + phys_to_virt(hyperv_io_tlb_start),
> + hyperv_io_tlb_size,
> + VMBUS_PAGE_VISIBLE_READ_WRITE);
> + if (ret)
> + panic("%s: Fail to mark Hyper-v swiotlb buffer visible 
> to host. err=%d\n",
> +   __func__, ret);
> +
> + hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
> + + ms_hyperv.shared_gpa_boundary,
> + hyperv_io_tlb_size);
> + if (!hyperv_io_tlb_remap)
> + panic("%s: Fail to remap io tlb.\n", __func__);
> +
> + memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
> + swiotlb_set_bounce_remap(hyperv_io_tlb_remap);

And this really needs to go into a common hook where we currently just
call set_memory_decrypted so that all the different schemes for these
trusted VMs (we have about half a dozen now) can share code rather than
reinventing it.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-04-14 Thread Christoph Hellwig
> +struct dma_range {
> + dma_addr_t dma;
> + u32 mapping_size;
> +};

That's a rather generic name that is bound to create a conflict sooner
or later.

>  #include "hyperv_net.h"
>  #include "netvsc_trace.h"
> +#include "../../hv/hyperv_vmbus.h"

Please move public interfaces out of the private header rather than doing
this.

> + if (hv_isolation_type_snp()) {
> + area = get_vm_area(buf_size, VM_IOREMAP);

Err, no.  get_vm_area is private for a reason.

> + if (!area)
> + goto cleanup;
> +
> + vaddr = (unsigned long)area->addr;
> + for (i = 0; i < buf_size / HV_HYP_PAGE_SIZE; i++) {
> + extra_phys = (virt_to_hvpfn(net_device->recv_buf + i * HV_HYP_PAGE_SIZE)
> + << HV_HYP_PAGE_SHIFT) + ms_hyperv.shared_gpa_boundary;
> + ret |= ioremap_page_range(vaddr + i * HV_HYP_PAGE_SIZE,
> +vaddr + (i + 1) * HV_HYP_PAGE_SIZE,
> +extra_phys, PAGE_KERNEL_IO);
> + }
> +
> + if (ret)
> + goto cleanup;

And this is not something a driver should ever do.  I think you are badly
reimplementing functionality that should be in the dma coherent allocator
here.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 12/12] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-04-14 Thread Christoph Hellwig
Same comments as for the net driver.  In addition to all the poking
into kernel internals the fact that this is duplicated in the two main
drivers should have been a huge red flag.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.

2021-04-14 Thread Stephen Hemminger
On Wed, 14 Apr 2021 17:45:51 +0200
Greg KH  wrote:

> On Wed, Apr 14, 2021 at 10:49:41AM -0400, Tianyu Lan wrote:
> > From: Tianyu Lan 
> > 
> > UIO HV driver should not be loaded in the isolation VM for security reasons.
> > Return ENOTSUPP from hv_uio_probe() in the isolation VM.
> > 
> > Signed-off-by: Tianyu Lan 

This is debatable; in isolation VMs, shouldn't userspace take responsibility
for validating host communication? If that is an issue, please work with
the DPDK community (the main user of this) to make sure the netvsc userspace
driver has the required checks.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] iommu: Move IOVA power-of-2 roundup into allocator

2021-04-14 Thread John Garry

On 06/04/2021 17:54, John Garry wrote:

Hi Robin,



Sorry if the phrasing was unclear there - the allusion to default 
domains is new, it just occurred to me that what we do there is in 
fact fairly close to what I've suggested previously for this. In that 
case, we have a global policy set by the command line, which *can* be 
overridden per-domain via sysfs at runtime, provided the user is 
willing to tear the whole thing down. Using a similar approach here 
would give a fair degree of flexibility but still mean that changes 
never have to be made dynamically to a live domain.


So are you saying that we can handle it similarly to how we can now change 
the default domain for an IOMMU group via sysfs? If so, that just is not 
practical here, because this particular DMA engine provides the block device 
backing the / mount point, so if we unbind the driver, we lose the / mount point.


And I am not sure if the end user would even know how to set such a 
tunable. Or, in this case, why the end user would not want the optimized 
range configured always.


I'd still rather if the device driver could provide info which can be 
used to configure this before or during probing.


As a new solution, how about do both of these:
a. Add a per-IOMMU group sysfs file to set this tunable. Works same as 
how we change the default domain, and has all the same 
restrictions/steps. I think that this is what you are already suggesting.
b. Provide a DMA mapping API to set this value, similar to this current 
series. In the IOMMU backend for that API, we record a new range value 
and return -EPROBE_DEFER when successful. In the reprobe we reset the 
default domain for the devices' IOMMU group, with the IOVA domain rcache 
range configured as previously requested. Again, allocating the new 
default domain is similar to how we change default domain type today.
This means that we don't play with a live domain. Downside is that we 
need to defer the probe.
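
To make (b) concrete, a probe path could look roughly like the sketch below.
The helper name iommu_set_opt_dma_range() and its exact semantics are
hypothetical, only meant to illustrate the deferred-probe flow:

static int foo_probe(struct device *dev)
{
	int ret;

	/*
	 * Record the preferred IOVA size range; the first call returns
	 * -EPROBE_DEFER while the group's default domain is rebuilt with
	 * the requested rcache range.
	 */
	ret = iommu_set_opt_dma_range(dev, SZ_1M);
	if (ret)
		return ret;

	/* Normal probe continues with the optimised default domain. */
	return 0;
}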


Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2][next] iommu/vt-d: Fix out-bounds-warning in intel_svm_page_response()

2021-04-14 Thread Gustavo A. R. Silva
Replace a couple of calls to memcpy() with simple assignments in order
to fix the following out-of-bounds warning:

drivers/iommu/intel/svm.c:1198:4: warning: 'memcpy' offset [25, 32] from the 
object at 'desc' is out of the bounds of referenced subobject 'qw2' with type 
'long long unsigned int' at offset 16 [-Warray-bounds]

The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy(). This causes a legitimate compiler warning because memcpy()
overruns the length of &desc.qw2 and &resp.qw2, respectively.
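
As a standalone illustration (not part of this patch), the pattern and its
warning-free equivalent look like this:

#include <linux/string.h>
#include <linux/types.h>

/* Hypothetical layout mirroring the adjacent qw2/qw3 members. */
struct example_desc {
	u64 qw2;
	u64 qw3;
};

static void example_fill(struct example_desc *d, const u64 priv_data[2])
{
	/*
	 * 16 bytes written through a pointer to an 8-byte member: fine at
	 * runtime here, but the compiler only sees &d->qw2 and warns with
	 * -Warray-bounds (FORTIFY_SOURCE would flag it as well).
	 */
	memcpy(&d->qw2, priv_data, 2 * sizeof(u64));

	/* The warning-free equivalent used by this patch: */
	d->qw2 = priv_data[0];
	d->qw3 = priv_data[1];
}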

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva 
---
Changes in v2:
 - Fix another instance of this same issue in prq_event_thread().

 drivers/iommu/intel/svm.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5165cea90421..332365ec3195 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1020,9 +1020,10 @@ static irqreturn_t prq_event_thread(int irq, void *d)
resp.qw2 = 0;
resp.qw3 = 0;
 
-   if (req->priv_data_present)
-   memcpy(&resp.qw2, req->priv_data,
-  sizeof(req->priv_data));
+   if (req->priv_data_present) {
+   resp.qw2 = req->priv_data[0];
+   resp.qw3 = req->priv_data[1];
+   }
qi_submit_sync(iommu, &resp, 1, 0);
}
 prq_advance:
@@ -1194,9 +1195,10 @@ int intel_svm_page_response(struct device *dev,
desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
desc.qw2 = 0;
desc.qw3 = 0;
-   if (private_present)
-   memcpy(&desc.qw2, prm->private_data,
-  sizeof(prm->private_data));
+   if (private_present) {
+   desc.qw2 = prm->private_data[0];
+   desc.qw3 = prm->private_data[1];
+   }
 
qi_submit_sync(iommu, &desc, 1, 0);
}
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH][next] iommu/vt-d: Fix out-bounds-warning in intel_svm_page_response()

2021-04-14 Thread Gustavo A. R. Silva
Hi Baolu,

On 4/14/21 00:24, Lu Baolu wrote:
> Hi Gustavo,
> 
> On 4/14/21 3:54 AM, Gustavo A. R. Silva wrote:
>> Replace call to memcpy() with just a couple of simple assignments in
>> order to fix the following out-of-bounds warning:
>>
>> drivers/iommu/intel/svm.c:1198:4: warning: 'memcpy' offset [25, 32] from the 
>> object at 'desc' is out of the bounds of referenced subobject 'qw2' with type
>> 'long long unsigned int' at offset 16 [-Warray-bounds]
>>
>> The problem is that the original code is trying to copy data into a
>> couple of struct members adjacent to each other in a single call to
>> memcpy(). This causes a legitimate compiler warning because memcpy()
>> overruns the length of &desc.qw2.
>>
>> This helps with the ongoing efforts to globally enable -Warray-bounds
>> and get us closer to being able to tighten the FORTIFY_SOURCE routines
>> on memcpy().
>>
>> Link: https://github.com/KSPP/linux/issues/109
>> Signed-off-by: Gustavo A. R. Silva 
>> ---
>>   drivers/iommu/intel/svm.c | 7 ---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
>> index 5165cea90421..65909f504c50 100644
>> --- a/drivers/iommu/intel/svm.c
>> +++ b/drivers/iommu/intel/svm.c
>> @@ -1194,9 +1194,10 @@ int intel_svm_page_response(struct device *dev,
>>   desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
>>   desc.qw2 = 0;
>>   desc.qw3 = 0;
>> -    if (private_present)
>> -    memcpy(&desc.qw2, prm->private_data,
>> -   sizeof(prm->private_data));
> 
> The same memcpy() is used in multiple places in this file. Did they
> compile the same warnings? Or there are multiple patches to fix them
> one by one?

I just see one more instance of this same case:

1023 if (req->priv_data_present)
1024 memcpy(&resp.qw2, req->priv_data,
1025sizeof(req->priv_data));

I missed it and I'll address it in v2. Do you see another one?

Thanks for the feedback!
--
Gustavo

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-14 Thread David Coe

Hi again!

For completeness, I'm attaching results for the revert+update patch 
running the Ubuntu 21.04β kernel 5.11.0-14 on a Ryzen 4700U laptop.


The enormous amd_iommu running stats aren't always there, as they nearly 
always are on the 2400G desktop, but they do turn up (depending on 
what the machine's been doing).


Be very interested in your thoughts on their relevance!

Best regards,

--
David
$ sudo dmesg | grep IOMMU
[0.498593] pci :00:00.2: AMD-Vi: IOMMU performance counters supported
[0.500507] pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
[0.502011] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[1.113195] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel 


$ sudo perf list | grep iommu
  amd_iommu_0/cmd_processed/ [Kernel PMU event]
  amd_iommu_0/cmd_processed_inv/ [Kernel PMU event]
  amd_iommu_0/ign_rd_wr_mmio_1ff8h/  [Kernel PMU event]
  amd_iommu_0/int_dte_hit/   [Kernel PMU event]
  amd_iommu_0/int_dte_mis/   [Kernel PMU event]
  amd_iommu_0/mem_dte_hit/   [Kernel PMU event]
  amd_iommu_0/mem_dte_mis/   [Kernel PMU event]
  amd_iommu_0/mem_iommu_tlb_pde_hit/ [Kernel PMU event]
  amd_iommu_0/mem_iommu_tlb_pde_mis/ [Kernel PMU event]
  amd_iommu_0/mem_iommu_tlb_pte_hit/ [Kernel PMU event]
  amd_iommu_0/mem_iommu_tlb_pte_mis/ [Kernel PMU event]
  amd_iommu_0/mem_pass_excl/ [Kernel PMU event]
  amd_iommu_0/mem_pass_pretrans/ [Kernel PMU event]
  amd_iommu_0/mem_pass_untrans/  [Kernel PMU event]
  amd_iommu_0/mem_target_abort/  [Kernel PMU event]
  amd_iommu_0/mem_trans_total/   [Kernel PMU event]
  amd_iommu_0/page_tbl_read_gst/ [Kernel PMU event]
  amd_iommu_0/page_tbl_read_nst/ [Kernel PMU event]
  amd_iommu_0/page_tbl_read_tot/ [Kernel PMU event]
  amd_iommu_0/smi_blk/   [Kernel PMU event]
  amd_iommu_0/smi_recv/  [Kernel PMU event]
  amd_iommu_0/tlb_inv/   [Kernel PMU event]
  amd_iommu_0/vapic_int_guest/   [Kernel PMU event]
  amd_iommu_0/vapic_int_non_guest/   [Kernel PMU event]
  intel_iommu:bounce_map_sg  [Tracepoint event]
  intel_iommu:bounce_map_single  [Tracepoint event]
  intel_iommu:bounce_unmap_single[Tracepoint event]
  intel_iommu:map_sg [Tracepoint event]
  intel_iommu:map_single [Tracepoint event]
  intel_iommu:unmap_sg   [Tracepoint event]
  intel_iommu:unmap_single   [Tracepoint event]
  iommu:add_device_to_group  [Tracepoint event]
  iommu:attach_device_to_domain  [Tracepoint event]
  iommu:detach_device_from_domain[Tracepoint event]
  iommu:io_page_fault[Tracepoint event]
  iommu:map  [Tracepoint event]
  iommu:remove_device_from_group [Tracepoint event]
  iommu:unmap[Tracepoint event]

$ sudo perf stat -e 'amd_iommu_0/cmd_processed/, 
amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, 
amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, amd_iommu_0/mem_dte_hit/, 
amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, 
amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/, 
amd_iommu_0/mem_iommu_tlb_pte_mis/, amd_iommu_0/mem_pass_excl/, 
amd_iommu_0/mem_pass_pretrans/, amd_iommu_0/mem_pass_untrans/, 
amd_iommu_0/mem_target_abort/, amd_iommu_0/mem_trans_total/, 
amd_iommu_0/page_tbl_read_gst/, amd_iommu_0/page_tbl_read_nst/, 
amd_iommu_0/page_tbl_read_tot/, amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, 
amd_iommu_0/tlb_inv/, amd_iommu_0/vapic_int_guest/, 
amd_iommu_0/vapic_int_non_guest/' sleep 10

Performance counter stats for 'system wide':

30  amd_iommu_0/cmd_processed/ (33.31%)
17   amd_iommu_0/cmd_processed_inv/(33.34%)
 0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/ (33.36%)
   374   amd_iommu_0/int_dte_hit/  (33.39%)
29   amd_iommu_0/int_dte_mis/  (33.44%)
   394   amd_iommu_0/mem_dte_hit/  (33.46%)
 9,117   amd_iommu_0/mem_dte_mis/  (33.45%)
 5   amd_iommu_0/mem_iommu_tlb_pde_hit/(33.46%)
   819   amd_iommu_0/mem_iommu_tlb_pde_mis/(33.42%)
 2   amd_iommu_0/mem_iommu_tlb_pte

[PATCH v2 0/2] Simplify and restrict IOMMU SVA APIs

2021-04-14 Thread Jacob Pan
A couple of small changes to simplify and restrict the SVA APIs. The motivation
is to make PASID allocation palatable for cgroup consumption. The misc cgroup
is merged for v5.13; it can be extended to cover IOASIDs as another scalar
resource.

I have not tested on ARM platforms due to hardware availability. I would
appreciate it if someone could help with the testing on ARM.

Thanks,

Jacob

ChangeLog:
V2
- retained mm argument in iommu_sva_alloc_pasid()
- keep generic supervisor flag separated from vt-d's SRE
- move flag declaration out of CONFIG_IOMMU_API


Jacob Pan (2):
  iommu/sva: Tighten SVA bind API with explicit flags
  iommu/sva: Remove mm parameter from SVA bind API

 drivers/dma/idxd/cdev.c   |  2 +-
 drivers/dma/idxd/init.c   |  7 +++---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 +++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  4 ++--
 drivers/iommu/intel/svm.c | 14 ---
 drivers/iommu/iommu-sva-lib.c | 11 +
 drivers/iommu/iommu.c | 23 +--
 drivers/misc/uacce/uacce.c|  2 +-
 include/linux/intel-iommu.h   |  2 +-
 include/linux/intel-svm.h | 17 ++
 include/linux/iommu.h | 20 
 11 files changed, 57 insertions(+), 50 deletions(-)


base-commit: e49d033bddf5b565044e2abe4241353959bc9120
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-14 Thread Jacob Pan
The mm parameter in iommu_sva_bind_device() is intended for a privileged
process to perform bind() on behalf of other processes. This use case has
yet to materialize, let alone the potential security implications of
adding kernel hooks without explicit user consent.
In addition, there is agreement that IOASID allocation shall be subject to
a cgroup limit. It will be in line with the misc cgroup proposal if IOASID
allocation as part of the SVA bind is limited to the current task.

Link: https://lore.kernel.org/linux-iommu/20210303160205.151d114e@jacob-builder/
Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/cdev.c   |  2 +-
 drivers/dma/idxd/init.c   |  2 +-
 drivers/iommu/iommu-sva-lib.c | 11 +++
 drivers/iommu/iommu.c | 20 +---
 drivers/misc/uacce/uacce.c|  2 +-
 include/linux/iommu.h |  3 +--
 6 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 21ec82bc47b6..8c3347c8930c 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -103,7 +103,7 @@ static int idxd_cdev_open(struct inode *inode, struct file 
*filp)
filp->private_data = ctx;
 
if (device_pasid_enabled(idxd)) {
-   sva = iommu_sva_bind_device(dev, current->mm, 0);
+   sva = iommu_sva_bind_device(dev, 0);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 82a0985ad6dc..a92fa625f3b5 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -305,7 +305,7 @@ static int idxd_enable_system_pasid(struct idxd_device 
*idxd)
 
flags = IOMMU_SVA_BIND_SUPERVISOR;
 
-   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags);
+   sva = iommu_sva_bind_device(&idxd->pdev->dev, flags);
if (IS_ERR(sva)) {
dev_warn(&idxd->pdev->dev,
 "iommu sva bind failed: %ld\n", PTR_ERR(sva));
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index bd41405d34e9..6e3d1a010d47 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -12,13 +12,13 @@ static DECLARE_IOASID_SET(iommu_sva_pasid);
 
 /**
  * iommu_sva_alloc_pasid - Allocate a PASID for the mm
- * @mm: the mm
  * @min: minimum PASID value (inclusive)
  * @max: maximum PASID value (inclusive)
  *
- * Try to allocate a PASID for this mm, or take a reference to the existing one
- * provided it fits within the [@min, @max] range. On success the PASID is
- * available in mm->pasid, and must be released with iommu_sva_free_pasid().
+ * Try to allocate a PASID for the current mm, or take a reference to the
+ * existing one provided it fits within the [@min, @max] range. On success
+ * the PASID is available in the current mm->pasid, and must be released with
+ * iommu_sva_free_pasid().
  * @min must be greater than 0, because 0 indicates an unused mm->pasid.
  *
  * Returns 0 on success and < 0 on error.
@@ -28,6 +28,9 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
int ret = 0;
ioasid_t pasid;
 
+   if (mm != current->mm)
+   return -EINVAL;
+
if (min == INVALID_IOASID || max == INVALID_IOASID ||
min == 0 || max < min)
return -EINVAL;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index eefa541d8674..5bbc35c395a6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct kset *iommu_group_kset;
@@ -2959,15 +2960,14 @@ int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
 EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
 
 /**
- * iommu_sva_bind_device() - Bind a process address space to a device
+ * iommu_sva_bind_device() - Bind the current process address space to a device
  * @dev: the device
- * @mm: the mm to bind, caller must hold a reference to it
  * @flags: options for the bind operation defined as IOMMU_SVA_BIND_*
  *
  * Create a bond between device and address space, allowing the device to 
access
  * the mm using the returned PASID. If a bond already exists between @device 
and
- * @mm, it is returned and an additional reference is taken. Caller must call
- * iommu_sva_unbind_device() to release each reference.
+ * the current mm, it is returned and an additional reference is taken. Caller
+ * must call iommu_sva_unbind_device() to release each reference.
  *
  * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
  * initialize the required SVA features.
@@ -2975,9 +2975,10 @@ EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
  * On error, returns an ERR_PTR value.
  */
 struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned in

[PATCH v2 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-04-14 Thread Jacob Pan
The void *drvdata parameter isn't really used in the iommu_sva_bind_device()
API; the current IDXD code "borrows" the drvdata for a VT-d private flag
for supervisor SVA usage.

Supervisor/privileged mode request is a generic feature. It should be
promoted from the VT-d vendor driver to the generic code.

This patch replaces void *drvdata with an unsigned int flags parameter
and adjusts callers accordingly.

Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/cdev.c   |  2 +-
 drivers/dma/idxd/init.c   |  7 +++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  4 ++--
 drivers/iommu/intel/svm.c | 14 --
 drivers/iommu/iommu.c |  9 ++---
 drivers/misc/uacce/uacce.c|  2 +-
 include/linux/intel-iommu.h   |  2 +-
 include/linux/intel-svm.h | 17 ++---
 include/linux/iommu.h | 19 ---
 10 files changed, 40 insertions(+), 41 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 0db9b82ed8cf..21ec82bc47b6 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -103,7 +103,7 @@ static int idxd_cdev_open(struct inode *inode, struct file 
*filp)
filp->private_data = ctx;
 
if (device_pasid_enabled(idxd)) {
-   sva = iommu_sva_bind_device(dev, current->mm, NULL);
+   sva = iommu_sva_bind_device(dev, current->mm, 0);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 085a0c3b62c6..82a0985ad6dc 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -300,13 +299,13 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev)
 
 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-   int flags;
+   unsigned int flags;
unsigned int pasid;
struct iommu_sva *sva;
 
-   flags = SVM_FLAG_SUPERVISOR_MODE;
+   flags = IOMMU_SVA_BIND_SUPERVISOR;
 
-   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
+   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags);
if (IS_ERR(sva)) {
dev_warn(&idxd->pdev->dev,
 "iommu sva bind failed: %ld\n", PTR_ERR(sva));
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..145ceb2fc5da 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -354,12 +354,15 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
 }
 
 struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
 {
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
+   if (flags)
+   return ERR_PTR(-EINVAL);
+
if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
return ERR_PTR(-EINVAL);
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index f985817c967a..b971d4dcf090 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -711,7 +711,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
 int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
 int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
 struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
-   void *drvdata);
+   unsigned int flags);
 void arm_smmu_sva_unbind(struct iommu_sva *handle);
 u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
 void arm_smmu_sva_notifier_synchronize(void);
@@ -742,7 +742,7 @@ static inline int arm_smmu_master_disable_sva(struct arm_smmu_master *master)
 }
 
 static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
 {
return ERR_PTR(-ENODEV);
 }
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 574a7e657a9a..d4840821f7b5 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -486,12 +486,9 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
} else
pasid_max

[PATCH v2] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-14 Thread Longpeng(Mike)
The translation caches may preserve obsolete data when the
mapping size is changed. Consider the following sequence, which
can reveal the problem with high probability.

1.mmap(4GB,MAP_HUGETLB)
2.
  while (1) {
   (a)DMA MAP   0,0xa
   (b)DMA UNMAP 0,0xa
   (c)DMA MAP   0,0xc000
     * DMA read IOVA 0 may fail here (Not present)
 * if the problem occurs.
   (d)DMA UNMAP 0,0xc000
  }

The page table(only focus on IOVA 0) after (a) is:
 PML4: 0x19db5c1003   entry:0x899bdcd2f000
  PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
   PDE: 0x1a30a72003  entry:0x89b39cacb000
PTE: 0x21d200803  entry:0x89b3b0a72000

The page table after (b) is:
 PML4: 0x19db5c1003   entry:0x899bdcd2f000
  PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
   PDE: 0x1a30a72003  entry:0x89b39cacb000
PTE: 0x0  entry:0x89b3b0a72000

The page table after (c) is:
 PML4: 0x19db5c1003   entry:0x899bdcd2f000
  PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
   PDE: 0x21d200883   entry:0x89b39cacb000 (*)

Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flushes the cache when unmapping,
so the obsolete data may be preserved in the cache, which
would cause a wrong translation in the end.

However, we can see the PDE entry finally switches to a
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:

1. PDE: 0x1a30a72003
2. __domain_mapping
 dma_pte_free_pagetable
   Set the PDE entry to ZERO
 Set the PDE entry to 0x21d200883

So we must flush the cache after the entry switches to ZERO
to avoid the obsolete info being preserved.

Cc: David Woodhouse 
Cc: Lu Baolu 
Cc: Nadav Amit 
Cc: Alex Williamson 
Cc: Joerg Roedel 
Cc: Kevin Tian 
Cc: Gonglei (Arei) 

Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating 
superpage")
Cc:  # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5...@huawei.com/
Suggested-by: Lu Baolu 
Signed-off-by: Longpeng(Mike) 
---
v1 -> v2:
  - add Joerg
  - reconstruct the solution base on the Baolu's suggestion
---
 drivers/iommu/intel/iommu.c | 52 +
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..881c9f2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2289,6 +2289,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
return level;
 }
 
+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their 
parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+unsigned long start_pfn,
+unsigned long end_pfn, int level)
+{
+   unsigned long lvl_pages = lvl_to_nr_pages(level);
+   struct dma_pte *pte = NULL;
+   int i;
+
+   while (start_pfn <= end_pfn) {
+   if (!pte)
+   pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+   if (dma_pte_present(pte)) {
+   dma_pte_free_pagetable(domain, start_pfn,
+  start_pfn + lvl_pages - 1,
+  level + 1);
+
+   for_each_domain_iommu(i, domain)
+   iommu_flush_iotlb_psi(g_iommus[i], domain,
+ start_pfn, lvl_pages,
+ 0, 0);
+   }
+
+   pte++;
+   start_pfn += lvl_pages;
+   if (first_pte_in_page(pte))
+   pte = NULL;
+   }
+}
+
 static int
 __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2329,22 +2364,11 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
return -ENOMEM;
/* It is large page*/
if (largepage_lvl > 1) {
-   unsigned long nr_superpages, end_pfn;
+   unsigned long end_pfn;
 
pteval |= DMA_PTE_LARGE_PAGE;
-   lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
-   nr_superpages = nr_pages / lvl_pages;
-   end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
-   /*
-* Ensure that old small page tables are
-* removed to make room for superpage(s).
-* We're adding new large pages, so make sure

Re: [PATCH] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-04-14 Thread chenxiang (M)

Hi Robin,


在 2021/4/14 21:17, Robin Murphy 写道:

On 2021-04-14 07:38, chenxiang wrote:

From: Xiang Chen 

It is not necessary to call free_iova_mem() inside the iova_rbtree_lock
spinlock; doing so only leads to more contention on the spinlock.
There is a small performance improvement after the change.


This seems not entirely unreasonable, but private_free_iova() really 
needs to be renamed (maybe something like remove_iova()?) if it's no 
longer actually freeing anything - otherwise it's just unnecessarily 
misleading.


Ok, I will rename private_free_iova() in the next version.



Robin.


Signed-off-by: Xiang Chen 
---
  drivers/iommu/iova.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index c669526f..292ed4a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -339,7 +339,6 @@ static void private_free_iova(struct iova_domain *iovad, struct iova *iova)

  assert_spin_locked(&iovad->iova_rbtree_lock);
  __cached_rbnode_delete_update(iovad, iova);
  rb_erase(&iova->node, &iovad->rbroot);
-free_iova_mem(iova);
  }
/**
@@ -376,6 +375,7 @@ __free_iova(struct iova_domain *iovad, struct iova *iova)

  spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
  private_free_iova(iovad, iova);
  spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+free_iova_mem(iova);
  }
  EXPORT_SYMBOL_GPL(__free_iova);
  @@ -397,7 +397,7 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)

  if (iova)
  private_free_iova(iovad, iova);
  spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-
+free_iova_mem(iova);
  }
  EXPORT_SYMBOL_GPL(free_iova);
  @@ -746,6 +746,7 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)

  continue;
private_free_iova(iovad, iova);
+free_iova_mem(iova);
  }
spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);



.




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-04-14 Thread chenxiang
From: Xiang Chen 

It is not necessary to call free_iova_mem() inside the iova_rbtree_lock
spinlock; doing so only leads to more contention on the spinlock.
There is a small performance improvement after the change. Also rename
private_free_iova() to remove_iova(), because the function no longer
frees the iova after this change.

Signed-off-by: Xiang Chen 
---
 drivers/iommu/iova.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b7ecd5b..c10af23 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -412,12 +412,11 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn)
return NULL;
 }
 
-static void private_free_iova(struct iova_domain *iovad, struct iova *iova)
+static void remove_iova(struct iova_domain *iovad, struct iova *iova)
 {
assert_spin_locked(&iovad->iova_rbtree_lock);
__cached_rbnode_delete_update(iovad, iova);
rb_erase(&iova->node, &iovad->rbroot);
-   free_iova_mem(iova);
 }
 
 /**
@@ -452,8 +451,9 @@ __free_iova(struct iova_domain *iovad, struct iova *iova)
unsigned long flags;
 
spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
-   private_free_iova(iovad, iova);
+   remove_iova(iovad, iova);
spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   free_iova_mem(iova);
 }
 EXPORT_SYMBOL_GPL(__free_iova);
 
@@ -473,9 +473,9 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)
spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
iova = private_find_iova(iovad, pfn);
if (iova)
-   private_free_iova(iovad, iova);
+   remove_iova(iovad, iova);
spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-
+   free_iova_mem(iova);
 }
 EXPORT_SYMBOL_GPL(free_iova);
 
@@ -825,7 +825,8 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
if (WARN_ON(!iova))
continue;
 
-   private_free_iova(iovad, iova);
+   remove_iova(iovad, iova);
+   free_iova_mem(iova);
}
 
spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-- 
2.8.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-14 Thread Lu Baolu

Hi Jason,

On 4/14/21 7:26 PM, Jason Gunthorpe wrote:

On Wed, Apr 14, 2021 at 02:22:09PM +0800, Lu Baolu wrote:


I still worry about supervisor pasid allocation.

If we use iommu_sva_alloc_pasid() to allocate a supervisor pasid, which
mm should the pasid be set? I've ever thought about passing &init_mm to
iommu_sva_alloc_pasid(). But if you add "mm != current->mm", this seems
not to work. Or do you prefer a separated interface for supervisor pasid
allocation/free?


Without a mm_struct it is not SVA, so don't use SVA APIs for whatever
a 'supervisor pasid' is


The supervisor PASID has its mm_struct. The only difference is that the
device will set priv=1 in its DMA transactions with the PASID.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-14 Thread Lu Baolu

Hi Longpeng,

On 4/15/21 8:46 AM, Longpeng(Mike) wrote:

The translation caches may preserve obsolete data when the
mapping size is changed. Consider the following sequence, which
can reveal the problem with high probability.

1.mmap(4GB,MAP_HUGETLB)
2.
   while (1) {
(a)DMA MAP   0,0xa
(b)DMA UNMAP 0,0xa
(c)DMA MAP   0,0xc000
  * DMA read IOVA 0 may fail here (Not present)
  * if the problem occurs.
(d)DMA UNMAP 0,0xc000
   }

The page table(only focus on IOVA 0) after (a) is:
  PML4: 0x19db5c1003   entry:0x899bdcd2f000
   PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
PDE: 0x1a30a72003  entry:0x89b39cacb000
 PTE: 0x21d200803  entry:0x89b3b0a72000

The page table after (b) is:
  PML4: 0x19db5c1003   entry:0x899bdcd2f000
   PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
PDE: 0x1a30a72003  entry:0x89b39cacb000
 PTE: 0x0  entry:0x89b3b0a72000

The page table after (c) is:
  PML4: 0x19db5c1003   entry:0x899bdcd2f000
   PDPE: 0x1a1cacb003  entry:0x89b35b5c1000
PDE: 0x21d200883   entry:0x89b39cacb000 (*)

Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flushes the cache when unmapping,
so the obsolete data may be preserved in the cache, which
would cause a wrong translation in the end.

However, we can see the PDE entry finally switches to a
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:

1. PDE: 0x1a30a72003
2. __domain_mapping
  dma_pte_free_pagetable
Set the PDE entry to ZERO
  Set the PDE entry to 0x21d200883

So we must flush the cache after the entry switches to ZERO
to avoid the obsolete info being preserved.

Cc: David Woodhouse 
Cc: Lu Baolu 
Cc: Nadav Amit 
Cc: Alex Williamson 
Cc: Joerg Roedel 
Cc: Kevin Tian 
Cc: Gonglei (Arei) 

Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating 
superpage")
Cc:  # v3.0+
Link: 
https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5...@huawei.com/
Suggested-by: Lu Baolu 
Signed-off-by: Longpeng(Mike) 
---
v1 -> v2:
   - add Joerg
   - reconstruct the solution base on the Baolu's suggestion
---
  drivers/iommu/intel/iommu.c | 52 +
  1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..881c9f2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2289,6 +2289,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
return level;
  }
  
+/*

+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their 
parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+unsigned long start_pfn,
+unsigned long end_pfn, int level)
+{
+   unsigned long lvl_pages = lvl_to_nr_pages(level);
+   struct dma_pte *pte = NULL;
+   int i;
+
+   while (start_pfn <= end_pfn) {
+   if (!pte)
+   pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+   if (dma_pte_present(pte)) {
+   dma_pte_free_pagetable(domain, start_pfn,
+  start_pfn + lvl_pages - 1,
+  level + 1);
+
+   for_each_domain_iommu(i, domain)
+   iommu_flush_iotlb_psi(g_iommus[i], domain,
+ start_pfn, lvl_pages,
+ 0, 0);
+   }
+
+   pte++;
+   start_pfn += lvl_pages;
+   if (first_pte_in_page(pte))
+   pte = NULL;
+   }
+}
+
  static int
  __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2329,22 +2364,11 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
return -ENOMEM;
/* It is large page*/
if (largepage_lvl > 1) {
-   unsigned long nr_superpages, end_pfn;
+   unsigned long end_pfn;
  
  pteval |= DMA_PTE_LARGE_PAGE;

-   lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
-   nr_superpages = nr_pages / lvl_pages;
-   end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
-   /*
-* Ensure that old small page tables are
-* removed to make room for superpage(s).
-

Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework

2021-04-14 Thread Keqian Zhu
Hi Baolu,

Thanks for the review!

On 2021/4/14 15:00, Lu Baolu wrote:
> Hi Keqian,
> 
> On 4/13/21 4:54 PM, Keqian Zhu wrote:
>> Some types of IOMMU are capable of tracking DMA dirty log, such as
>> ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
>> dirty log tracking framework in the IOMMU base layer.
>>
>> Three new essential interfaces are added, and we maintain the status
>> of dirty log tracking in iommu_domain.
>> 1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
>> 2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
>> 3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
>>
>> A new dev feature is added to indicate whether a specific type of
>> iommu hardware supports this and its driver implements it.
>>
>> Signed-off-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>   drivers/iommu/iommu.c | 150 ++
>>   include/linux/iommu.h |  53 +++
>>   2 files changed, 203 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index d0b0a15dba84..667b2d6d2fc0 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1922,6 +1922,7 @@ static struct iommu_domain 
>> *__iommu_domain_alloc(struct bus_type *bus,
>>   domain->type = type;
>>   /* Assume all sizes by default; the driver may override this later */
>>   domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>> +mutex_init(&domain->switch_log_lock);
>> return domain;
>>   }
>> @@ -2720,6 +2721,155 @@ int iommu_domain_set_attr(struct iommu_domain 
>> *domain,
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
>>   +int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
>> +   unsigned long iova, size_t size, int prot)
>> +{
>> +const struct iommu_ops *ops = domain->ops;
>> +int ret;
>> +
>> +if (unlikely(!ops || !ops->switch_dirty_log))
>> +return -ENODEV;
>> +
>> +mutex_lock(&domain->switch_log_lock);
>> +if (enable && domain->dirty_log_tracking) {
>> +ret = -EBUSY;
>> +goto out;
>> +} else if (!enable && !domain->dirty_log_tracking) {
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +ret = ops->switch_dirty_log(domain, enable, iova, size, prot);
>> +if (ret)
>> +goto out;
>> +
>> +domain->dirty_log_tracking = enable;
>> +out:
>> +mutex_unlock(&domain->switch_log_lock);
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);
> 
> Since you also added IOMMU_DEV_FEAT_HWDBM, I am wondering what's the
> difference between
> 
> iommu_switch_dirty_log(on) vs. iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM)
> 
> iommu_switch_dirty_log(off) vs. 
> iommu_dev_disable_feature(IOMMU_DEV_FEAT_HWDBM)
Indeed. As far as I can see, IOMMU_DEV_FEAT_AUX is not switchable, so enable/disable
are not applicable to it. IOMMU_DEV_FEAT_SVA is switchable, so we can use these
interfaces for it.

IOMMU_DEV_FEAT_HWDBM is used to indicate whether the hardware supports HWDBM,
so we should design it as not switchable. I will modify the commit message of
patch #12, thanks!

> 
>> +
>> +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
>> +			 size_t size, unsigned long *bitmap,
>> +			 unsigned long base_iova, unsigned long bitmap_pgshift)
>> +{
>> +	const struct iommu_ops *ops = domain->ops;
>> +	unsigned int min_pagesz;
>> +	size_t pgsize;
>> +	int ret = 0;
>> +
>> +	if (unlikely(!ops || !ops->sync_dirty_log))
>> +		return -ENODEV;
>> +
>> +	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
>> +	if (!IS_ALIGNED(iova | size, min_pagesz)) {
>> +		pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
>> +		       iova, size, min_pagesz);
>> +		return -EINVAL;
>> +	}
>> +
>> +	mutex_lock(&domain->switch_log_lock);
>> +	if (!domain->dirty_log_tracking) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>> +
>> +	while (size) {
>> +		pgsize = iommu_pgsize(domain, iova, size);
>> +
>> +		ret = ops->sync_dirty_log(domain, iova, pgsize,
>> +					  bitmap, base_iova, bitmap_pgshift);
> 
> Any reason why you want to do this in a per-4K-page manner? This can
> lead to a lot of indirect calls and bad performance.
> 
> How about a sync_dirty_pages()?
The function name of iommu_pgsize() is a bit puzzling. It actually tries to
compute the largest page size that fits into the remaining size, so pgsize can be
a large page size even if the underlying mapping is 4K. __iommu_unmap() has
similar logic.
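
[Editorial note: for illustration, a stand-alone sketch of that size selection —
pick the largest supported page size that both fits the remaining length and matches
the IOVA alignment. This is a simplified model under assumed semantics, not the
kernel's iommu_pgsize().]

#include <stdio.h>
#include <stddef.h>

static size_t pick_pgsize(unsigned long pgsize_bitmap,
			  unsigned long iova, size_t size)
{
	size_t best = 0;
	unsigned long pgsize;

	/* Walk all supported page sizes, smallest to largest. */
	for (pgsize = pgsize_bitmap; pgsize; pgsize &= pgsize - 1) {
		unsigned long sz = pgsize & -pgsize;	/* lowest set bit */

		if (sz > size)
			break;			/* larger sizes won't fit either */
		if (iova & (sz - 1))
			continue;		/* IOVA not aligned to this size */
		best = sz;
	}
	return best;
}

int main(void)
{
	/* 4K, 2M and 1G supported */
	unsigned long bitmap = (1UL << 12) | (1UL << 21) | (1UL << 30);

	/* 2M-aligned IOVA with 3M left: returns 2M even if the mapping is 4K */
	printf("%#zx\n", pick_pgsize(bitmap, 0x200000, 0x300000));
	/* Unaligned IOVA: falls back to 4K */
	printf("%#zx\n", pick_pgsize(bitmap, 0x201000, 0x300000));
	return 0;
}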

BRs,
Keqian


Re: [PATCH v2 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-04-14 Thread Christoph Hellwig
On Wed, Apr 14, 2021 at 08:27:56AM -0700, Jacob Pan wrote:
>  static int idxd_enable_system_pasid(struct idxd_device *idxd)
>  {
> - int flags;
> + unsigned int flags;
>   unsigned int pasid;
>   struct iommu_sva *sva;
>  
> - flags = SVM_FLAG_SUPERVISOR_MODE;
> + flags = IOMMU_SVA_BIND_SUPERVISOR;
>  
> - sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
> + sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags);

Please also remove the now pointless flags variable.
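
[Editorial note: a toy sketch of the call pattern the comment asks for, with the flag
passed directly so no local variable is needed. The flag value and function below are
illustrative assumptions, not the kernel definitions.]

#include <stdio.h>

/* Assumed flag value for illustration only. */
#define IOMMU_SVA_BIND_SUPERVISOR	(1U << 0)

/* Flags are a plain value, not a pointer, so callers can pass the
 * constant directly and drop the local variable. Not the kernel API. */
static int sva_bind(const char *devname, unsigned int flags)
{
	printf("%s: bind, supervisor=%d\n", devname,
	       !!(flags & IOMMU_SVA_BIND_SUPERVISOR));
	return 0;
}

int main(void)
{
	/* No "unsigned int flags" local needed any more. */
	return sva_bind("idxd", IOMMU_SVA_BIND_SUPERVISOR);
}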

> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int flags)

Please avoid the pointless overly long line.

> -#define SVM_FLAG_GUEST_PASID (1<<3)
> +#define SVM_FLAG_GUEST_PASID (1<<2)

This flag is entirely unused, please just remove it in a prep patch
rather than renumbering it.

>  static inline struct iommu_sva *
> -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int flags)

Same overly long line here.


Re: [PATCH v2 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-14 Thread Christoph Hellwig
>   *
>   * Returns 0 on success and < 0 on error.
> @@ -28,6 +28,9 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
>   int ret = 0;
>   ioasid_t pasid;
>  
> + if (mm != current->mm)
> + return -EINVAL;
> +

Why not remove the parameter entirely?

> @@ -2989,8 +2990,11 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int fla
>   return ERR_PTR(-ENODEV);
>  
>   /* Supervisor SVA does not need the current mm */
> - if ((flags & IOMMU_SVA_BIND_SUPERVISOR) && mm)
> - return ERR_PTR(-EINVAL);
> + if (!(flags & IOMMU_SVA_BIND_SUPERVISOR)) {
> + mm = get_task_mm(current);
> + if (!mm)
> + return ERR_PTR(-EINVAL);
> + }

I don't see why we need the reference.  I think we should just stop
passing the mm to ->sva_bind and let the low-level driver deal with
any reference to current->mm where needed.
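
[Editorial note: a toy model of the direction suggested here — no mm argument in the
bind call, with the "driver" resolving the caller's context itself. Every name below
is illustrative, not the kernel's.]

#include <stdio.h>
#include <stddef.h>

#define BIND_SUPERVISOR (1U << 0)	/* assumed flag value */

struct context {
	void *mm;	/* stands in for current->mm */
};

static struct context current_ctx = { .mm = (void *)0x1 };

/* The bind entry point takes no mm: a user bind implicitly uses the
 * calling context, a supervisor bind needs none. */
static int sva_bind(const char *devname, unsigned int flags)
{
	void *mm = NULL;

	if (!(flags & BIND_SUPERVISOR)) {
		mm = current_ctx.mm;	/* "driver" looks up the caller's mm */
		if (!mm)
			return -1;
	}
	printf("%s: %s bind\n", devname, mm ? "user" : "supervisor");
	return 0;
}

int main(void)
{
	sva_bind("dev0", 0);
	sva_bind("dev1", BIND_SUPERVISOR);
	return 0;
}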