Re: [PATCH 3/4 V3] Remap the device table of IOMMU in encrypted manner for kdump

2018-06-21 Thread Baoquan He
On 06/21/18 at 08:12am, Tom Lendacky wrote:
> On 6/21/2018 3:39 AM, Baoquan He wrote:
> > On 06/21/18 at 01:42pm, lijiang wrote:
> >> 在 2018年06月21日 00:42, Tom Lendacky 写道:
> >>> On 6/16/2018 3:27 AM, Lianbo Jiang wrote:
>  In kdump mode, it will copy the device table of IOMMU from the old
>  device table, which is encrypted when SME is enabled in the first
>  kernel. So we must remap it in encrypted manner in order to be
>  automatically decrypted when we read.
> 
>  Signed-off-by: Lianbo Jiang 
>  ---
>  Some changes:
>  1. add some comments
>  2. clean compile warning.
> 
>   drivers/iommu/amd_iommu_init.c | 15 ++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
> 
>  diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
>  index 904c575..a20af4c 100644
>  --- a/drivers/iommu/amd_iommu_init.c
>  +++ b/drivers/iommu/amd_iommu_init.c
>  @@ -889,11 +889,24 @@ static bool copy_device_table(void)
>   }
>   
>   old_devtb_phys = entry & PAGE_MASK;
>  +
>  +/*
>  + *  When sme enable in the first kernel, old_devtb_phys includes the
>  + *  memory encryption mask(sme_me_mask), we must remove the memory
>  + *  encryption mask to obtain the true physical address in kdump mode.
>  + */
>  +if (mem_encrypt_active() && is_kdump_kernel())
>  +old_devtb_phys = __sme_clr(old_devtb_phys);
>  +
> >>>
> >>> You can probably just use "if (is_kdump_kernel())" here, since memory
> >>> encryption is either on in both the first and second kernel or off in
> >>> both the first and second kernel.  At which point __sme_clr() will do
> >>> the proper thing.
> >>>
> >>> Actually, this needs to be done no matter what.  When doing either the
> >>> ioremap_encrypted() or the memremap(), the physical address should not
> >>> include the encryption bit/mask.
> >>>
> >>> Thanks,
> >>> Tom
> >>>
> >> Thanks for your comments. If we don't remove the memory encryption mask,
> >> it will return false because the 'old_devtb_phys >= 0x100000000ULL' check
> >> may become true.
> > 
> > Lianbo, you may not get what Tom suggested. Tom means no matter what it
> > is, encrypted or not in 1st kernel, we need get pure physicall address,
> > and using below code is always right for both cases.
> > 
> > if (is_kdump_kernel())
> > old_devtb_phys = __sme_clr(old_devtb_phys);
> > 
> > And this is simpler. You even can add one line of code comment to say
> > like "Physical address w/o encryption mask is needed here."
> 
> Even simpler, there's no need to even check for is_kdump_kernel().  The
> __sme_clr() should always be done if the physical address is going to be
> used for some form of io or memory remapping.
> 
> So you could just change the existing:
> 
>   old_devtb_phys = entry & PAGE_MASK;
> 
> to:
> 
>   old_devtb_phys = __sme_clr(entry) & PAGE_MASK;

Agree, this is even better.
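
For illustration, with Tom's suggestion folded in, the relevant part of
copy_device_table() would look roughly like this (untested sketch assembled
from the hunks quoted in this thread, not the final patch):

	/* __sme_clr() is a no-op when SME is off, so no kdump check is needed. */
	old_devtb_phys = __sme_clr(entry) & PAGE_MASK;
	if (old_devtb_phys >= 0x100000000ULL) {
		pr_err("The address of old device table is above 4G, not trustworthy!\n");
		return false;
	}
	/* Map encrypted only when SME was active in the crashed kernel. */
	old_devtb = (mem_encrypt_active() && is_kdump_kernel())
		    ? (__force void *)ioremap_encrypted(old_devtb_phys, dev_table_size)
		    : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
	if (!old_devtb)
		return false;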

> 
> >>
> >> Lianbo
>   if (old_devtb_phys >= 0x100000000ULL) {
>   pr_err("The address of old device table is above 4G, not trustworthy!\n");
>   return false;
>   }
>  -old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
>  +old_devtb = (mem_encrypt_active() && is_kdump_kernel())
>  +? (__force void *)ioremap_encrypted(old_devtb_phys,
>  +dev_table_size)
>  +: memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
>  +
>   if (!old_devtb)
>   return false;
>   
> 

Re: [PATCH 2/4 V3] Allocate pages for kdump without encryption when SME is enabled

2018-06-21 Thread Baoquan He
On 06/21/18 at 09:27pm, lijiang wrote:
> 在 2018年06月21日 18:23, Baoquan He 写道:
> > On 06/21/18 at 01:06pm, lijiang wrote:
> >> 在 2018年06月21日 09:53, Baoquan He 写道:
> >>> On 06/16/18 at 04:27pm, Lianbo Jiang wrote:
>  When SME is enabled in the first kernel, we will allocate pages
>  for kdump without encryption in order to be able to boot the
>  second kernel in the same manner as kexec, which helps to keep
>  the same code style.
> 
>  Signed-off-by: Lianbo Jiang 
>  ---
>   kernel/kexec_core.c | 12 
>   1 file changed, 12 insertions(+)
> 
>  diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>  index 20fef1a..3c22a9b 100644
>  --- a/kernel/kexec_core.c
>  +++ b/kernel/kexec_core.c
>  @@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
>   }
>   }
>   
>  +if (pages) {
>  +unsigned int count, i;
>  +
>  +pages->mapping = NULL;
>  +set_page_private(pages, order);
>  +count = 1 << order;
>  +for (i = 0; i < count; i++)
>  +SetPageReserved(pages + i);
> >>>
> >>> I guess you might imitate the kexec case, however kexec get pages from
> >>> buddy. Crash pages are reserved in memblock, these codes might make no sense.
> >>>
> >> Thanks for your comments.
> >> We have changed the attribute of pages, so the original attribute of pages
> >> will be restored when they are freed.
> > 
> > Hmm, you can check what kimage_free() is doing, and where
> > kimage->control_pages, dest_pages, unusable_pages is assigned. Do you
> > know where these original attribute of pages comes from and they are
> > used/needed in CRASH case, if you care about them?
> > 
> Originally, we want to have an opportunity to restore the previous attribute
> of pages; that would be better if the pages are remembered in 'image->control_pages'.

Again, please check who assigns value for 'image->control_pages'.
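
For reference, the difference being pointed at is roughly this (simplified
sketch of the two allocation paths in kernel/kexec_core.c):

	/* Normal kexec: control pages come from the buddy allocator and are
	 * remembered, so kimage_free() can hand them back later. */
	pages = kimage_alloc_pages(KEXEC_CONTROL_MEMORY_GFP, order);
	if (pages)
		list_add(&pages->lru, &image->control_pages);

	/* Crash kernel: control pages are carved out of the reserved
	 * crashkernel region and are never added to image->control_pages,
	 * so there is no later free path that would restore page attributes. */
	pages = pfn_to_page(hole_start >> PAGE_SHIFT);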

> If we remove these codes, it is also harmless for kdump, but it will look
> strange; someone may ask where the previous attribute of pages is restored.
> 
> Thanks.
> >>
>  +arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
>  +}
>   return pages;
>   }
>   
>  @@ -865,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage *image,
>   result  = -ENOMEM;
>   goto out;
>   }
>  +arch_kexec_post_alloc_pages(page_address(page), 1, 0);
>   ptr = kmap(page);
>   ptr += maddr & ~PAGE_MASK;
>   mchunk = min_t(size_t, mbytes,
>  @@ -882,6 +893,7 @@ static int kimage_load_crash_segment(struct kimage *image,
>   result = copy_from_user(ptr, buf, uchunk);
>   kexec_flush_icache_page(page);
>   kunmap(page);
>  +arch_kexec_pre_free_pages(page_address(page), 1);
>   if (result) {
>   result = -EFAULT;
>   goto out;
>  -- 
>  2.9.5
> 
> 

Re: [PATCH v2 3/5] iommu/virtio: Add probe request

2018-06-21 Thread Michael S. Tsirkin
On Thu, Jun 21, 2018 at 08:06:53PM +0100, Jean-Philippe Brucker wrote:
> When the device offers the probe feature, send a probe request for each
> device managed by the IOMMU. Extract RESV_MEM information. When we
> encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region.
> This will tell other subsystems that there is no need to map the MSI
> doorbell in the virtio-iommu, because MSIs bypass it.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/virtio-iommu.c  | 149 --
>  include/uapi/linux/virtio_iommu.h |  37 
>  2 files changed, 180 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index ea0242d8624b..29ce9f4398ef 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -48,6 +48,7 @@ struct viommu_dev {
>   struct iommu_domain_geometrygeometry;
>   u64 pgsize_bitmap;
>   u8  domain_bits;
> + u32 probe_size;
>  };
>  
>  struct viommu_mapping {
> @@ -69,8 +70,10 @@ struct viommu_domain {
>  };
>  
>  struct viommu_endpoint {
> + struct device   *dev;
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
> + struct list_headresv_regions;
>  };
>  
>  struct viommu_request {
> @@ -121,6 +124,9 @@ static off_t viommu_get_req_offset(struct viommu_dev *viommu,
>  {
>   size_t tail_size = sizeof(struct virtio_iommu_req_tail);
>  
> + if (req->type == VIRTIO_IOMMU_T_PROBE)
> + return len - viommu->probe_size - tail_size;
> +
>   return len - tail_size;
>  }
>  
> @@ -404,6 +410,103 @@ static int viommu_replay_mappings(struct viommu_domain *vdomain)
>   return ret;
>  }
>  
> +static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> +struct virtio_iommu_probe_resv_mem *mem,
> +size_t len)
> +{
> + struct iommu_resv_region *region = NULL;
> + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> +
> + u64 start = le64_to_cpu(mem->start);
> + u64 end = le64_to_cpu(mem->end);
> + size_t size = end - start + 1;
> +
> + if (len < sizeof(*mem))
> + return -EINVAL;
> +
> + switch (mem->subtype) {
> + default:
> + if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
> + mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI)
> + dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n",
> +  mem->subtype);
> + /* Fall-through */
> + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> + region = iommu_alloc_resv_region(start, size, 0,
> +  IOMMU_RESV_RESERVED);
> + break;
> + case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> + region = iommu_alloc_resv_region(start, size, prot,
> +  IOMMU_RESV_MSI);
> + break;
> + }
> +
> + list_add(&vdev->resv_regions, ®ion->list);
> + return 0;
> +}
> +
> +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
> +{
> + int ret;
> + u16 type, len;
> + size_t cur = 0;
> + size_t probe_len;
> + struct virtio_iommu_req_probe *probe;
> + struct virtio_iommu_probe_property *prop;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> + struct viommu_endpoint *vdev = fwspec->iommu_priv;
> +
> + if (!fwspec->num_ids)
> + return -EINVAL;
> +
> + probe_len = sizeof(*probe) + viommu->probe_size +
> + sizeof(struct virtio_iommu_req_tail);
> + probe = kzalloc(probe_len, GFP_KERNEL);
> + if (!probe)
> + return -ENOMEM;
> +
> + probe->head.type = VIRTIO_IOMMU_T_PROBE;
> + /*
> +  * For now, assume that properties of an endpoint that outputs multiple
> +  * IDs are consistent. Only probe the first one.
> +  */
> + probe->endpoint = cpu_to_le32(fwspec->ids[0]);
> +
> + ret = viommu_send_req_sync(viommu, probe, probe_len);
> + if (ret)
> + goto out_free;
> +
> + prop = (void *)probe->properties;
> + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
> +
> + while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
> +cur < viommu->probe_size) {
> + len = le16_to_cpu(prop->length);
> +
> + switch (type) {
> + case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
> + ret = viommu_add_resv_mem(vdev, (void *)prop->value, len);
> + break;
> + default:
> + dev_dbg(dev, "unknown viommu prop 0x%x\n", type);
> + }
> +
> + if (ret)
> + dev_err(dev, "failed to parse viommu prop 0x%x\n", type);
> +
> +   

Re: [PATCH v2 2/5] iommu: Add virtio-iommu driver

2018-06-21 Thread Michael S. Tsirkin
On Thu, Jun 21, 2018 at 08:06:52PM +0100, Jean-Philippe Brucker wrote:
> The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
> requests such as map/unmap over virtio-mmio transport without emulating
> page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
> requests.
> 
> The bulk of the code transforms calls coming from the IOMMU API into
> corresponding virtio requests. Mappings are kept in an interval tree
> instead of page tables.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  MAINTAINERS   |   6 +
>  drivers/iommu/Kconfig |  11 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/virtio-iommu.c  | 929 ++
>  include/uapi/linux/virtio_ids.h   |   1 +
>  include/uapi/linux/virtio_iommu.h | 117 
>  6 files changed, 1065 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 55c08968aaab..bfb09dfa8e02 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -15248,6 +15248,12 @@ S:   Maintained
>  F:   drivers/virtio/virtio_input.c
>  F:   include/uapi/linux/virtio_input.h
>  
> +VIRTIO IOMMU DRIVER
> +M:   Jean-Philippe Brucker 
> +S:   Maintained
> +F:   drivers/iommu/virtio-iommu.c
> +F:   include/uapi/linux/virtio_iommu.h
> +
>  VIRTUAL BOX GUEST DEVICE DRIVER
>  M:   Hans de Goede 
>  M:   Arnd Bergmann 

Please add the virtualization mailing list.

> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 958417f22020..820709383899 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -412,4 +412,15 @@ config QCOM_IOMMU
>   help
> Support for IOMMU on certain Qualcomm SoCs.
>  
> +config VIRTIO_IOMMU
> + bool "Virtio IOMMU driver"
> + depends on VIRTIO_MMIO=y
> + select IOMMU_API
> + select INTERVAL_TREE
> + select ARM_DMA_USE_IOMMU if ARM
> + help
> +   Para-virtualised IOMMU driver with virtio.
> +
> +   Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 244ad7913a81..b559c6ae81ea 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -33,3 +33,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
>  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> new file mode 100644
> index ..ea0242d8624b
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -0,0 +1,929 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtio driver for the paravirtualized IOMMU
> + *
> + * Copyright (C) 2018 Arm Limited
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define MSI_IOVA_BASE0x800
> +#define MSI_IOVA_LENGTH  0x10
> +
> +#define VIOMMU_REQUEST_VQ0
> +#define VIOMMU_NR_VQS1
> +
> +#define VIOMMU_REQUEST_TIMEOUT   1 /* 10s */

Where does this come from? Do you absolutely have to have
an arbitrary timeout?

> +
> +struct viommu_dev {
> + struct iommu_device iommu;
> + struct device   *dev;
> + struct virtio_device*vdev;
> +
> + struct ida  domain_ids;
> +
> + struct virtqueue*vqs[VIOMMU_NR_VQS];
> + spinlock_t  request_lock;
> + struct list_headrequests;
> +
> + /* Device configuration */
> + struct iommu_domain_geometrygeometry;
> + u64 pgsize_bitmap;
> + u8  domain_bits;
> +};
> +
> +struct viommu_mapping {
> + phys_addr_t paddr;
> + struct interval_tree_node   iova;
> + u32 flags;
> +};
> +
> +struct viommu_domain {
> + struct iommu_domain domain;
> + struct viommu_dev   *viommu;
> + struct mutexmutex;
> + unsigned intid;
> +
> + spinlock_t  mappings_lock;
> + struct rb_root_cached   mappings;
> +
> + unsigned long   nr_endpoints;
> +};
> +
> +struct viommu_endpoint {
> + struct viommu_dev   *viommu;
> + struct viommu_domain*vdomain;
> +};
> +
> +struct viommu_request {
> + struct list_headlist;
> + void*writeback;
> + unsigned int  

[PATCH v2 5/5] vfio: Allow type-1 IOMMU instantiation for ARM

2018-06-21 Thread Jean-Philippe Brucker
ARM platforms may implement several kinds of IOMMUs (various SMMU or
SMMUv3 implementations, virtio-iommu). They are all type-1, so
automatically select VFIO_IOMMU_TYPE1 on ARM if IOMMU is selected.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/vfio/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index c84333eb5eb5..9de5ed38da83 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -21,7 +21,7 @@ config VFIO_VIRQFD
 menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
-   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
+   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64)
select ANON_INODES
help
  VFIO provides a framework for secure userspace device drivers.
-- 
2.17.0



[PATCH v2 4/5] iommu/virtio: Add event queue

2018-06-21 Thread Jean-Philippe Brucker
The event queue offers a way for the device to report access faults from
endpoints. It is implemented on virtqueue #1. Whenever the host needs to
signal a fault, it fills one of the buffers offered by the guest and
interrupts it.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 116 +++---
 include/uapi/linux/virtio_iommu.h |  18 +
 2 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 29ce9f4398ef..c075404e97d6 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -29,7 +29,8 @@
 #define MSI_IOVA_LENGTH0x10
 
 #define VIOMMU_REQUEST_VQ  0
-#define VIOMMU_NR_VQS  1
+#define VIOMMU_EVENT_VQ1
+#define VIOMMU_NR_VQS  2
 
 #define VIOMMU_REQUEST_TIMEOUT 1 /* 10s */
 
@@ -43,6 +44,7 @@ struct viommu_dev {
struct virtqueue*vqs[VIOMMU_NR_VQS];
spinlock_t  request_lock;
struct list_headrequests;
+   void*evts;
 
/* Device configuration */
struct iommu_domain_geometrygeometry;
@@ -84,6 +86,15 @@ struct viommu_request {
charbuf[];
 };
 
+#define VIOMMU_FAULT_RESV_MASK 0xff00
+
+struct viommu_event {
+   union {
+   u32 head;
+   struct virtio_iommu_fault fault;
+   };
+};
+
 #define to_viommu_domain(domain)   \
container_of(domain, struct viommu_domain, domain)
 
@@ -507,6 +518,69 @@ static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
return ret;
 }
 
+static int viommu_fault_handler(struct viommu_dev *viommu,
+   struct virtio_iommu_fault *fault)
+{
+   char *reason_str;
+
+   u8 reason   = fault->reason;
+   u32 flags   = le32_to_cpu(fault->flags);
+   u32 endpoint= le32_to_cpu(fault->endpoint);
+   u64 address = le64_to_cpu(fault->address);
+
+   switch (reason) {
+   case VIRTIO_IOMMU_FAULT_R_DOMAIN:
+   reason_str = "domain";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_MAPPING:
+   reason_str = "page";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_UNKNOWN:
+   default:
+   reason_str = "unknown";
+   break;
+   }
+
+   /* TODO: find EP by ID and report_iommu_fault */
+   if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS)
+   dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx [%s%s%s]\n",
+   reason_str, endpoint, address,
+   flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : "",
+   flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : "",
+   flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : "");
+   else
+   dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n",
+   reason_str, endpoint);
+   return 0;
+}
+
+static void viommu_event_handler(struct virtqueue *vq)
+{
+   int ret;
+   unsigned int len;
+   struct scatterlist sg[1];
+   struct viommu_event *evt;
+   struct viommu_dev *viommu = vq->vdev->priv;
+
+   while ((evt = virtqueue_get_buf(vq, &len)) != NULL) {
+   if (len > sizeof(*evt)) {
+   dev_err(viommu->dev,
+   "invalid event buffer (len %u != %zu)\n",
+   len, sizeof(*evt));
+   } else if (!(evt->head & VIOMMU_FAULT_RESV_MASK)) {
+   viommu_fault_handler(viommu, &evt->fault);
+   }
+
+   sg_init_one(sg, evt, sizeof(*evt));
+   ret = virtqueue_add_inbuf(vq, sg, 1, evt, GFP_ATOMIC);
+   if (ret)
+   dev_err(viommu->dev, "could not add event buffer\n");
+   }
+
+   if (!virtqueue_kick(vq))
+   dev_err(viommu->dev, "kick failed\n");
+}
+
 /* IOMMU API */
 
 static struct iommu_domain *viommu_domain_alloc(unsigned type)
@@ -893,16 +967,35 @@ static struct iommu_ops viommu_ops = {
 static int viommu_init_vqs(struct viommu_dev *viommu)
 {
struct virtio_device *vdev = dev_to_virtio(viommu->dev);
-   const char *name = "request";
-   void *ret;
+   const char *names[] = { "request", "event" };
+   vq_callback_t *callbacks[] = {
+   NULL, /* No async requests */
+   viommu_event_handler,
+   };
 
-   ret = virtio_find_single_vq(vdev, NULL, name);
-   if (IS_ERR(ret)) {
-   dev_err(viommu->dev, "cannot find VQ\n");
-   return PTR_ERR(ret);
-   }
+   return virtio_find_vqs(vdev, VIOMMU_NR_VQS, viommu->vqs, callbacks,
+ 

[PATCH v2 3/5] iommu/virtio: Add probe request

2018-06-21 Thread Jean-Philippe Brucker
When the device offers the probe feature, send a probe request for each
device managed by the IOMMU. Extract RESV_MEM information. When we
encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region.
This will tell other subsystems that there is no need to map the MSI
doorbell in the virtio-iommu, because MSIs bypass it.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 149 --
 include/uapi/linux/virtio_iommu.h |  37 
 2 files changed, 180 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index ea0242d8624b..29ce9f4398ef 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -48,6 +48,7 @@ struct viommu_dev {
struct iommu_domain_geometrygeometry;
u64 pgsize_bitmap;
u8  domain_bits;
+   u32 probe_size;
 };
 
 struct viommu_mapping {
@@ -69,8 +70,10 @@ struct viommu_domain {
 };
 
 struct viommu_endpoint {
+   struct device   *dev;
struct viommu_dev   *viommu;
struct viommu_domain*vdomain;
+   struct list_headresv_regions;
 };
 
 struct viommu_request {
@@ -121,6 +124,9 @@ static off_t viommu_get_req_offset(struct viommu_dev *viommu,
 {
size_t tail_size = sizeof(struct virtio_iommu_req_tail);
 
+   if (req->type == VIRTIO_IOMMU_T_PROBE)
+   return len - viommu->probe_size - tail_size;
+
return len - tail_size;
 }
 
@@ -404,6 +410,103 @@ static int viommu_replay_mappings(struct viommu_domain *vdomain)
return ret;
 }
 
+static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
+  struct virtio_iommu_probe_resv_mem *mem,
+  size_t len)
+{
+   struct iommu_resv_region *region = NULL;
+   unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+   u64 start = le64_to_cpu(mem->start);
+   u64 end = le64_to_cpu(mem->end);
+   size_t size = end - start + 1;
+
+   if (len < sizeof(*mem))
+   return -EINVAL;
+
+   switch (mem->subtype) {
+   default:
+   if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
+   mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI)
+   dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n",
+mem->subtype);
+   /* Fall-through */
+   case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
+   region = iommu_alloc_resv_region(start, size, 0,
+IOMMU_RESV_RESERVED);
+   break;
+   case VIRTIO_IOMMU_RESV_MEM_T_MSI:
+   region = iommu_alloc_resv_region(start, size, prot,
+IOMMU_RESV_MSI);
+   break;
+   }
+
+   list_add(&vdev->resv_regions, ®ion->list);
+   return 0;
+}
+
+static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
+{
+   int ret;
+   u16 type, len;
+   size_t cur = 0;
+   size_t probe_len;
+   struct virtio_iommu_req_probe *probe;
+   struct virtio_iommu_probe_property *prop;
+   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   struct viommu_endpoint *vdev = fwspec->iommu_priv;
+
+   if (!fwspec->num_ids)
+   return -EINVAL;
+
+   probe_len = sizeof(*probe) + viommu->probe_size +
+   sizeof(struct virtio_iommu_req_tail);
+   probe = kzalloc(probe_len, GFP_KERNEL);
+   if (!probe)
+   return -ENOMEM;
+
+   probe->head.type = VIRTIO_IOMMU_T_PROBE;
+   /*
+* For now, assume that properties of an endpoint that outputs multiple
+* IDs are consistent. Only probe the first one.
+*/
+   probe->endpoint = cpu_to_le32(fwspec->ids[0]);
+
+   ret = viommu_send_req_sync(viommu, probe, probe_len);
+   if (ret)
+   goto out_free;
+
+   prop = (void *)probe->properties;
+   type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
+
+   while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
+  cur < viommu->probe_size) {
+   len = le16_to_cpu(prop->length);
+
+   switch (type) {
+   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
+   ret = viommu_add_resv_mem(vdev, (void *)prop->value, len);
+   break;
+   default:
+   dev_dbg(dev, "unknown viommu prop 0x%x\n", type);
+   }
+
+   if (ret)
+   dev_err(dev, "failed to parse viommu prop 0x%x\n", type);
+
+   cur += sizeof(*prop) + len;
+   if (cur >= viommu->probe_size)
+   break;
+
+   prop = (void *)probe->properties + cur;
+   type =

[PATCH v2 2/5] iommu: Add virtio-iommu driver

2018-06-21 Thread Jean-Philippe Brucker
The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
requests such as map/unmap over virtio-mmio transport without emulating
page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
requests.

The bulk of the code transforms calls coming from the IOMMU API into
corresponding virtio requests. Mappings are kept in an interval tree
instead of page tables.

Signed-off-by: Jean-Philippe Brucker 
---
 MAINTAINERS   |   6 +
 drivers/iommu/Kconfig |  11 +
 drivers/iommu/Makefile|   1 +
 drivers/iommu/virtio-iommu.c  | 929 ++
 include/uapi/linux/virtio_ids.h   |   1 +
 include/uapi/linux/virtio_iommu.h | 117 
 6 files changed, 1065 insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 55c08968aaab..bfb09dfa8e02 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15248,6 +15248,12 @@ S: Maintained
 F: drivers/virtio/virtio_input.c
 F: include/uapi/linux/virtio_input.h
 
+VIRTIO IOMMU DRIVER
+M: Jean-Philippe Brucker 
+S: Maintained
+F: drivers/iommu/virtio-iommu.c
+F: include/uapi/linux/virtio_iommu.h
+
 VIRTUAL BOX GUEST DEVICE DRIVER
 M: Hans de Goede 
 M: Arnd Bergmann 
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 958417f22020..820709383899 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -412,4 +412,15 @@ config QCOM_IOMMU
help
  Support for IOMMU on certain Qualcomm SoCs.
 
+config VIRTIO_IOMMU
+   bool "Virtio IOMMU driver"
+   depends on VIRTIO_MMIO=y
+   select IOMMU_API
+   select INTERVAL_TREE
+   select ARM_DMA_USE_IOMMU if ARM
+   help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 244ad7913a81..b559c6ae81ea 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -33,3 +33,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index ..ea0242d8624b
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,929 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * Copyright (C) 2018 Arm Limited
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define MSI_IOVA_BASE  0x800
+#define MSI_IOVA_LENGTH0x10
+
+#define VIOMMU_REQUEST_VQ  0
+#define VIOMMU_NR_VQS  1
+
+#define VIOMMU_REQUEST_TIMEOUT 1 /* 10s */
+
+struct viommu_dev {
+   struct iommu_device iommu;
+   struct device   *dev;
+   struct virtio_device*vdev;
+
+   struct ida  domain_ids;
+
+   struct virtqueue*vqs[VIOMMU_NR_VQS];
+   spinlock_t  request_lock;
+   struct list_headrequests;
+
+   /* Device configuration */
+   struct iommu_domain_geometrygeometry;
+   u64 pgsize_bitmap;
+   u8  domain_bits;
+};
+
+struct viommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node   iova;
+   u32 flags;
+};
+
+struct viommu_domain {
+   struct iommu_domain domain;
+   struct viommu_dev   *viommu;
+   struct mutexmutex;
+   unsigned intid;
+
+   spinlock_t  mappings_lock;
+   struct rb_root_cached   mappings;
+
+   unsigned long   nr_endpoints;
+};
+
+struct viommu_endpoint {
+   struct viommu_dev   *viommu;
+   struct viommu_domain*vdomain;
+};
+
+struct viommu_request {
+   struct list_headlist;
+   void*writeback;
+   unsigned intwrite_offset;
+   unsigned intlen;
+   charbuf[];
+};
+
+#define to_viommu_domain(domain)   \
+   container_of(domain, struct viommu_domain, domain)
+
+static int viommu_get_req_errno(void *buf, size_t len)
+{
+   struct virtio_iommu_req_tail *tail = buf + len - sizeof(*tail);
+
+   switch (tail->status) {
+   case VIRTIO_IOMMU_S_OK:
+

[PATCH v2 1/5] dt-bindings: virtio: Specify #iommu-cells value for a virtio-iommu

2018-06-21 Thread Jean-Philippe Brucker
A virtio-mmio node may represent a virtio-iommu device. This is discovered
by the virtio driver at probe time, but the DMA topology isn't
discoverable and must be described by firmware. For DT the standard IOMMU
description is used, as specified in bindings/iommu/iommu.txt and
bindings/pci/pci-iommu.txt. Like many other IOMMUs, virtio-iommu
distinguishes masters by their endpoint IDs, which requires one IOMMU cell
in the "iommus" property.

Signed-off-by: Jean-Philippe Brucker 
---
 Documentation/devicetree/bindings/virtio/mmio.txt | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt b/Documentation/devicetree/bindings/virtio/mmio.txt
index 5069c1b8e193..337da0e3a87f 100644
--- a/Documentation/devicetree/bindings/virtio/mmio.txt
+++ b/Documentation/devicetree/bindings/virtio/mmio.txt
@@ -8,6 +8,14 @@ Required properties:
 - reg: control registers base address and size including configuration space
 - interrupts:  interrupt generated by the device
 
+Required properties for virtio-iommu:
+
+- #iommu-cells:When the node describes a virtio-iommu device, it is
+   linked to DMA masters using the "iommus" property as
+   described in devicetree/bindings/iommu/iommu.txt. For
+   virtio-iommu #iommu-cells must be 1, each cell describing
+   a single endpoint ID.
+
 Example:
 
virtio_block@3000 {
-- 
2.17.0



[PATCH v2 0/5] Add virtio-iommu driver

2018-06-21 Thread Jean-Philippe Brucker
Implement the base virtio-iommu driver, following version 0.7 of the
specification [1].

Changes since last version [2]:
* Address comments, thanks again for the review.
* As suggested, add a DT binding description in patch 1.
* Depend on VIRTIO_MMIO=y to fix a build failure¹
* Switch to v0.7 of the spec, which changes resv_mem parameters and adds
  an MMIO flag. These are trivial but not backward compatible. Once
  device or driver is upstream, updates to the spec will rely on feature
  bits to stay compatible with this code.
* Implement the new tlb_sync interface, by splitting add_req() and
  sync_req(). I noticed a small improvement on netperf stream because
  the synchronous iommu_unmap() also benefits from this. Other
  experiments, such as using kmem_cache for requests instead of kmalloc,
  didn't show any improvement.

Driver is available on branch virtio-iommu/v0.7 [3], and the x86+ACPI
prototype is on branch virtio-iommu/devel. That x86 code hasn't changed,
it still requires the DMA ifdeffery and I lack the expertise to tidy it
up. The kvmtool example device has been cleaned up and is available on
branch virtio-iommu/v0.7 [4].

Feedback welcome!

Thanks,
Jean

[1] virtio-iommu specification v0.7
https://www.spinics.net/lists/linux-virtualization/msg34127.html
[2] virtio-iommu driver v1
https://www.spinics.net/lists/kvm/msg164322.html
[3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.7

http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/v0.7
[4] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.7 

  ---
¹ A word on the module story. Because of complex dependencies IOMMU
drivers cannot yet be .ko modules. Enabling it is outside the scope of
this series but I have a small prototype on branch virtio-iommu/module-devel.
It seems desirable since some distros currently ship the
transport code as module and are unlikely to change this on our account.
Note that this series works fine with arm64 defconfig, which selects
VIRTIO_MMIO=y.

I could use some help to clean this up. Currently my solution is to split
virtio-iommu into a module and a 3-lines built-in stub, which isn't
graceful but could have been worse.

Keeping the whole virtio-iommu driver as builtin would require accessing
any virtio utility through get_symbol. So far we only have seven of
those and could keep a list of pointer ops, but I find this solution
terrible. If virtio or virtio_mmio is a module, virtio-iommu also needs
to be one.

The minimal set of changes to make a module out of an OF-based IOMMU
driver seems to be:
* Export IOMMU symbols used by drivers
* Don't give up deferring probe in of_iommu
* Move IOMMU OF tables to .rodata
* Create a static virtio-iommu stub that declares the virtio-iommu OF
  table entry. The build system doesn't pick up IOMMU_OF_DECLARE when
  done from an object destined to be built as module :(
* Create a device_link between endpoint and IOMMU, to ensure that
  removing the IOMMU driver also unbinds the endpoint driver. Putting
  this in IOMMU core seems like a better approach since even when not
  built as module, unbinding an IOMMU device via sysfs will cause
  crashes.
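
  As an illustration of that last point, the device_link part can be as
  small as the following (hypothetical sketch; the callback placement and
  the flag choice are placeholders, not code from this series):

	/* In the IOMMU core or in the driver's add_device callback: tie the
	 * endpoint driver to the IOMMU driver, so unbinding virtio-iommu
	 * forces the endpoint driver to unbind first. */
	link = device_link_add(dev, viommu->dev, DL_FLAG_AUTOREMOVE);
	if (!link)
		dev_warn(dev, "could not link with %s\n", dev_name(viommu->dev));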

With this, as long as virtio-mmio isn't loaded, the OF code defers probe
of endpoints that depend on virtio-iommu. Once virtio-mmio is loaded,
the probe is still deferred until virtio-iommu registers itself to the
virtio bus. Once virtio-iommu is loaded, probe of endpoints managed by
the IOMMU follows.

I'll investigate ACPI IORT when I find some time, though I don't expect
much complication and suggest we tackle one problem at a time. Since it
is a luxury that requires changes to the IOMMU core, module support is
left as a future improvement.
  ---

Jean-Philippe Brucker (5):
  dt-bindings: virtio: Specify #iommu-cells value for a virtio-iommu
  iommu: Add virtio-iommu driver
  iommu/virtio: Add probe request
  iommu/virtio: Add event queue
  vfio: Allow type-1 IOMMU instantiation for ARM

 .../devicetree/bindings/virtio/mmio.txt   |8 +
 MAINTAINERS   |6 +
 drivers/iommu/Kconfig |   11 +
 drivers/iommu/Makefile|1 +
 drivers/iommu/virtio-iommu.c  | 1164 +
 drivers/vfio/Kconfig  |2 +-
 include/uapi/linux/virtio_ids.h   |1 +
 include/uapi/linux/virtio_iommu.h |  172 +++
 8 files changed, 1364 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

-- 
2.17.0


[RFC 3/3] iommu/iova: Remove find_iova()

2018-06-21 Thread Dmitry Safonov via iommu
This function is potentially dangerous: nothing protects the returned iova.
As there are no in-tree users anymore, delete it.

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Cc: Dmitry Safonov <0x7f454...@gmail.com>
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/iova.c | 20 
 include/linux/iova.h |  7 ---
 2 files changed, 27 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 4c63d92afaf7..4a568e28a633 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -336,26 +336,6 @@ static void private_free_iova(struct iova_domain *iovad, struct iova *iova)
 }
 
 /**
- * find_iova - finds an iova for a given pfn
- * @iovad: - iova domain in question.
- * @pfn: - page frame number
- * This function finds and returns an iova belonging to the
- * given doamin which matches the given pfn.
- */
-struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn)
-{
-   unsigned long flags;
-   struct iova *iova;
-
-   /* Take the lock so that no other thread is manipulating the rbtree */
-   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
-   iova = private_find_iova(iovad, pfn);
-   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-   return iova;
-}
-EXPORT_SYMBOL_GPL(find_iova);
-
-/**
  * __free_iova - frees the given iova
  * @iovad: iova domain in question.
  * @iova: iova in question.
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 803472b77919..006911306a84 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -158,7 +158,6 @@ void init_iova_domain(struct iova_domain *iovad, unsigned long granule,
unsigned long start_pfn);
 int init_iova_flush_queue(struct iova_domain *iovad,
  iova_flush_cb flush_cb, iova_entry_dtor entry_dtor);
-struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
 void put_iova_domain(struct iova_domain *iovad);
 struct iova *iova_split_and_pop(struct iova_domain *iovad,
unsigned long pfn_lo, unsigned long pfn_hi);
@@ -243,12 +242,6 @@ static inline int init_iova_flush_queue(struct iova_domain *iovad,
return -ENODEV;
 }
 
-static inline struct iova *find_iova(struct iova_domain *iovad,
-unsigned long pfn)
-{
-   return NULL;
-}
-
 static inline void put_iova_domain(struct iova_domain *iovad)
 {
 }
-- 
2.13.6



[RFC 2/3] iommu/iova: Make free_iova() atomic

2018-06-21 Thread Dmitry Safonov via iommu
find_iova() grabs the rbtree's spinlock only for the duration of the search.
Nothing guarantees that the returned iova still exists when __free_iova() runs.
Prevent a potential use-after-free and double-free by holding the spinlock
for the whole time the iova is being searched for and freed.

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Cc: Dmitry Safonov <0x7f454...@gmail.com>
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/iova.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 4b38eb507670..4c63d92afaf7 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -382,11 +382,14 @@ EXPORT_SYMBOL_GPL(__free_iova);
 void
 free_iova(struct iova_domain *iovad, unsigned long pfn)
 {
-   struct iova *iova = find_iova(iovad, pfn);
+   unsigned long flags;
+   struct iova *iova;
 
+   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+   iova = private_find_iova(iovad, pfn);
if (iova)
-   __free_iova(iovad, iova);
-
+   private_free_iova(iovad, iova);
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
 }
 EXPORT_SYMBOL_GPL(free_iova);
 
-- 
2.13.6



[RFC 1/3] iommu/iova: Find and split iova under rbtree's lock

2018-06-21 Thread Dmitry Safonov via iommu
find_iova() holds iova_rbtree_lock only while traversing the rbtree.
After the lock is released, the returned iova may be freed (e.g., during
module release).
Hold the spinlock during both the search and the removal of the iova from the
rbtree, eliminating a possible use-after-free and/or double-free of the iova.

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Cc: Dmitry Safonov <0x7f454...@gmail.com>
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/intel-iommu.c | 14 +++---
 drivers/iommu/iova.c| 19 ---
 include/linux/iova.h| 10 --
 3 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 14e4b3722428..494394ef0f92 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4530,19 +4530,11 @@ static int intel_iommu_memory_notifier(struct notifier_block *nb,
struct intel_iommu *iommu;
struct page *freelist;
 
-   iova = find_iova(&si_domain->iovad, start_vpfn);
+   iova = iova_split_and_pop(&si_domain->iovad, start_vpfn, last_vpfn);
if (iova == NULL) {
-   pr_debug("Failed get IOVA for PFN %lx\n",
-start_vpfn);
-   break;
-   }
-
-   iova = split_and_remove_iova(&si_domain->iovad, iova,
-start_vpfn, last_vpfn);
-   if (iova == NULL) {
-   pr_warn("Failed to split IOVA PFN [%lx-%lx]\n",
+   pr_warn("Failed to split & pop IOVA PFN [%lx-%lx]\n",
start_vpfn, last_vpfn);
-   return NOTIFY_BAD;
+   break;
}
 
freelist = domain_unmap(si_domain, iova->pfn_lo,
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 83fe2621effe..4b38eb507670 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -715,23 +715,27 @@ copy_reserved_iova(struct iova_domain *from, struct iova_domain *to)
 }
 EXPORT_SYMBOL_GPL(copy_reserved_iova);
 
-struct iova *
-split_and_remove_iova(struct iova_domain *iovad, struct iova *iova,
+struct iova *iova_split_and_pop(struct iova_domain *iovad,
  unsigned long pfn_lo, unsigned long pfn_hi)
 {
-   unsigned long flags;
struct iova *prev = NULL, *next = NULL;
+   unsigned long flags;
+   struct iova *iova;
 
spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+   iova = private_find_iova(iovad, pfn_lo);
+   if (iova == NULL)
+   goto err_unlock;
+
if (iova->pfn_lo < pfn_lo) {
prev = alloc_and_init_iova(iova->pfn_lo, pfn_lo - 1);
if (prev == NULL)
-   goto error;
+   goto err_unlock;
}
if (iova->pfn_hi > pfn_hi) {
next = alloc_and_init_iova(pfn_hi + 1, iova->pfn_hi);
if (next == NULL)
-   goto error;
+   goto err_free;
}
 
__cached_rbnode_delete_update(iovad, iova);
@@ -749,10 +753,11 @@ split_and_remove_iova(struct iova_domain *iovad, struct iova *iova,
 
return iova;
 
-error:
-   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+err_free:
if (prev)
free_iova_mem(prev);
+err_unlock:
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
return NULL;
 }
 
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 928442dda565..803472b77919 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -160,8 +160,8 @@ int init_iova_flush_queue(struct iova_domain *iovad,
  iova_flush_cb flush_cb, iova_entry_dtor entry_dtor);
 struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
 void put_iova_domain(struct iova_domain *iovad);
-struct iova *split_and_remove_iova(struct iova_domain *iovad,
-   struct iova *iova, unsigned long pfn_lo, unsigned long pfn_hi);
+struct iova *iova_split_and_pop(struct iova_domain *iovad,
+   unsigned long pfn_lo, unsigned long pfn_hi);
 void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
 #else
 static inline int iova_cache_get(void)
@@ -253,10 +253,8 @@ static inline void put_iova_domain(struct iova_domain *iovad)
 {
 }
 
-static inline struct iova *split_and_remove_iova(struct iova_domain *iovad,
-struct iova *iova,
-unsigned long pfn_lo,
-unsigned long pfn_hi)
+static inline struct iova *iova_split_and_pop(struct iova_domain *iovad,
+   unsigned long

[RFC 0/3] iommu/iova: Unsafe locking in find_iova()

2018-06-21 Thread Dmitry Safonov via iommu
find_iova() uses a questionable locking practice: it holds the rbtree lock only
for the duration of the search, not for the lifetime of the returned iova.
Looking at the code, the element can be removed from the tree and freed under
the rbtree lock; that happens during memory hot-unplug and during cleanup on
module removal.
Here I clean up the users of the function and then delete it.
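
To illustrate the race, the pattern this series removes is essentially the
following (simplified sketch, not a literal quote of the current code):

	/* find_iova() takes and drops iova_rbtree_lock internally */
	struct iova *iova = find_iova(iovad, pfn);

	/*
	 * Window: another CPU (memory hot-unplug, module cleanup) may remove
	 * and free the same iova here, since no lock is held any more.
	 */

	if (iova)
		__free_iova(iovad, iova);	/* use-after-free / double-free */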

Dmitry Safonov (3):
  iommu/iova: Find and split iova under rbtree's lock
  iommu/iova: Make free_iova() atomic
  iommu/iova: Remove find_iova()

 drivers/iommu/intel-iommu.c | 14 +++--
 drivers/iommu/iova.c| 48 +
 include/linux/iova.h| 17 
 3 files changed, 25 insertions(+), 54 deletions(-)

-- 
2.13.6



Re: [PATCH 2/4 V3] Allocate pages for kdump without encryption when SME is enabled

2018-06-21 Thread lijiang
在 2018年06月21日 18:23, Baoquan He 写道:
> On 06/21/18 at 01:06pm, lijiang wrote:
>> 在 2018年06月21日 09:53, Baoquan He 写道:
>>> On 06/16/18 at 04:27pm, Lianbo Jiang wrote:
 When SME is enabled in the first kernel, we will allocate pages
 for kdump without encryption in order to be able to boot the
 second kernel in the same manner as kexec, which helps to keep
 the same code style.

 Signed-off-by: Lianbo Jiang 
 ---
  kernel/kexec_core.c | 12 
  1 file changed, 12 insertions(+)

 diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
 index 20fef1a..3c22a9b 100644
 --- a/kernel/kexec_core.c
 +++ b/kernel/kexec_core.c
 @@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
}
}
  
 +  if (pages) {
 +  unsigned int count, i;
 +
 +  pages->mapping = NULL;
 +  set_page_private(pages, order);
 +  count = 1 << order;
 +  for (i = 0; i < count; i++)
 +  SetPageReserved(pages + i);
>>>
>>> I guess you might imitate the kexec case, however kexec get pages from
>>> buddy. Crash pages are reserved in memblock, these codes might make no sense.
>>>
>> Thanks for your comments.
>> We have changed the attribute of pages, so the original attribute of pages
>> will be restored when they are freed.
> 
> Hmm, you can check what kimage_free() is doing, and where
> kimage->control_pages, dest_pages, unusable_pages is assigned. Do you
> know where these original attribute of pages comes from and they are
> used/needed in CRASH case, if you care about them?
> 
Originally, we want to have an opportunity to restore the previous attribute of
pages; that would be better if the pages are remembered in 'image->control_pages'.
If we remove these codes, it is also harmless for kdump, but it will look
strange; someone may ask where the previous attribute of pages is restored.

Thanks.
>>
 +  arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
 +  }
return pages;
  }
  
 @@ -865,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result  = -ENOMEM;
goto out;
}
 +  arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
 @@ -882,6 +893,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
 +  arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;
 -- 
 2.9.5



Re: [PATCH 3/4 V3] Remap the device table of IOMMU in encrypted manner for kdump

2018-06-21 Thread Tom Lendacky
On 6/21/2018 3:39 AM, Baoquan He wrote:
> On 06/21/18 at 01:42pm, lijiang wrote:
>> 在 2018年06月21日 00:42, Tom Lendacky 写道:
>>> On 6/16/2018 3:27 AM, Lianbo Jiang wrote:
 In kdump mode, it will copy the device table of IOMMU from the old
 device table, which is encrypted when SME is enabled in the first
 kernel. So we must remap it in encrypted manner in order to be
 automatically decrypted when we read.

 Signed-off-by: Lianbo Jiang 
 ---
 Some changes:
 1. add some comments
 2. clean compile warning.

  drivers/iommu/amd_iommu_init.c | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)

 diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
 index 904c575..a20af4c 100644
 --- a/drivers/iommu/amd_iommu_init.c
 +++ b/drivers/iommu/amd_iommu_init.c
 @@ -889,11 +889,24 @@ static bool copy_device_table(void)
}
  
old_devtb_phys = entry & PAGE_MASK;
 +
 +  /*
 +   *  When sme enable in the first kernel, old_devtb_phys includes the
 +   *  memory encryption mask(sme_me_mask), we must remove the memory
 +   *  encryption mask to obtain the true physical address in kdump mode.
 +   */
 +  if (mem_encrypt_active() && is_kdump_kernel())
 +  old_devtb_phys = __sme_clr(old_devtb_phys);
 +
>>>
>>> You can probably just use "if (is_kdump_kernel())" here, since memory
>>> encryption is either on in both the first and second kernel or off in
>>> both the first and second kernel.  At which point __sme_clr() will do
>>> the proper thing.
>>>
>>> Actually, this needs to be done no matter what.  When doing either the
>>> ioremap_encrypted() or the memremap(), the physical address should not
>>> include the encryption bit/mask.
>>>
>>> Thanks,
>>> Tom
>>>
>> Thanks for your comments. If we don't remove the memory encryption mask, it
>> will return false because the 'old_devtb_phys >= 0x100000000ULL' check may become true.
> 
> Lianbo, you may not get what Tom suggested. Tom means no matter what it
> is, encrypted or not in 1st kernel, we need get pure physicall address,
> and using below code is always right for both cases.
> 
>   if (is_kdump_kernel())
>   old_devtb_phys = __sme_clr(old_devtb_phys);
> 
> And this is simpler. You even can add one line of code comment to say
> like "Physical address w/o encryption mask is needed here."

Even simpler, there's no need to even check for is_kdump_kernel().  The
__sme_clr() should always be done if the physical address is going to be
used for some form of io or memory remapping.

So you could just change the existing:

old_devtb_phys = entry & PAGE_MASK;

to:

old_devtb_phys = __sme_clr(entry) & PAGE_MASK;

Thanks,
Tom

>>
>> Lianbo
if (old_devtb_phys >= 0x100000000ULL) {
pr_err("The address of old device table is above 4G, not trustworthy!\n");
return false;
}
 -  old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
 +  old_devtb = (mem_encrypt_active() && is_kdump_kernel())
 +  ? (__force void *)ioremap_encrypted(old_devtb_phys,
 +  dev_table_size)
 +  : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
 +
if (!old_devtb)
return false;
  

>>

Re: [PATCH 0/7 v5] Support for fsl-mc bus and its devices in SMMU

2018-06-21 Thread Will Deacon
Hi Nipun,

On Thu, Jun 21, 2018 at 03:59:27AM +, Nipun Gupta wrote:
> Will this patch-set be taken for the next kernel release (and via which
> tree)?

I think you need Acks from Robin and Joerg in order for this to be queued.
Robin should be back at the beginning of next month, so there's still time
for 4.19.

Will


Re: usb HC busted?

2018-06-21 Thread Mathias Nyman

On 21.06.2018 03:53, Sudip Mukherjee wrote:

Hi Mathias, Andy,

On Thu, Jun 07, 2018 at 10:40:03AM +0300, Mathias Nyman wrote:

On 06.06.2018 19:45, Sudip Mukherjee wrote:

Hi Andy,

And we meet again. :)

On Wed, Jun 06, 2018 at 06:36:35PM +0300, Andy Shevchenko wrote:

On Wed, 2018-06-06 at 17:12 +0300, Mathias Nyman wrote:

On 04.06.2018 18:28, Sudip Mukherjee wrote:

On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote:





Odd and unlikely, but to me this looks like some issue in allocating dma memory
from the pool using dma_pool_zalloc().


Here's the story:
Sudip sees usb issues on an Intel Atom based board with a 4.14.2 kernel.
All tracing points to dma_pool_zalloc() returning the same dma address
block on consecutive calls.

In the failing case dma_pool_zalloc() is called 3 - 6us apart.

<...>-26362 [002]   1186.756739: xhci_ring_mem_detail: MATTU xhci_segment_alloc dma @ 0x2d92b000
<...>-26362 [002]   1186.756745: xhci_ring_mem_detail: MATTU xhci_segment_alloc dma @ 0x2d92b000
<...>-26362 [002]   1186.756748: xhci_ring_mem_detail: MATTU xhci_segment_alloc dma @ 0x2d92b000

dma_pool_zalloc() is called from xhci_segment_alloc() in
drivers/usb/host/xhci-mem.c
see:
https://elixir.bootlin.com/linux/v4.14.2/source/drivers/usb/host/xhci-mem.c#L52

prints above are custom traces added right after dma_pool_zalloc()


For better understanding it would be good to have dma_pool_free() calls
debugged as well.




Sudip has a full (394M unpacked) trace at:
https://drive.google.com/open?id=1h-3r-1lfjg8oblBGkzdRIq8z3ZNgGZx-






But then it gets stuck: for the whole of ring2, dma_pool_zalloc() just returns the
same dma address as the last segment of ring1: 0x2d92b000. The last part of the
trace snippet is just another ring being freed.


A gentle ping on this. Any idea on what the problem might be and any
possible fix?



I tried to reproduce it by quickly hacking xhci to allocate and free 50 segments
each time we normally allocate one segment from the dma pool.
I let it run for 3 days on an Atom based platform, but could not reproduce it.

xhci testhack can be found here:

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git dmapool-test
https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=dmapool-test

Tested by just leaving the following running for a few days:

while true; do echo 0 > authorized; sleep 3; echo 1 > authorized; sleep 3; done;
For some usb device (for example: /sys/bus/usb/devices/1-8)

Then grep logs for "MATTU dmatest match! "

Can you share a bit more detail on the platform you are using, and what types
of test you are running?
Does my test above trigger the case? (show "MATTU dmatest match!")

-Mathias
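
For context, the "MATTU" trace lines quoted earlier boil down to a debug
print placed right after the allocation in xhci_segment_alloc() in
drivers/usb/host/xhci-mem.c. A hypothetical version of that instrumentation
(not the exact debug patch used in this thread) would be:

	seg->trbs = dma_pool_zalloc(xhci->segment_pool, flags, &dma);
	if (!seg->trbs) {
		kfree(seg);
		return NULL;
	}
	/* log the dma address handed back by the pool */
	trace_printk("MATTU xhci_segment_alloc dma @ %pad\n", &dma);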



Re: [PATCH 2/4 V3] Allocate pages for kdump without encryption when SME is enabled

2018-06-21 Thread Baoquan He
On 06/21/18 at 01:06pm, lijiang wrote:
> 在 2018年06月21日 09:53, Baoquan He 写道:
> > On 06/16/18 at 04:27pm, Lianbo Jiang wrote:
> >> When SME is enabled in the first kernel, we will allocate pages
> >> for kdump without encryption in order to be able to boot the
> >> second kernel in the same manner as kexec, which helps to keep
> >> the same code style.
> >>
> >> Signed-off-by: Lianbo Jiang 
> >> ---
> >>  kernel/kexec_core.c | 12 
> >>  1 file changed, 12 insertions(+)
> >>
> >> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> >> index 20fef1a..3c22a9b 100644
> >> --- a/kernel/kexec_core.c
> >> +++ b/kernel/kexec_core.c
> >> @@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
> >>}
> >>}
> >>  
> >> +  if (pages) {
> >> +  unsigned int count, i;
> >> +
> >> +  pages->mapping = NULL;
> >> +  set_page_private(pages, order);
> >> +  count = 1 << order;
> >> +  for (i = 0; i < count; i++)
> >> +  SetPageReserved(pages + i);
> > 
> > I guess you might imitate the kexec case, however kexec get pages from
> > buddy. Crash pages are reserved in memblock, these codes might make no sense.
> > 
> Thanks for your comments.
> We have changed the attribute of pages, so the original attribute of pages
> will be restored when they are freed.

Hmm, you can check what kimage_free() is doing, and where
kimage->control_pages, dest_pages, unusable_pages is assigned. Do you
know where these original attribute of pages comes from and they are
used/needed in CRASH case, if you care about them?

> 
> >> +  arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
> >> +  }
> >>return pages;
> >>  }
> >>  
> >> @@ -865,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage *image,
> >>result  = -ENOMEM;
> >>goto out;
> >>}
> >> +  arch_kexec_post_alloc_pages(page_address(page), 1, 0);
> >>ptr = kmap(page);
> >>ptr += maddr & ~PAGE_MASK;
> >>mchunk = min_t(size_t, mbytes,
> >> @@ -882,6 +893,7 @@ static int kimage_load_crash_segment(struct kimage *image,
> >>result = copy_from_user(ptr, buf, uchunk);
> >>kexec_flush_icache_page(page);
> >>kunmap(page);
> >> +  arch_kexec_pre_free_pages(page_address(page), 1);
> >>if (result) {
> >>result = -EFAULT;
> >>goto out;
> >> -- 
> >> 2.9.5
> >>
> >>

Re: [PATCH 3/4 V3] Remap the device table of IOMMU in encrypted manner for kdump

2018-06-21 Thread lijiang
在 2018年06月21日 16:39, Baoquan He 写道:
> On 06/21/18 at 01:42pm, lijiang wrote:
>> 在 2018年06月21日 00:42, Tom Lendacky 写道:
>>> On 6/16/2018 3:27 AM, Lianbo Jiang wrote:
 In kdump mode, it will copy the device table of IOMMU from the old
 device table, which is encrypted when SME is enabled in the first
 kernel. So we must remap it in encrypted manner in order to be
 automatically decrypted when we read.

 Signed-off-by: Lianbo Jiang 
 ---
 Some changes:
 1. add some comments
 2. clean compile warning.

  drivers/iommu/amd_iommu_init.c | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)

 diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
 index 904c575..a20af4c 100644
 --- a/drivers/iommu/amd_iommu_init.c
 +++ b/drivers/iommu/amd_iommu_init.c
 @@ -889,11 +889,24 @@ static bool copy_device_table(void)
}
  
old_devtb_phys = entry & PAGE_MASK;
 +
 +	/*
 +	 * When SME is enabled in the first kernel, old_devtb_phys includes
 +	 * the memory encryption mask (sme_me_mask); we must remove the mask
 +	 * to obtain the true physical address in kdump mode.
 +	 */
 +	if (mem_encrypt_active() && is_kdump_kernel())
 +		old_devtb_phys = __sme_clr(old_devtb_phys);
 +
>>>
>>> You can probably just use "if (is_kdump_kernel())" here, since memory
>>> encryption is either on in both the first and second kernel or off in
>>> both the first and second kernel.  At which point __sme_clr() will do
>>> the proper thing.
>>>
>>> Actually, this needs to be done no matter what.  When doing either the
>>> ioremap_encrypted() or the memremap(), the physical address should not
>>> include the encryption bit/mask.
>>>
>>> Thanks,
>>> Tom
>>>
>> Thanks for your comments. If we don't remove the memory encryption mask, it
>> will return false, because the 'old_devtb_phys >= 0x100000000ULL' check may
>> become true.
> 
> Lianbo, you may not get what Tom suggested. Tom means that no matter what it
> is, encrypted or not in the 1st kernel, we need to get the pure physical
> address, and using the code below is always right for both cases.
> 
> 	if (is_kdump_kernel())
> 		old_devtb_phys = __sme_clr(old_devtb_phys);
> 
Thank you, Baoquan. I understood what Tom said. I just want to explain why
removing the memory encryption mask is necessary before the
'old_devtb_phys >= 0x100000000ULL' check.

Lianbo
> And this is simpler. You can even add one line of code comment saying
> something like "Physical address w/o encryption mask is needed here."
>>
>> Lianbo
	if (old_devtb_phys >= 0x100000000ULL) {
		pr_err("The address of old device table is above 4G, not trustworthy!\n");
		return false;
	}
 -	old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
 +	old_devtb = (mem_encrypt_active() && is_kdump_kernel())
 +		    ? (__force void *)ioremap_encrypted(old_devtb_phys,
 +							dev_table_size)
 +		    : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
 +
if (!old_devtb)
return false;
  

>>
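To make the 4G sanity-check discussion above concrete, here is a rough sketch
of the helpers involved, as defined (roughly) in include/linux/mem_encrypt.h:

	/*
	 * sme_me_mask holds the C-bit (for example bit 47 on EPYC) when SME
	 * is active and is 0 otherwise, so clearing it is always safe before
	 * comparing or remapping a physical address.
	 */
	#define __sme_set(x)		((x) | sme_me_mask)
	#define __sme_clr(x)		((x) & ~sme_me_mask)

With an example device table entry of 0x800000007f000000, 'entry & PAGE_MASK'
still has the C-bit set and fails the 'old_devtb_phys >= 0x100000000ULL'
check even though the table really sits below 4G; clearing the mask first
yields the true physical address.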

Re: [PATCH 1/4 V3] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-06-21 Thread lijiang
On 2018-06-21 00:00, Tom Lendacky wrote:
> On 6/16/2018 3:27 AM, Lianbo Jiang wrote:
>> It is convenient to remap the old memory encrypted to the second
>> kernel by calling ioremap_encrypted().
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>> Some changes:
>> 1. remove the sme_active() check in __ioremap_caller().
>> 2. put some logic into the early_memremap_pgprot_adjust() for
>> early memremap.
>>
>>  arch/x86/include/asm/io.h |  3 +++
>>  arch/x86/mm/ioremap.c | 28 
>>  2 files changed, 23 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index f6e5b93..989d60b 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
>>  #define ioremap_cache ioremap_cache
>>  extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, unsigned long prot_val);
>>  #define ioremap_prot ioremap_prot
>> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
>> +unsigned long size);
>> +#define ioremap_encrypted ioremap_encrypted
>>  
>>  /**
>>   * ioremap -   map bus memory into CPU space
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index c63a545..e365fc4 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -24,6 +24,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include "physaddr.h"
>>  
>> @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
>>   * caller shouldn't need to know that small detail.
>>   */
>>  static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>> -unsigned long size, enum page_cache_mode pcm, void *caller)
>> +unsigned long size, enum page_cache_mode pcm,
>> +void *caller, bool encrypted)
>>  {
>>  unsigned long offset, vaddr;
>>  resource_size_t last_addr;
>> @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>>   * resulting mapping.
>>   */
>>  prot = PAGE_KERNEL_IO;
>> -if (sev_active() && mem_flags.desc_other)
>> +if ((sev_active() && mem_flags.desc_other) || encrypted)
>>  prot = pgprot_encrypted(prot);
>>  
>>  switch (pcm) {
>> @@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
>>  enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
>>  
>>  return __ioremap_caller(phys_addr, size, pcm,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_nocache);
>>  
>> @@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size)
>>  enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
>>  
>>  return __ioremap_caller(phys_addr, size, pcm,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL_GPL(ioremap_uc);
>>  
>> @@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
>>  void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
>>  {
>>  return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_wc);
>>  
>> @@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
>>  void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
>>  {
>>  return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_wt);
>>  
>> +void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
>> +{
>> +return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
>> +__builtin_return_address(0), true);
>> +}
>> +EXPORT_SYMBOL(ioremap_encrypted);
>> +
>>  void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
>>  {
>>  return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_cache);
>>  
>> @@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, unsigned long size,
>>  {
>>  return __ioremap_caller(phys_addr, size,
>>  pgprot2cachemode(__pgprot(prot_val)),
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_prot);
>>  
>> @@ -688,6 
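A hypothetical usage sketch of the new helper (the names old_paddr and buf
below are illustrative, not part of the patch): in the kdump kernel, a page of
old, encrypted memory can be mapped with ioremap_encrypted() and read through
the usual MMIO copy helpers, the C-bit in the page protection making the
hardware decrypt the data on the fly.

	void __iomem *vaddr;

	/* Map one page of the crashed kernel's encrypted memory. */
	vaddr = ioremap_encrypted(old_paddr, PAGE_SIZE);
	if (!vaddr)
		return -ENOMEM;

	/* Reads are transparently decrypted because the mapping is encrypted. */
	memcpy_fromio(buf, vaddr, PAGE_SIZE);
	iounmap(vaddr);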

Re: [PATCH 3/4 V3] Remap the device table of IOMMU in encrypted manner for kdump

2018-06-21 Thread Baoquan He
On 06/21/18 at 01:42pm, lijiang wrote:
> On 2018-06-21 00:42, Tom Lendacky wrote:
> > On 6/16/2018 3:27 AM, Lianbo Jiang wrote:
> >> In kdump mode, the kdump kernel will copy the IOMMU device table from the
> >> old device table, which is encrypted when SME is enabled in the first
> >> kernel. So we must remap it in an encrypted manner so that it is
> >> automatically decrypted when read.
> >>
> >> Signed-off-by: Lianbo Jiang 
> >> ---
> >> Some changes:
> >> 1. add some comments
> >> 2. clean compile warning.
> >>
> >>  drivers/iommu/amd_iommu_init.c | 15 ++-
> >>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> >> index 904c575..a20af4c 100644
> >> --- a/drivers/iommu/amd_iommu_init.c
> >> +++ b/drivers/iommu/amd_iommu_init.c
> >> @@ -889,11 +889,24 @@ static bool copy_device_table(void)
> >>}
> >>  
> >>old_devtb_phys = entry & PAGE_MASK;
> >> +
> >> +	/*
> >> +	 * When SME is enabled in the first kernel, old_devtb_phys includes
> >> +	 * the memory encryption mask (sme_me_mask); we must remove the mask
> >> +	 * to obtain the true physical address in kdump mode.
> >> +	 */
> >> +	if (mem_encrypt_active() && is_kdump_kernel())
> >> +		old_devtb_phys = __sme_clr(old_devtb_phys);
> >> +
> > 
> > You can probably just use "if (is_kdump_kernel())" here, since memory
> > encryption is either on in both the first and second kernel or off in
> > both the first and second kernel.  At which point __sme_clr() will do
> > the proper thing.
> > 
> > Actually, this needs to be done no matter what.  When doing either the
> > ioremap_encrypted() or the memremap(), the physical address should not
> > include the encryption bit/mask.
> > 
> > Thanks,
> > Tom
> > 
> Thanks for your comments. If we don't remove the memory encryption mask, it
> will return false, because the 'old_devtb_phys >= 0x100000000ULL' check may
> become true.

Lianbo, you may not get what Tom suggested. Tom means that no matter what it
is, encrypted or not in the 1st kernel, we need to get the pure physical
address, and using the code below is always right for both cases.

	if (is_kdump_kernel())
		old_devtb_phys = __sme_clr(old_devtb_phys);

And this is simpler. You can even add one line of code comment saying
something like "Physical address w/o encryption mask is needed here."
> 
> Lianbo
> >>	if (old_devtb_phys >= 0x100000000ULL) {
> >>		pr_err("The address of old device table is above 4G, not trustworthy!\n");
> >>		return false;
> >>	}
> >> -	old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
> >> +	old_devtb = (mem_encrypt_active() && is_kdump_kernel())
> >> +		    ? (__force void *)ioremap_encrypted(old_devtb_phys,
> >> +							dev_table_size)
> >> +		    : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
> >> +
> >>if (!old_devtb)
> >>return false;
> >>  
> >>
> 

Re: [PATCH 0/4 V3] Support kdump for AMD secure memory encryption(SME)

2018-06-21 Thread Baoquan He
On 06/21/18 at 11:18am, lijiang wrote:
> On 2018-06-21 09:21, Baoquan He wrote:
> > On 06/16/18 at 04:27pm, Lianbo Jiang wrote:
> >> It is convenient to remap the old memory encrypted to the second kernel by
> >> calling ioremap_encrypted().
> >>
> >> When SME is enabled on an AMD server, we also need to support kdump. Because
> >> the memory is encrypted in the first kernel, we will remap the old memory as
> >> encrypted in the second kernel (crash kernel), and SME must also be enabled
> >> in the second kernel, otherwise the old encrypted memory can not be decrypted.
> >> Simply changing the value of a C-bit on a page will not automatically encrypt
> >> the existing contents of the page, and any data in the page prior to the
> >> C-bit modification will become unintelligible. A page of memory that is
> >> marked encrypted will be automatically decrypted when read from DRAM and
> >> will be automatically encrypted when written to DRAM.
> >>
> >> For kdump, it is necessary to distinguish whether the memory is encrypted.
> >> Furthermore, we should also know which part of the memory is encrypted or
> >> decrypted. We will remap the memory appropriately, according to the specific
> >> situation, in order to tell the CPU how to deal with the data (encrypted or
> >> decrypted). For example, with SME enabled, if the old memory is encrypted,
> >> we will remap the old memory in an encrypted way, which will automatically
> >> decrypt the old encrypted memory when we read that data through the
> >> remapped address.
> >>
> >>  -----------------------------------------------
> >> | first-kernel  | second-kernel | kdump support |
> >> |      (mem_encrypt=on|off)     |   (yes|no)    |
> >> |---------------+---------------+---------------|
> >> |      on       |      on       |      yes      |
> >> |      off      |      off      |      yes      |
> >> |      on       |      off      |      no       |
> > 
> > 
> >> |      off      |      on       |      no       |
> > 
> > It's not clear to me here. If SME is off in the 1st kernel, why would it
> > fail when you remap the old memory in non-SME mode in the 2nd kernel?
> > 
> Thank you, Baoquan.
> For kdump, there are two cases that we don't need to support:
> 
> 1. SME on (first kernel), but SME off (second kernel).
> Because the old memory is encrypted, we can't decrypt the old memory if SME
> is off in the second kernel (in kdump mode).
> 
> 2. SME off (first kernel), but SME on (second kernel).
> Maybe this situation doesn't have significance in an actual deployment;
> furthermore, it will also increase the complexity of the code. It would only
> be useful for testing, so maybe it is unnecessary to support it, because the
> old memory is unencrypted.

Hmm, sorry, I don't get why it is unnecessary just because the old memory is
unencrypted. We developers should cover all cases unless a certain case has
no way to be fixed, or fixing it is not cost-effective. Otherwise there's no
excuse for us not to fix it. When QE try their best to cover all test cases,
they don't consider significance, because even corner cases need to be swept
out; that is their job. There isn't a rule for developers to decide whether a
case is significant.

Or could you point out what complexity it would bring, to defend your point?

Thanks
Baoquan
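
In code terms, the distinction the table is drawing could look roughly like
the sketch below; the helper name map_old_mem_page() and the 'encrypted' flag
are illustrative, not part of the series:

	/*
	 * Illustrative only: pick the mapping type based on how the first
	 * kernel wrote the page.  If the first kernel encrypted the page but
	 * the kdump kernel runs without SME, no encrypted mapping is
	 * available, which is the unsupported "on/off" row in the table.
	 */
	static void *map_old_mem_page(unsigned long paddr, bool encrypted)
	{
		if (encrypted)
			return (__force void *)ioremap_encrypted(paddr, PAGE_SIZE);

		return memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
	}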

> > And please run scripts/get_maintainer.pl and add the maintainers of the
> > components affected by the patch to the CC list.
> Great! I forgot to CC the maintainers, thanks for your reminder.
> 
> Lianbo
> > 
> >> |_______________|_______________|_______________|
> >>
> >> This patch is only for SME kdump; it does not support SEV kdump.
> >>
> >> Test tools:
> >> makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
> >> commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
> >> Author: Lianbo Jiang 
> >> Date:   Mon May 14 17:02:40 2018 +0800
> >> Note: This patch can only dump the vmcore when SME is enabled.
> >>
> >> crash-7.2.1: https://github.com/crash-utility/crash.git
> >> commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
> >> Author: Dave Anderson 
> >> Date:   Fri May 11 15:54:32 2018 -0400
> >>
> >> Test environment:
> >> HP ProLiant DL385Gen10 AMD EPYC 7251
> >> 8-Core Processor
> >> 32768 MB memory
> >> 600 GB disk space
> >>
> >> Linux 4.17-rc7:
> >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >> commit b04e217704b7 ("Linux 4.17-rc7")
> >> Author: Linus Torvalds 
> >> Date:   Sun May 27 13:01:47 2018 -0700
> >>
> >> Reference:
> >> AMD64 Architecture Programmer's Manual
> >> https://support.amd.com/TechDocs/24593.pdf
> >>
> >> Some changes:
> >> 1. remove the sme_active() check in __ioremap_caller().
> >> 2. remove the '#ifdef' stuff throughout this patch.
> >> 3. put some logic into the early_memremap_pgprot_adjust() and clean the
> >> previous unnecessary changes, for example: arch/x86/include/asm/dmi.h,
> >> arch/x86/kernel/acpi/boot.c, drivers/acpi/tables.c.
> >> 4. add a new file and modify Makefile.
> >> 5. clean compile warning in copy_device_table() and some co