Re: [PATCH 3/4 V3] Remap the device table of IOMMU in encrypted manner for kdump
On 06/21/18 at 08:12am, Tom Lendacky wrote: > On 6/21/2018 3:39 AM, Baoquan He wrote: > > On 06/21/18 at 01:42pm, lijiang wrote: > >> 在 2018年06月21日 00:42, Tom Lendacky 写道: > >>> On 6/16/2018 3:27 AM, Lianbo Jiang wrote: > In kdump mode, it will copy the device table of IOMMU from the old > device table, which is encrypted when SME is enabled in the first > kernel. So we must remap it in encrypted manner in order to be > automatically decrypted when we read. > > Signed-off-by: Lianbo Jiang > --- > Some changes: > 1. add some comments > 2. clean compile warning. > > drivers/iommu/amd_iommu_init.c | 15 ++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/amd_iommu_init.c > b/drivers/iommu/amd_iommu_init.c > index 904c575..a20af4c 100644 > --- a/drivers/iommu/amd_iommu_init.c > +++ b/drivers/iommu/amd_iommu_init.c > @@ -889,11 +889,24 @@ static bool copy_device_table(void) > } > > old_devtb_phys = entry & PAGE_MASK; > + > +/* > + * When sme enable in the first kernel, old_devtb_phys > includes the > + * memory encryption mask(sme_me_mask), we must remove the > memory > + * encryption mask to obtain the true physical address in > kdump mode. > + */ > +if (mem_encrypt_active() && is_kdump_kernel()) > +old_devtb_phys = __sme_clr(old_devtb_phys); > + > >>> > >>> You can probably just use "if (is_kdump_kernel())" here, since memory > >>> encryption is either on in both the first and second kernel or off in > >>> both the first and second kernel. At which point __sme_clr() will do > >>> the proper thing. > >>> > >>> Actually, this needs to be done no matter what. When doing either the > >>> ioremap_encrypted() or the memremap(), the physical address should not > >>> include the encryption bit/mask. > >>> > >>> Thanks, > >>> Tom > >>> > >> Thanks for your comments. If we don't remove the memory encryption mask, > >> it will > >> return false because the 'old_devtb_phys >= 0x1ULL' may become > >> true. > > > > Lianbo, you may not get what Tom suggested. Tom means no matter what it > > is, encrypted or not in 1st kernel, we need get pure physicall address, > > and using below code is always right for both cases. > > > > if (is_kdump_kernel()) > > old_devtb_phys = __sme_clr(old_devtb_phys); > > > > And this is simpler. You even can add one line of code comment to say > > like "Physical address w/o encryption mask is needed here." > > Even simpler, there's no need to even check for is_kdump_kernel(). The > __sme_clr() should always be done if the physical address is going to be > used for some form of io or memory remapping. > > So you could just change the existing: > > old_devtb_phys = entry & PAGE_MASK; > > to: > > old_devtb_phys = __sme_clr(entry) & PAGE_MASK; Agree, this is even better. > > >> > >> Lianbo > if (old_devtb_phys >= 0x1ULL) { > pr_err("The address of old device table is above 4G, > not trustworthy!\n"); > return false; > } > -old_devtb = memremap(old_devtb_phys, dev_table_size, > MEMREMAP_WB); > +old_devtb = (mem_encrypt_active() && is_kdump_kernel()) > +? (__force void *)ioremap_encrypted(old_devtb_phys, > +dev_table_size) > +: memremap(old_devtb_phys, dev_table_size, > MEMREMAP_WB);> + > if (!old_devtb) > return false; > > > >> > >> ___ > >> kexec mailing list > >> ke...@lists.infradead.org > >> http://lists.infradead.org/mailman/listinfo/kexec > > ___ > kexec mailing list > ke...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/4 V3] Allocate pages for kdump without encryption when SME is enabled
On 06/21/18 at 09:27pm, lijiang wrote: > 在 2018年06月21日 18:23, Baoquan He 写道: > > On 06/21/18 at 01:06pm, lijiang wrote: > >> 在 2018年06月21日 09:53, Baoquan He 写道: > >>> On 06/16/18 at 04:27pm, Lianbo Jiang wrote: > When SME is enabled in the first kernel, we will allocate pages > for kdump without encryption in order to be able to boot the > second kernel in the same manner as kexec, which helps to keep > the same code style. > > Signed-off-by: Lianbo Jiang > --- > kernel/kexec_core.c | 12 > 1 file changed, 12 insertions(+) > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index 20fef1a..3c22a9b 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -471,6 +471,16 @@ static struct page > *kimage_alloc_crash_control_pages(struct kimage *image, > } > } > > +if (pages) { > +unsigned int count, i; > + > +pages->mapping = NULL; > +set_page_private(pages, order); > +count = 1 << order; > +for (i = 0; i < count; i++) > +SetPageReserved(pages + i); > >>> > >>> I guess you might imitate the kexec case, however kexec get pages from > >>> buddy. Crash pages are reserved in memblock, these codes might make no > >>> sense. > >>> > >> Thanks for your comments. > >> We have changed the attribute of pages, so the original attribute of pages > >> will be > >> restored when they free. > > > > Hmm, you can check what kimage_free() is doing, and where > > kimage->control_pages, dest_pages, unusable_pages is assigned. Do you > > know where these original attribute of pages comes from and they are > > used/needed in CRASH case, if you care about them? > > > Originally, we want to have an opportunity to restore the previous attribute > of pages, that > should be more better if the pages are remembered in 'image->control_pages'. Again, please check who assigns value for 'image->control_pages'. > If we remove these codes, it is also harmless for kdump, but it will become > strange, maybe > someone could ask where to restore the previous attribute of pages. > > Thanks. > >> > +arch_kexec_post_alloc_pages(page_address(pages), 1 << > order, 0); > +} > return pages; > } > > @@ -865,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage > *image, > result = -ENOMEM; > goto out; > } > +arch_kexec_post_alloc_pages(page_address(page), 1, 0); > ptr = kmap(page); > ptr += maddr & ~PAGE_MASK; > mchunk = min_t(size_t, mbytes, > @@ -882,6 +893,7 @@ static int kimage_load_crash_segment(struct kimage > *image, > result = copy_from_user(ptr, buf, uchunk); > kexec_flush_icache_page(page); > kunmap(page); > +arch_kexec_pre_free_pages(page_address(page), 1); > if (result) { > result = -EFAULT; > goto out; > -- > 2.9.5 > > > ___ > kexec mailing list > ke...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec > > ___ > kexec mailing list > ke...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 3/5] iommu/virtio: Add probe request
On Thu, Jun 21, 2018 at 08:06:53PM +0100, Jean-Philippe Brucker wrote: > When the device offers the probe feature, send a probe request for each > device managed by the IOMMU. Extract RESV_MEM information. When we > encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region. > This will tell other subsystems that there is no need to map the MSI > doorbell in the virtio-iommu, because MSIs bypass it. > > Signed-off-by: Jean-Philippe Brucker > --- > drivers/iommu/virtio-iommu.c | 149 -- > include/uapi/linux/virtio_iommu.h | 37 > 2 files changed, 180 insertions(+), 6 deletions(-) > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > index ea0242d8624b..29ce9f4398ef 100644 > --- a/drivers/iommu/virtio-iommu.c > +++ b/drivers/iommu/virtio-iommu.c > @@ -48,6 +48,7 @@ struct viommu_dev { > struct iommu_domain_geometrygeometry; > u64 pgsize_bitmap; > u8 domain_bits; > + u32 probe_size; > }; > > struct viommu_mapping { > @@ -69,8 +70,10 @@ struct viommu_domain { > }; > > struct viommu_endpoint { > + struct device *dev; > struct viommu_dev *viommu; > struct viommu_domain*vdomain; > + struct list_headresv_regions; > }; > > struct viommu_request { > @@ -121,6 +124,9 @@ static off_t viommu_get_req_offset(struct viommu_dev > *viommu, > { > size_t tail_size = sizeof(struct virtio_iommu_req_tail); > > + if (req->type == VIRTIO_IOMMU_T_PROBE) > + return len - viommu->probe_size - tail_size; > + > return len - tail_size; > } > > @@ -404,6 +410,103 @@ static int viommu_replay_mappings(struct viommu_domain > *vdomain) > return ret; > } > > +static int viommu_add_resv_mem(struct viommu_endpoint *vdev, > +struct virtio_iommu_probe_resv_mem *mem, > +size_t len) > +{ > + struct iommu_resv_region *region = NULL; > + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; > + > + u64 start = le64_to_cpu(mem->start); > + u64 end = le64_to_cpu(mem->end); > + size_t size = end - start + 1; > + > + if (len < sizeof(*mem)) > + return -EINVAL; > + > + switch (mem->subtype) { > + default: > + if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED && > + mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI) > + dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n", > + mem->subtype); > + /* Fall-through */ > + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED: > + region = iommu_alloc_resv_region(start, size, 0, > + IOMMU_RESV_RESERVED); > + break; > + case VIRTIO_IOMMU_RESV_MEM_T_MSI: > + region = iommu_alloc_resv_region(start, size, prot, > + IOMMU_RESV_MSI); > + break; > + } > + > + list_add(&vdev->resv_regions, ®ion->list); > + return 0; > +} > + > +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device > *dev) > +{ > + int ret; > + u16 type, len; > + size_t cur = 0; > + size_t probe_len; > + struct virtio_iommu_req_probe *probe; > + struct virtio_iommu_probe_property *prop; > + struct iommu_fwspec *fwspec = dev->iommu_fwspec; > + struct viommu_endpoint *vdev = fwspec->iommu_priv; > + > + if (!fwspec->num_ids) > + return -EINVAL; > + > + probe_len = sizeof(*probe) + viommu->probe_size + > + sizeof(struct virtio_iommu_req_tail); > + probe = kzalloc(probe_len, GFP_KERNEL); > + if (!probe) > + return -ENOMEM; > + > + probe->head.type = VIRTIO_IOMMU_T_PROBE; > + /* > + * For now, assume that properties of an endpoint that outputs multiple > + * IDs are consistent. Only probe the first one. > + */ > + probe->endpoint = cpu_to_le32(fwspec->ids[0]); > + > + ret = viommu_send_req_sync(viommu, probe, probe_len); > + if (ret) > + goto out_free; > + > + prop = (void *)probe->properties; > + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK; > + > + while (type != VIRTIO_IOMMU_PROBE_T_NONE && > +cur < viommu->probe_size) { > + len = le16_to_cpu(prop->length); > + > + switch (type) { > + case VIRTIO_IOMMU_PROBE_T_RESV_MEM: > + ret = viommu_add_resv_mem(vdev, (void *)prop->value, > len); > + break; > + default: > + dev_dbg(dev, "unknown viommu prop 0x%x\n", type); > + } > + > + if (ret) > + dev_err(dev, "failed to parse viommu prop 0x%x\n", > type); > + > +
Re: [PATCH v2 2/5] iommu: Add virtio-iommu driver
On Thu, Jun 21, 2018 at 08:06:52PM +0100, Jean-Philippe Brucker wrote: > The virtio IOMMU is a para-virtualized device, allowing to send IOMMU > requests such as map/unmap over virtio-mmio transport without emulating > page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP > requests. > > The bulk of the code transforms calls coming from the IOMMU API into > corresponding virtio requests. Mappings are kept in an interval tree > instead of page tables. > > Signed-off-by: Jean-Philippe Brucker > --- > MAINTAINERS | 6 + > drivers/iommu/Kconfig | 11 + > drivers/iommu/Makefile| 1 + > drivers/iommu/virtio-iommu.c | 929 ++ > include/uapi/linux/virtio_ids.h | 1 + > include/uapi/linux/virtio_iommu.h | 117 > 6 files changed, 1065 insertions(+) > create mode 100644 drivers/iommu/virtio-iommu.c > create mode 100644 include/uapi/linux/virtio_iommu.h > > diff --git a/MAINTAINERS b/MAINTAINERS > index 55c08968aaab..bfb09dfa8e02 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -15248,6 +15248,12 @@ S: Maintained > F: drivers/virtio/virtio_input.c > F: include/uapi/linux/virtio_input.h > > +VIRTIO IOMMU DRIVER > +M: Jean-Philippe Brucker > +S: Maintained > +F: drivers/iommu/virtio-iommu.c > +F: include/uapi/linux/virtio_iommu.h > + > VIRTUAL BOX GUEST DEVICE DRIVER > M: Hans de Goede > M: Arnd Bergmann Please add the virtualization mailing list. > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index 958417f22020..820709383899 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -412,4 +412,15 @@ config QCOM_IOMMU > help > Support for IOMMU on certain Qualcomm SoCs. > > +config VIRTIO_IOMMU > + bool "Virtio IOMMU driver" > + depends on VIRTIO_MMIO=y > + select IOMMU_API > + select INTERVAL_TREE > + select ARM_DMA_USE_IOMMU if ARM > + help > + Para-virtualised IOMMU driver with virtio. > + > + Say Y here if you intend to run this kernel as a guest. > + > endif # IOMMU_SUPPORT > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > index 244ad7913a81..b559c6ae81ea 100644 > --- a/drivers/iommu/Makefile > +++ b/drivers/iommu/Makefile > @@ -33,3 +33,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o > obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o > obj-$(CONFIG_S390_IOMMU) += s390-iommu.o > obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o > +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > new file mode 100644 > index ..ea0242d8624b > --- /dev/null > +++ b/drivers/iommu/virtio-iommu.c > @@ -0,0 +1,929 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Virtio driver for the paravirtualized IOMMU > + * > + * Copyright (C) 2018 Arm Limited > + */ > + > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +#define MSI_IOVA_BASE0x800 > +#define MSI_IOVA_LENGTH 0x10 > + > +#define VIOMMU_REQUEST_VQ0 > +#define VIOMMU_NR_VQS1 > + > +#define VIOMMU_REQUEST_TIMEOUT 1 /* 10s */ Where does this come from? Do you absolutely have to have an arbitrary timeout? > + > +struct viommu_dev { > + struct iommu_device iommu; > + struct device *dev; > + struct virtio_device*vdev; > + > + struct ida domain_ids; > + > + struct virtqueue*vqs[VIOMMU_NR_VQS]; > + spinlock_t request_lock; > + struct list_headrequests; > + > + /* Device configuration */ > + struct iommu_domain_geometrygeometry; > + u64 pgsize_bitmap; > + u8 domain_bits; > +}; > + > +struct viommu_mapping { > + phys_addr_t paddr; > + struct interval_tree_node iova; > + u32 flags; > +}; > + > +struct viommu_domain { > + struct iommu_domain domain; > + struct viommu_dev *viommu; > + struct mutexmutex; > + unsigned intid; > + > + spinlock_t mappings_lock; > + struct rb_root_cached mappings; > + > + unsigned long nr_endpoints; > +}; > + > +struct viommu_endpoint { > + struct viommu_dev *viommu; > + struct viommu_domain*vdomain; > +}; > + > +struct viommu_request { > + struct list_headlist; > + void*writeback; > + unsigned int
[PATCH v2 5/5] vfio: Allow type-1 IOMMU instantiation for ARM
ARM platforms may implement several kinds of IOMMUs (various SMMU or SMMUv3 implementations, virtio-iommu). They are all type-1, so automatically select VFIO_IOMMU_TYPE1 on ARM if IOMMU is selected. Signed-off-by: Jean-Philippe Brucker --- drivers/vfio/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index c84333eb5eb5..9de5ed38da83 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -21,7 +21,7 @@ config VFIO_VIRQFD menuconfig VFIO tristate "VFIO Non-Privileged userspace driver framework" depends on IOMMU_API - select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3) + select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64) select ANON_INODES help VFIO provides a framework for secure userspace device drivers. -- 2.17.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 4/5] iommu/virtio: Add event queue
The event queue offers a way for the device to report access faults from endpoints. It is implemented on virtqueue #1. Whenever the host needs to signal a fault, it fills one of the buffers offered by the guest and interrupts it. Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/virtio-iommu.c | 116 +++--- include/uapi/linux/virtio_iommu.h | 18 + 2 files changed, 125 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 29ce9f4398ef..c075404e97d6 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -29,7 +29,8 @@ #define MSI_IOVA_LENGTH0x10 #define VIOMMU_REQUEST_VQ 0 -#define VIOMMU_NR_VQS 1 +#define VIOMMU_EVENT_VQ1 +#define VIOMMU_NR_VQS 2 #define VIOMMU_REQUEST_TIMEOUT 1 /* 10s */ @@ -43,6 +44,7 @@ struct viommu_dev { struct virtqueue*vqs[VIOMMU_NR_VQS]; spinlock_t request_lock; struct list_headrequests; + void*evts; /* Device configuration */ struct iommu_domain_geometrygeometry; @@ -84,6 +86,15 @@ struct viommu_request { charbuf[]; }; +#define VIOMMU_FAULT_RESV_MASK 0xff00 + +struct viommu_event { + union { + u32 head; + struct virtio_iommu_fault fault; + }; +}; + #define to_viommu_domain(domain) \ container_of(domain, struct viommu_domain, domain) @@ -507,6 +518,69 @@ static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) return ret; } +static int viommu_fault_handler(struct viommu_dev *viommu, + struct virtio_iommu_fault *fault) +{ + char *reason_str; + + u8 reason = fault->reason; + u32 flags = le32_to_cpu(fault->flags); + u32 endpoint= le32_to_cpu(fault->endpoint); + u64 address = le64_to_cpu(fault->address); + + switch (reason) { + case VIRTIO_IOMMU_FAULT_R_DOMAIN: + reason_str = "domain"; + break; + case VIRTIO_IOMMU_FAULT_R_MAPPING: + reason_str = "page"; + break; + case VIRTIO_IOMMU_FAULT_R_UNKNOWN: + default: + reason_str = "unknown"; + break; + } + + /* TODO: find EP by ID and report_iommu_fault */ + if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS) + dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx [%s%s%s]\n", + reason_str, endpoint, address, + flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : "", + flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : "", + flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : ""); + else + dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n", + reason_str, endpoint); + return 0; +} + +static void viommu_event_handler(struct virtqueue *vq) +{ + int ret; + unsigned int len; + struct scatterlist sg[1]; + struct viommu_event *evt; + struct viommu_dev *viommu = vq->vdev->priv; + + while ((evt = virtqueue_get_buf(vq, &len)) != NULL) { + if (len > sizeof(*evt)) { + dev_err(viommu->dev, + "invalid event buffer (len %u != %zu)\n", + len, sizeof(*evt)); + } else if (!(evt->head & VIOMMU_FAULT_RESV_MASK)) { + viommu_fault_handler(viommu, &evt->fault); + } + + sg_init_one(sg, evt, sizeof(*evt)); + ret = virtqueue_add_inbuf(vq, sg, 1, evt, GFP_ATOMIC); + if (ret) + dev_err(viommu->dev, "could not add event buffer\n"); + } + + if (!virtqueue_kick(vq)) + dev_err(viommu->dev, "kick failed\n"); +} + /* IOMMU API */ static struct iommu_domain *viommu_domain_alloc(unsigned type) @@ -893,16 +967,35 @@ static struct iommu_ops viommu_ops = { static int viommu_init_vqs(struct viommu_dev *viommu) { struct virtio_device *vdev = dev_to_virtio(viommu->dev); - const char *name = "request"; - void *ret; + const char *names[] = { "request", "event" }; + vq_callback_t *callbacks[] = { + NULL, /* No async requests */ + viommu_event_handler, + }; - ret = virtio_find_single_vq(vdev, NULL, name); - if (IS_ERR(ret)) { - dev_err(viommu->dev, "cannot find VQ\n"); - return PTR_ERR(ret); - } + return virtio_find_vqs(vdev, VIOMMU_NR_VQS, viommu->vqs, callbacks, +
[PATCH v2 3/5] iommu/virtio: Add probe request
When the device offers the probe feature, send a probe request for each device managed by the IOMMU. Extract RESV_MEM information. When we encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region. This will tell other subsystems that there is no need to map the MSI doorbell in the virtio-iommu, because MSIs bypass it. Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/virtio-iommu.c | 149 -- include/uapi/linux/virtio_iommu.h | 37 2 files changed, 180 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index ea0242d8624b..29ce9f4398ef 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -48,6 +48,7 @@ struct viommu_dev { struct iommu_domain_geometrygeometry; u64 pgsize_bitmap; u8 domain_bits; + u32 probe_size; }; struct viommu_mapping { @@ -69,8 +70,10 @@ struct viommu_domain { }; struct viommu_endpoint { + struct device *dev; struct viommu_dev *viommu; struct viommu_domain*vdomain; + struct list_headresv_regions; }; struct viommu_request { @@ -121,6 +124,9 @@ static off_t viommu_get_req_offset(struct viommu_dev *viommu, { size_t tail_size = sizeof(struct virtio_iommu_req_tail); + if (req->type == VIRTIO_IOMMU_T_PROBE) + return len - viommu->probe_size - tail_size; + return len - tail_size; } @@ -404,6 +410,103 @@ static int viommu_replay_mappings(struct viommu_domain *vdomain) return ret; } +static int viommu_add_resv_mem(struct viommu_endpoint *vdev, + struct virtio_iommu_probe_resv_mem *mem, + size_t len) +{ + struct iommu_resv_region *region = NULL; + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; + + u64 start = le64_to_cpu(mem->start); + u64 end = le64_to_cpu(mem->end); + size_t size = end - start + 1; + + if (len < sizeof(*mem)) + return -EINVAL; + + switch (mem->subtype) { + default: + if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED && + mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI) + dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n", +mem->subtype); + /* Fall-through */ + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED: + region = iommu_alloc_resv_region(start, size, 0, +IOMMU_RESV_RESERVED); + break; + case VIRTIO_IOMMU_RESV_MEM_T_MSI: + region = iommu_alloc_resv_region(start, size, prot, +IOMMU_RESV_MSI); + break; + } + + list_add(&vdev->resv_regions, ®ion->list); + return 0; +} + +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) +{ + int ret; + u16 type, len; + size_t cur = 0; + size_t probe_len; + struct virtio_iommu_req_probe *probe; + struct virtio_iommu_probe_property *prop; + struct iommu_fwspec *fwspec = dev->iommu_fwspec; + struct viommu_endpoint *vdev = fwspec->iommu_priv; + + if (!fwspec->num_ids) + return -EINVAL; + + probe_len = sizeof(*probe) + viommu->probe_size + + sizeof(struct virtio_iommu_req_tail); + probe = kzalloc(probe_len, GFP_KERNEL); + if (!probe) + return -ENOMEM; + + probe->head.type = VIRTIO_IOMMU_T_PROBE; + /* +* For now, assume that properties of an endpoint that outputs multiple +* IDs are consistent. Only probe the first one. +*/ + probe->endpoint = cpu_to_le32(fwspec->ids[0]); + + ret = viommu_send_req_sync(viommu, probe, probe_len); + if (ret) + goto out_free; + + prop = (void *)probe->properties; + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK; + + while (type != VIRTIO_IOMMU_PROBE_T_NONE && + cur < viommu->probe_size) { + len = le16_to_cpu(prop->length); + + switch (type) { + case VIRTIO_IOMMU_PROBE_T_RESV_MEM: + ret = viommu_add_resv_mem(vdev, (void *)prop->value, len); + break; + default: + dev_dbg(dev, "unknown viommu prop 0x%x\n", type); + } + + if (ret) + dev_err(dev, "failed to parse viommu prop 0x%x\n", type); + + cur += sizeof(*prop) + len; + if (cur >= viommu->probe_size) + break; + + prop = (void *)probe->properties + cur; + type =
[PATCH v2 2/5] iommu: Add virtio-iommu driver
The virtio IOMMU is a para-virtualized device, allowing to send IOMMU requests such as map/unmap over virtio-mmio transport without emulating page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP requests. The bulk of the code transforms calls coming from the IOMMU API into corresponding virtio requests. Mappings are kept in an interval tree instead of page tables. Signed-off-by: Jean-Philippe Brucker --- MAINTAINERS | 6 + drivers/iommu/Kconfig | 11 + drivers/iommu/Makefile| 1 + drivers/iommu/virtio-iommu.c | 929 ++ include/uapi/linux/virtio_ids.h | 1 + include/uapi/linux/virtio_iommu.h | 117 6 files changed, 1065 insertions(+) create mode 100644 drivers/iommu/virtio-iommu.c create mode 100644 include/uapi/linux/virtio_iommu.h diff --git a/MAINTAINERS b/MAINTAINERS index 55c08968aaab..bfb09dfa8e02 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15248,6 +15248,12 @@ S: Maintained F: drivers/virtio/virtio_input.c F: include/uapi/linux/virtio_input.h +VIRTIO IOMMU DRIVER +M: Jean-Philippe Brucker +S: Maintained +F: drivers/iommu/virtio-iommu.c +F: include/uapi/linux/virtio_iommu.h + VIRTUAL BOX GUEST DEVICE DRIVER M: Hans de Goede M: Arnd Bergmann diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 958417f22020..820709383899 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -412,4 +412,15 @@ config QCOM_IOMMU help Support for IOMMU on certain Qualcomm SoCs. +config VIRTIO_IOMMU + bool "Virtio IOMMU driver" + depends on VIRTIO_MMIO=y + select IOMMU_API + select INTERVAL_TREE + select ARM_DMA_USE_IOMMU if ARM + help + Para-virtualised IOMMU driver with virtio. + + Say Y here if you intend to run this kernel as a guest. + endif # IOMMU_SUPPORT diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 244ad7913a81..b559c6ae81ea 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -33,3 +33,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o obj-$(CONFIG_S390_IOMMU) += s390-iommu.o obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c new file mode 100644 index ..ea0242d8624b --- /dev/null +++ b/drivers/iommu/virtio-iommu.c @@ -0,0 +1,929 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Virtio driver for the paravirtualized IOMMU + * + * Copyright (C) 2018 Arm Limited + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define MSI_IOVA_BASE 0x800 +#define MSI_IOVA_LENGTH0x10 + +#define VIOMMU_REQUEST_VQ 0 +#define VIOMMU_NR_VQS 1 + +#define VIOMMU_REQUEST_TIMEOUT 1 /* 10s */ + +struct viommu_dev { + struct iommu_device iommu; + struct device *dev; + struct virtio_device*vdev; + + struct ida domain_ids; + + struct virtqueue*vqs[VIOMMU_NR_VQS]; + spinlock_t request_lock; + struct list_headrequests; + + /* Device configuration */ + struct iommu_domain_geometrygeometry; + u64 pgsize_bitmap; + u8 domain_bits; +}; + +struct viommu_mapping { + phys_addr_t paddr; + struct interval_tree_node iova; + u32 flags; +}; + +struct viommu_domain { + struct iommu_domain domain; + struct viommu_dev *viommu; + struct mutexmutex; + unsigned intid; + + spinlock_t mappings_lock; + struct rb_root_cached mappings; + + unsigned long nr_endpoints; +}; + +struct viommu_endpoint { + struct viommu_dev *viommu; + struct viommu_domain*vdomain; +}; + +struct viommu_request { + struct list_headlist; + void*writeback; + unsigned intwrite_offset; + unsigned intlen; + charbuf[]; +}; + +#define to_viommu_domain(domain) \ + container_of(domain, struct viommu_domain, domain) + +static int viommu_get_req_errno(void *buf, size_t len) +{ + struct virtio_iommu_req_tail *tail = buf + len - sizeof(*tail); + + switch (tail->status) { + case VIRTIO_IOMMU_S_OK: +
[PATCH v2 1/5] dt-bindings: virtio: Specify #iommu-cells value for a virtio-iommu
A virtio-mmio node may represent a virtio-iommu device. This is discovered by the virtio driver at probe time, but the DMA topology isn't discoverable and must be described by firmware. For DT the standard IOMMU description is used, as specified in bindings/iommu/iommu.txt and bindings/pci/pci-iommu.txt. Like many other IOMMUs, virtio-iommu distinguishes masters by their endpoint IDs, which requires one IOMMU cell in the "iommus" property. Signed-off-by: Jean-Philippe Brucker --- Documentation/devicetree/bindings/virtio/mmio.txt | 8 1 file changed, 8 insertions(+) diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt b/Documentation/devicetree/bindings/virtio/mmio.txt index 5069c1b8e193..337da0e3a87f 100644 --- a/Documentation/devicetree/bindings/virtio/mmio.txt +++ b/Documentation/devicetree/bindings/virtio/mmio.txt @@ -8,6 +8,14 @@ Required properties: - reg: control registers base address and size including configuration space - interrupts: interrupt generated by the device +Required properties for virtio-iommu: + +- #iommu-cells:When the node describes a virtio-iommu device, it is + linked to DMA masters using the "iommus" property as + described in devicetree/bindings/iommu/iommu.txt. For + virtio-iommu #iommu-cells must be 1, each cell describing + a single endpoint ID. + Example: virtio_block@3000 { -- 2.17.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 0/5] Add virtio-iommu driver
Implement the base virtio-iommu driver, following version 0.7 of the specification [1]. Changes since last version [2]: * Address comments, thanks again for the review. * As suggested, add a DT binding description in patch 1. * Depend on VIRTIO_MMIO=y to fix a build failure¹ * Switch to v0.7 of the spec, which changes resv_mem parameters and adds an MMIO flag. These are trivial but not backward compatible. Once device or driver is upstream, updates to the spec will rely on feature bits to stay compatible with this code. * Implement the new tlb_sync interface, by splitting add_req() and sync_req(). I noticed a small improvement on netperf stream because the synchronous iommu_unmap() also benefits from this. Other experiments, such as using kmem_cache for requests instead of kmalloc, didn't show any improvement. Driver is available on branch virtio-iommu/v0.7 [3], and the x86+ACPI prototype is on branch virtio-iommu/devel. That x86 code hasn't changed, it still requires the DMA ifdeffery and I lack the expertise to tidy it up. The kvmtool example device has been cleaned up and is available on branch virtio-iommu/v0.7 [4]. Feedback welcome! Thanks, Jean [1] virtio-iommu specification v0.7 https://www.spinics.net/lists/linux-virtualization/msg34127.html [2] virtio-iommu driver v1 https://www.spinics.net/lists/kvm/msg164322.html [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.7 http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/v0.7 [4] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.7 --- ¹ A word on the module story. Because of complex dependencies IOMMU drivers cannot yet be .ko modules. Enabling it is outside the scope of this series but I have a small prototype on branch virtio-iommu/ module-devel. It seems desirable since some distros currently ship the transport code as module and are unlikely to change this on our account. Note that this series works fine with arm64 defconfig, which selects VIRTIO_MMIO=y. I could use some help to clean this up. Currently my solution is to split virtio-iommu into a module and a 3-lines built-in stub, which isn't graceful but could have been worse. Keeping the whole virtio-iommu driver as builtin would require accessing any virtio utility through get_symbol. So far we only have seven of those and could keep a list of pointer ops, but I find this solution terrible. If virtio or virtio_mmio is a module, virtio-iommu also needs to be one. The minimal set of changes to make a module out of an OF-based IOMMU driver seems to be: * Export IOMMU symbols used by drivers * Don't give up deferring probe in of_iommu * Move IOMMU OF tables to .rodata * Create a static virtio-iommu stub that declares the virtio-iommu OF table entry. The build system doesn't pick up IOMMU_OF_DECLARE when done from an object destined to be built as module :( * Create a device_link between endpoint and IOMMU, to ensure that removing the IOMMU driver also unbinds the endpoint driver. Putting this in IOMMU core seems like a better approach since even when not built as module, unbinding an IOMMU device via sysfs will cause crashes. With this, as long as virtio-mmio isn't loaded, the OF code defers probe of endpoints that depend on virtio-iommu. Once virtio-mmio is loaded, the probe is still deferred until virtio-iommu registers itself to the virtio bus. Once virtio-iommu is loaded, probe of endpoints managed by the IOMMU follows. I'll investigate ACPI IORT when I find some time, though I don't expect much complication and suggest we tackle one problem at a time. Since it is a luxury that requires changes to the IOMMU core, module support is left as a future improvement. --- Jean-Philippe Brucker (5): dt-bindings: virtio: Specify #iommu-cells value for a virtio-iommu iommu: Add virtio-iommu driver iommu/virtio: Add probe request iommu/virtio: Add event queue vfio: Allow type-1 IOMMU instantiation for ARM .../devicetree/bindings/virtio/mmio.txt |8 + MAINTAINERS |6 + drivers/iommu/Kconfig | 11 + drivers/iommu/Makefile|1 + drivers/iommu/virtio-iommu.c | 1164 + drivers/vfio/Kconfig |2 +- include/uapi/linux/virtio_ids.h |1 + include/uapi/linux/virtio_iommu.h | 172 +++ 8 files changed, 1364 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/virtio-iommu.c create mode 100644 include/uapi/linux/virtio_iommu.h -- 2.17.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 3/3] iommu/iova: Remove find_iova()
This function is potentially dangerous: nothing protects returned iova. As there is no user in tree anymore, delete it. Cc: David Woodhouse Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org Cc: Dmitry Safonov <0x7f454...@gmail.com> Signed-off-by: Dmitry Safonov --- drivers/iommu/iova.c | 20 include/linux/iova.h | 7 --- 2 files changed, 27 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 4c63d92afaf7..4a568e28a633 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -336,26 +336,6 @@ static void private_free_iova(struct iova_domain *iovad, struct iova *iova) } /** - * find_iova - finds an iova for a given pfn - * @iovad: - iova domain in question. - * @pfn: - page frame number - * This function finds and returns an iova belonging to the - * given doamin which matches the given pfn. - */ -struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn) -{ - unsigned long flags; - struct iova *iova; - - /* Take the lock so that no other thread is manipulating the rbtree */ - spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); - iova = private_find_iova(iovad, pfn); - spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); - return iova; -} -EXPORT_SYMBOL_GPL(find_iova); - -/** * __free_iova - frees the given iova * @iovad: iova domain in question. * @iova: iova in question. diff --git a/include/linux/iova.h b/include/linux/iova.h index 803472b77919..006911306a84 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -158,7 +158,6 @@ void init_iova_domain(struct iova_domain *iovad, unsigned long granule, unsigned long start_pfn); int init_iova_flush_queue(struct iova_domain *iovad, iova_flush_cb flush_cb, iova_entry_dtor entry_dtor); -struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn); void put_iova_domain(struct iova_domain *iovad); struct iova *iova_split_and_pop(struct iova_domain *iovad, unsigned long pfn_lo, unsigned long pfn_hi); @@ -243,12 +242,6 @@ static inline int init_iova_flush_queue(struct iova_domain *iovad, return -ENODEV; } -static inline struct iova *find_iova(struct iova_domain *iovad, -unsigned long pfn) -{ - return NULL; -} - static inline void put_iova_domain(struct iova_domain *iovad) { } -- 2.13.6 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 2/3] iommu/iova: Make free_iova() atomic
find_iova() grabs rbtree's spinlock only for the search time. Nothing guaranties that returned iova still exist for __free_iova(). Prevent potential use-after-free and double-free by holding the spinlock all the time iova is being searched and freed. Cc: David Woodhouse Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org Cc: Dmitry Safonov <0x7f454...@gmail.com> Signed-off-by: Dmitry Safonov --- drivers/iommu/iova.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 4b38eb507670..4c63d92afaf7 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -382,11 +382,14 @@ EXPORT_SYMBOL_GPL(__free_iova); void free_iova(struct iova_domain *iovad, unsigned long pfn) { - struct iova *iova = find_iova(iovad, pfn); + unsigned long flags; + struct iova *iova; + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); + iova = private_find_iova(iovad, pfn); if (iova) - __free_iova(iovad, iova); - + private_free_iova(iovad, iova); + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); } EXPORT_SYMBOL_GPL(free_iova); -- 2.13.6 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 1/3] iommu/iova: Find and split iova under rbtree's lock
find_iova() holds iova_rbtree_lock only during the traversing rbtree. After the lock is released, returned iova may be freed (e.g., during module's release). Hold the spinlock during search and removal of iova from the rbtree, eleminating possible use-after-free or/and double-free of iova. Cc: David Woodhouse Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org Cc: Dmitry Safonov <0x7f454...@gmail.com> Signed-off-by: Dmitry Safonov --- drivers/iommu/intel-iommu.c | 14 +++--- drivers/iommu/iova.c| 19 --- include/linux/iova.h| 10 -- 3 files changed, 19 insertions(+), 24 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 14e4b3722428..494394ef0f92 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -4530,19 +4530,11 @@ static int intel_iommu_memory_notifier(struct notifier_block *nb, struct intel_iommu *iommu; struct page *freelist; - iova = find_iova(&si_domain->iovad, start_vpfn); + iova = iova_split_and_pop(&si_domain->iovad, start_vpfn, last_vpfn); if (iova == NULL) { - pr_debug("Failed get IOVA for PFN %lx\n", -start_vpfn); - break; - } - - iova = split_and_remove_iova(&si_domain->iovad, iova, -start_vpfn, last_vpfn); - if (iova == NULL) { - pr_warn("Failed to split IOVA PFN [%lx-%lx]\n", + pr_warn("Failed to split & pop IOVA PFN [%lx-%lx]\n", start_vpfn, last_vpfn); - return NOTIFY_BAD; + break; } freelist = domain_unmap(si_domain, iova->pfn_lo, diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 83fe2621effe..4b38eb507670 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -715,23 +715,27 @@ copy_reserved_iova(struct iova_domain *from, struct iova_domain *to) } EXPORT_SYMBOL_GPL(copy_reserved_iova); -struct iova * -split_and_remove_iova(struct iova_domain *iovad, struct iova *iova, +struct iova *iova_split_and_pop(struct iova_domain *iovad, unsigned long pfn_lo, unsigned long pfn_hi) { - unsigned long flags; struct iova *prev = NULL, *next = NULL; + unsigned long flags; + struct iova *iova; spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); + iova = private_find_iova(iovad, pfn_lo); + if (iova == NULL) + goto err_unlock; + if (iova->pfn_lo < pfn_lo) { prev = alloc_and_init_iova(iova->pfn_lo, pfn_lo - 1); if (prev == NULL) - goto error; + goto err_unlock; } if (iova->pfn_hi > pfn_hi) { next = alloc_and_init_iova(pfn_hi + 1, iova->pfn_hi); if (next == NULL) - goto error; + goto err_free; } __cached_rbnode_delete_update(iovad, iova); @@ -749,10 +753,11 @@ split_and_remove_iova(struct iova_domain *iovad, struct iova *iova, return iova; -error: - spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); +err_free: if (prev) free_iova_mem(prev); +err_unlock: + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); return NULL; } diff --git a/include/linux/iova.h b/include/linux/iova.h index 928442dda565..803472b77919 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -160,8 +160,8 @@ int init_iova_flush_queue(struct iova_domain *iovad, iova_flush_cb flush_cb, iova_entry_dtor entry_dtor); struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn); void put_iova_domain(struct iova_domain *iovad); -struct iova *split_and_remove_iova(struct iova_domain *iovad, - struct iova *iova, unsigned long pfn_lo, unsigned long pfn_hi); +struct iova *iova_split_and_pop(struct iova_domain *iovad, + unsigned long pfn_lo, unsigned long pfn_hi); void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad); #else static inline int iova_cache_get(void) @@ -253,10 +253,8 @@ static inline void put_iova_domain(struct iova_domain *iovad) { } -static inline struct iova *split_and_remove_iova(struct iova_domain *iovad, -struct iova *iova, -unsigned long pfn_lo, -unsigned long pfn_hi) +static inline struct iova *iova_split_and_pop(struct iova_domain *iovad, + unsigned long
[RFC 0/3] iommu/iova: Unsafe locking in find_iova()
find_iova() looks to be using a bad locking practice: it locks the returned iova only for the search time. And looking in code, the element can be removed from the tree and freed under rbtree lock. That happens during memory hot-unplug and cleanup on module removal. Here I cleanup users of the function and delete it. Dmitry Safonov (3): iommu/iova: Find and split iova under rbtree's lock iommu/iova: Make free_iova() atomic iommu/iova: Remove find_iova() drivers/iommu/intel-iommu.c | 14 +++-- drivers/iommu/iova.c| 48 + include/linux/iova.h| 17 3 files changed, 25 insertions(+), 54 deletions(-) -- 2.13.6 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/4 V3] Allocate pages for kdump without encryption when SME is enabled
在 2018年06月21日 18:23, Baoquan He 写道: > On 06/21/18 at 01:06pm, lijiang wrote: >> 在 2018年06月21日 09:53, Baoquan He 写道: >>> On 06/16/18 at 04:27pm, Lianbo Jiang wrote: When SME is enabled in the first kernel, we will allocate pages for kdump without encryption in order to be able to boot the second kernel in the same manner as kexec, which helps to keep the same code style. Signed-off-by: Lianbo Jiang --- kernel/kexec_core.c | 12 1 file changed, 12 insertions(+) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 20fef1a..3c22a9b 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image, } } + if (pages) { + unsigned int count, i; + + pages->mapping = NULL; + set_page_private(pages, order); + count = 1 << order; + for (i = 0; i < count; i++) + SetPageReserved(pages + i); >>> >>> I guess you might imitate the kexec case, however kexec get pages from >>> buddy. Crash pages are reserved in memblock, these codes might make no >>> sense. >>> >> Thanks for your comments. >> We have changed the attribute of pages, so the original attribute of pages >> will be >> restored when they free. > > Hmm, you can check what kimage_free() is doing, and where > kimage->control_pages, dest_pages, unusable_pages is assigned. Do you > know where these original attribute of pages comes from and they are > used/needed in CRASH case, if you care about them? > Originally, we want to have an opportunity to restore the previous attribute of pages, that should be more better if the pages are remembered in 'image->control_pages'. If we remove these codes, it is also harmless for kdump, but it will become strange, maybe someone could ask where to restore the previous attribute of pages. Thanks. >> + arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0); + } return pages; } @@ -865,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage *image, result = -ENOMEM; goto out; } + arch_kexec_post_alloc_pages(page_address(page), 1, 0); ptr = kmap(page); ptr += maddr & ~PAGE_MASK; mchunk = min_t(size_t, mbytes, @@ -882,6 +893,7 @@ static int kimage_load_crash_segment(struct kimage *image, result = copy_from_user(ptr, buf, uchunk); kexec_flush_icache_page(page); kunmap(page); + arch_kexec_pre_free_pages(page_address(page), 1); if (result) { result = -EFAULT; goto out; -- 2.9.5 ___ kexec mailing list ke...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/4 V3] Remap the device table of IOMMU in encrypted manner for kdump
On 6/21/2018 3:39 AM, Baoquan He wrote: > On 06/21/18 at 01:42pm, lijiang wrote: >> 在 2018年06月21日 00:42, Tom Lendacky 写道: >>> On 6/16/2018 3:27 AM, Lianbo Jiang wrote: In kdump mode, it will copy the device table of IOMMU from the old device table, which is encrypted when SME is enabled in the first kernel. So we must remap it in encrypted manner in order to be automatically decrypted when we read. Signed-off-by: Lianbo Jiang --- Some changes: 1. add some comments 2. clean compile warning. drivers/iommu/amd_iommu_init.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 904c575..a20af4c 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -889,11 +889,24 @@ static bool copy_device_table(void) } old_devtb_phys = entry & PAGE_MASK; + + /* + * When sme enable in the first kernel, old_devtb_phys includes the + * memory encryption mask(sme_me_mask), we must remove the memory + * encryption mask to obtain the true physical address in kdump mode. + */ + if (mem_encrypt_active() && is_kdump_kernel()) + old_devtb_phys = __sme_clr(old_devtb_phys); + >>> >>> You can probably just use "if (is_kdump_kernel())" here, since memory >>> encryption is either on in both the first and second kernel or off in >>> both the first and second kernel. At which point __sme_clr() will do >>> the proper thing. >>> >>> Actually, this needs to be done no matter what. When doing either the >>> ioremap_encrypted() or the memremap(), the physical address should not >>> include the encryption bit/mask. >>> >>> Thanks, >>> Tom >>> >> Thanks for your comments. If we don't remove the memory encryption mask, it >> will >> return false because the 'old_devtb_phys >= 0x1ULL' may become true. > > Lianbo, you may not get what Tom suggested. Tom means no matter what it > is, encrypted or not in 1st kernel, we need get pure physicall address, > and using below code is always right for both cases. > > if (is_kdump_kernel()) > old_devtb_phys = __sme_clr(old_devtb_phys); > > And this is simpler. You even can add one line of code comment to say > like "Physical address w/o encryption mask is needed here." Even simpler, there's no need to even check for is_kdump_kernel(). The __sme_clr() should always be done if the physical address is going to be used for some form of io or memory remapping. So you could just change the existing: old_devtb_phys = entry & PAGE_MASK; to: old_devtb_phys = __sme_clr(entry) & PAGE_MASK; Thanks, Tom >> >> Lianbo if (old_devtb_phys >= 0x1ULL) { pr_err("The address of old device table is above 4G, not trustworthy!\n"); return false; } - old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB); + old_devtb = (mem_encrypt_active() && is_kdump_kernel()) + ? (__force void *)ioremap_encrypted(old_devtb_phys, + dev_table_size) + : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);> + if (!old_devtb) return false; >> >> ___ >> kexec mailing list >> ke...@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/kexec ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/7 v5] Support for fsl-mc bus and its devices in SMMU
Hi Nipun, On Thu, Jun 21, 2018 at 03:59:27AM +, Nipun Gupta wrote: > Will this patch-set be taken for the next kernel release (and via which > tree)? I think you need Acks from Robin and Joerg in order for this to be queued. Robin should be back at the beginning of next month, so there's still time for 4.19. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: usb HC busted?
On 21.06.2018 03:53, Sudip Mukherjee wrote: Hi Mathias, Andy, On Thu, Jun 07, 2018 at 10:40:03AM +0300, Mathias Nyman wrote: On 06.06.2018 19:45, Sudip Mukherjee wrote: Hi Andy, And we meet again. :) On Wed, Jun 06, 2018 at 06:36:35PM +0300, Andy Shevchenko wrote: On Wed, 2018-06-06 at 17:12 +0300, Mathias Nyman wrote: On 04.06.2018 18:28, Sudip Mukherjee wrote: On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote: Odd and unlikely, but to me this looks like some issue in allocating dma memory from pool using dma_pool_zalloc() Here's the story: Sudip sees usb issues on a Intel Atom based board with 4.14.2 kernel. All tracing points to dma_pool_zalloc() returning the same dma address block on consecutive calls. In the failing case dma_pool_zalloc() is called 3 - 6us apart. <...>-26362 [002] 1186.756739: xhci_ring_mem_detail: MATTU xhci_segment_alloc dma @ 0x2d92b000 <...>-26362 [002] 1186.756745: xhci_ring_mem_detail: MATTU xhci_segment_alloc dma @ 0x2d92b000 <...>-26362 [002] 1186.756748: xhci_ring_mem_detail: MATTU xhci_segment_alloc dma @ 0x2d92b000 dma_pool_zalloc() is called from xhci_segment_alloc() in drivers/usb/host/xhci-mem.c see: https://elixir.bootlin.com/linux/v4.14.2/source/drivers/usb/host/xhci- mem.c#L52 prints above are custom traces added right after dma_pool_zalloc() For better understanding it would be good to have dma_pool_free() calls debugged as well. Sudip has a full (394M unpacked) trace at: https://drive.google.com/open?id=1h-3r-1lfjg8oblBGkzdRIq8z3ZNgGZx- But then it gets stuck, for the whole ring2 dma_pool_zalloc() just returns the same dma address as the last segment for ring1:0x2d92b000. Last part of trace snippet is just another ring being freed. A gentle ping on this. Any idea on what the problem might be and any possible fix? I tried to reproduce it by quickly hacking xhci to allocate and free 50 segments each time we normally allocate one segment from dmapool. I let it run for 3 days on a Atom based platform, but could not reproduce it. xhci testhack can be found here: git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git dmapool-test https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=dmapool-test Tested by just leaving the following running for a few days: while true; do echo 0 > authorized; sleep 3; echo 1 > authorized; sleep 3; done; For some usb device (for example: /sys/bus/usb/devices/1-8) Then grep logs for "MATTU dmatest match! " Can you share a bit more details on the platform you are using, and what types of test you are running. Does my test above trigger the case? (show "MATTU dmatest match!") -Mathias ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/4 V3] Allocate pages for kdump without encryption when SME is enabled
On 06/21/18 at 01:06pm, lijiang wrote: > 在 2018年06月21日 09:53, Baoquan He 写道: > > On 06/16/18 at 04:27pm, Lianbo Jiang wrote: > >> When SME is enabled in the first kernel, we will allocate pages > >> for kdump without encryption in order to be able to boot the > >> second kernel in the same manner as kexec, which helps to keep > >> the same code style. > >> > >> Signed-off-by: Lianbo Jiang > >> --- > >> kernel/kexec_core.c | 12 > >> 1 file changed, 12 insertions(+) > >> > >> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > >> index 20fef1a..3c22a9b 100644 > >> --- a/kernel/kexec_core.c > >> +++ b/kernel/kexec_core.c > >> @@ -471,6 +471,16 @@ static struct page > >> *kimage_alloc_crash_control_pages(struct kimage *image, > >>} > >>} > >> > >> + if (pages) { > >> + unsigned int count, i; > >> + > >> + pages->mapping = NULL; > >> + set_page_private(pages, order); > >> + count = 1 << order; > >> + for (i = 0; i < count; i++) > >> + SetPageReserved(pages + i); > > > > I guess you might imitate the kexec case, however kexec get pages from > > buddy. Crash pages are reserved in memblock, these codes might make no > > sense. > > > Thanks for your comments. > We have changed the attribute of pages, so the original attribute of pages > will be > restored when they free. Hmm, you can check what kimage_free() is doing, and where kimage->control_pages, dest_pages, unusable_pages is assigned. Do you know where these original attribute of pages comes from and they are used/needed in CRASH case, if you care about them? > > >> + arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0); > >> + } > >>return pages; > >> } > >> > >> @@ -865,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage > >> *image, > >>result = -ENOMEM; > >>goto out; > >>} > >> + arch_kexec_post_alloc_pages(page_address(page), 1, 0); > >>ptr = kmap(page); > >>ptr += maddr & ~PAGE_MASK; > >>mchunk = min_t(size_t, mbytes, > >> @@ -882,6 +893,7 @@ static int kimage_load_crash_segment(struct kimage > >> *image, > >>result = copy_from_user(ptr, buf, uchunk); > >>kexec_flush_icache_page(page); > >>kunmap(page); > >> + arch_kexec_pre_free_pages(page_address(page), 1); > >>if (result) { > >>result = -EFAULT; > >>goto out; > >> -- > >> 2.9.5 > >> > >> > >> ___ > >> kexec mailing list > >> ke...@lists.infradead.org > >> http://lists.infradead.org/mailman/listinfo/kexec ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/4 V3] Remap the device table of IOMMU in encrypted manner for kdump
在 2018年06月21日 16:39, Baoquan He 写道: > On 06/21/18 at 01:42pm, lijiang wrote: >> 在 2018年06月21日 00:42, Tom Lendacky 写道: >>> On 6/16/2018 3:27 AM, Lianbo Jiang wrote: In kdump mode, it will copy the device table of IOMMU from the old device table, which is encrypted when SME is enabled in the first kernel. So we must remap it in encrypted manner in order to be automatically decrypted when we read. Signed-off-by: Lianbo Jiang --- Some changes: 1. add some comments 2. clean compile warning. drivers/iommu/amd_iommu_init.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 904c575..a20af4c 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -889,11 +889,24 @@ static bool copy_device_table(void) } old_devtb_phys = entry & PAGE_MASK; + + /* + * When sme enable in the first kernel, old_devtb_phys includes the + * memory encryption mask(sme_me_mask), we must remove the memory + * encryption mask to obtain the true physical address in kdump mode. + */ + if (mem_encrypt_active() && is_kdump_kernel()) + old_devtb_phys = __sme_clr(old_devtb_phys); + >>> >>> You can probably just use "if (is_kdump_kernel())" here, since memory >>> encryption is either on in both the first and second kernel or off in >>> both the first and second kernel. At which point __sme_clr() will do >>> the proper thing. >>> >>> Actually, this needs to be done no matter what. When doing either the >>> ioremap_encrypted() or the memremap(), the physical address should not >>> include the encryption bit/mask. >>> >>> Thanks, >>> Tom >>> >> Thanks for your comments. If we don't remove the memory encryption mask, it >> will >> return false because the 'old_devtb_phys >= 0x1ULL' may become true. > > Lianbo, you may not get what Tom suggested. Tom means no matter what it > is, encrypted or not in 1st kernel, we need get pure physicall address, > and using below code is always right for both cases. > > if (is_kdump_kernel()) > old_devtb_phys = __sme_clr(old_devtb_phys); > Thank you, Baoquan. I understood what Tom said. I just want to explain why removing the memory encryption mask is necessary before the 'old_devtb_phys >= 0x1ULL'. Lianbo > And this is simpler. You even can add one line of code comment to say > like "Physical address w/o encryption mask is needed here." >> >> Lianbo if (old_devtb_phys >= 0x1ULL) { pr_err("The address of old device table is above 4G, not trustworthy!\n"); return false; } - old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB); + old_devtb = (mem_encrypt_active() && is_kdump_kernel()) + ? (__force void *)ioremap_encrypted(old_devtb_phys, + dev_table_size) + : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);> + if (!old_devtb) return false; >> >> ___ >> kexec mailing list >> ke...@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/kexec ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/4 V3] Add a function(ioremap_encrypted) for kdump when AMD sme enabled
在 2018年06月21日 00:00, Tom Lendacky 写道: > On 6/16/2018 3:27 AM, Lianbo Jiang wrote: >> It is convenient to remap the old memory encrypted to the second >> kernel by calling ioremap_encrypted(). >> >> Signed-off-by: Lianbo Jiang >> --- >> Some changes: >> 1. remove the sme_active() check in __ioremap_caller(). >> 2. put some logic into the early_memremap_pgprot_adjust() for >> early memremap. >> >> arch/x86/include/asm/io.h | 3 +++ >> arch/x86/mm/ioremap.c | 28 >> 2 files changed, 23 insertions(+), 8 deletions(-) >> >> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h >> index f6e5b93..989d60b 100644 >> --- a/arch/x86/include/asm/io.h >> +++ b/arch/x86/include/asm/io.h >> @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t >> offset, unsigned long size); >> #define ioremap_cache ioremap_cache >> extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long >> size, unsigned long prot_val); >> #define ioremap_prot ioremap_prot >> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, >> +unsigned long size); >> +#define ioremap_encrypted ioremap_encrypted >> >> /** >> * ioremap - map bus memory into CPU space >> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c >> index c63a545..e365fc4 100644 >> --- a/arch/x86/mm/ioremap.c >> +++ b/arch/x86/mm/ioremap.c >> @@ -24,6 +24,7 @@ >> #include >> #include >> #include >> +#include >> >> #include "physaddr.h" >> >> @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, >> unsigned long size, >> * caller shouldn't need to know that small detail. >> */ >> static void __iomem *__ioremap_caller(resource_size_t phys_addr, >> -unsigned long size, enum page_cache_mode pcm, void *caller) >> +unsigned long size, enum page_cache_mode pcm, >> +void *caller, bool encrypted) >> { >> unsigned long offset, vaddr; >> resource_size_t last_addr; >> @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t >> phys_addr, >> * resulting mapping. >> */ >> prot = PAGE_KERNEL_IO; >> -if (sev_active() && mem_flags.desc_other) >> +if ((sev_active() && mem_flags.desc_other) || encrypted) >> prot = pgprot_encrypted(prot); >> >> switch (pcm) { >> @@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, >> unsigned long size) >> enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS; >> >> return __ioremap_caller(phys_addr, size, pcm, >> -__builtin_return_address(0)); >> +__builtin_return_address(0), false); >> } >> EXPORT_SYMBOL(ioremap_nocache); >> >> @@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, >> unsigned long size) >> enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC; >> >> return __ioremap_caller(phys_addr, size, pcm, >> -__builtin_return_address(0)); >> +__builtin_return_address(0), false); >> } >> EXPORT_SYMBOL_GPL(ioremap_uc); >> >> @@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc); >> void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size) >> { >> return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC, >> -__builtin_return_address(0)); >> +__builtin_return_address(0), false); >> } >> EXPORT_SYMBOL(ioremap_wc); >> >> @@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc); >> void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size) >> { >> return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT, >> -__builtin_return_address(0)); >> +__builtin_return_address(0), false); >> } >> EXPORT_SYMBOL(ioremap_wt); >> >> +void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long >> size) >> +{ >> +return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB, >> +__builtin_return_address(0), true); >> +} >> +EXPORT_SYMBOL(ioremap_encrypted); >> + >> void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size) >> { >> return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB, >> -__builtin_return_address(0)); >> +__builtin_return_address(0), false); >> } >> EXPORT_SYMBOL(ioremap_cache); >> >> @@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, >> unsigned long size, >> { >> return __ioremap_caller(phys_addr, size, >> pgprot2cachemode(__pgprot(prot_val)), >> -__builtin_return_address(0)); >> +__builtin_return_address(0), false); >> } >> EXPORT_SYMBOL(ioremap_prot); >> >> @@ -688,6
Re: [PATCH 3/4 V3] Remap the device table of IOMMU in encrypted manner for kdump
On 06/21/18 at 01:42pm, lijiang wrote: > 在 2018年06月21日 00:42, Tom Lendacky 写道: > > On 6/16/2018 3:27 AM, Lianbo Jiang wrote: > >> In kdump mode, it will copy the device table of IOMMU from the old > >> device table, which is encrypted when SME is enabled in the first > >> kernel. So we must remap it in encrypted manner in order to be > >> automatically decrypted when we read. > >> > >> Signed-off-by: Lianbo Jiang > >> --- > >> Some changes: > >> 1. add some comments > >> 2. clean compile warning. > >> > >> drivers/iommu/amd_iommu_init.c | 15 ++- > >> 1 file changed, 14 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/iommu/amd_iommu_init.c > >> b/drivers/iommu/amd_iommu_init.c > >> index 904c575..a20af4c 100644 > >> --- a/drivers/iommu/amd_iommu_init.c > >> +++ b/drivers/iommu/amd_iommu_init.c > >> @@ -889,11 +889,24 @@ static bool copy_device_table(void) > >>} > >> > >>old_devtb_phys = entry & PAGE_MASK; > >> + > >> + /* > >> + * When sme enable in the first kernel, old_devtb_phys includes the > >> + * memory encryption mask(sme_me_mask), we must remove the memory > >> + * encryption mask to obtain the true physical address in kdump mode. > >> + */ > >> + if (mem_encrypt_active() && is_kdump_kernel()) > >> + old_devtb_phys = __sme_clr(old_devtb_phys); > >> + > > > > You can probably just use "if (is_kdump_kernel())" here, since memory > > encryption is either on in both the first and second kernel or off in > > both the first and second kernel. At which point __sme_clr() will do > > the proper thing. > > > > Actually, this needs to be done no matter what. When doing either the > > ioremap_encrypted() or the memremap(), the physical address should not > > include the encryption bit/mask. > > > > Thanks, > > Tom > > > Thanks for your comments. If we don't remove the memory encryption mask, it > will > return false because the 'old_devtb_phys >= 0x1ULL' may become true. Lianbo, you may not get what Tom suggested. Tom means no matter what it is, encrypted or not in 1st kernel, we need get pure physicall address, and using below code is always right for both cases. if (is_kdump_kernel()) old_devtb_phys = __sme_clr(old_devtb_phys); And this is simpler. You even can add one line of code comment to say like "Physical address w/o encryption mask is needed here." > > Lianbo > >>if (old_devtb_phys >= 0x1ULL) { > >>pr_err("The address of old device table is above 4G, not > >> trustworthy!\n"); > >>return false; > >>} > >> - old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB); > >> + old_devtb = (mem_encrypt_active() && is_kdump_kernel()) > >> + ? (__force void *)ioremap_encrypted(old_devtb_phys, > >> + dev_table_size) > >> + : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);> + > >>if (!old_devtb) > >>return false; > >> > >> > > ___ > kexec mailing list > ke...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/4 V3] Support kdump for AMD secure memory encryption(SME)
On 06/21/18 at 11:18am, lijiang wrote: > 在 2018年06月21日 09:21, Baoquan He 写道: > > On 06/16/18 at 04:27pm, Lianbo Jiang wrote: > >> It is convenient to remap the old memory encrypted to the second kernel by > >> calling ioremap_encrypted(). > >> > >> When sme enabled on AMD server, we also need to support kdump. Because > >> the memory is encrypted in the first kernel, we will remap the old memory > >> encrypted to the second kernel(crash kernel), and sme is also enabled in > >> the second kernel, otherwise the old memory encrypted can not be decrypted. > >> Because simply changing the value of a C-bit on a page will not > >> automatically encrypt the existing contents of a page, and any data in the > >> page prior to the C-bit modification will become unintelligible. A page of > >> memory that is marked encrypted will be automatically decrypted when read > >> from DRAM and will be automatically encrypted when written to DRAM. > >> > >> For the kdump, it is necessary to distinguish whether the memory is > >> encrypted. Furthermore, we should also know which part of the memory is > >> encrypted or decrypted. We will appropriately remap the memory according > >> to the specific situation in order to tell cpu how to deal with the > >> data(encrypted or decrypted). For example, when sme enabled, if the old > >> memory is encrypted, we will remap the old memory in encrypted way, which > >> will automatically decrypt the old memory encrypted when we read those data > >> from the remapping address. > >> > >> -- > >> | first-kernel | second-kernel | kdump support | > >> | (mem_encrypt=on|off)| (yes|no)| > >> |--+---+---| > >> | on | on| yes | > >> | off | off | yes | > >> | on | off | no| > > > > > >> | off | on| no| > > > > It's not clear to me here. If 1st kernel sme is off, in 2nd kernel, when > > you remap the old memory with non-sme mode, why did it fail? > > > Thank you, Baoquan. > For kdump, there are two cases that doesn't need to support: > > 1. SME on(first kernel), but SME off(second kernel). > Because the old memory is encrypted, we can't decrypt the old memory if SME > is off > in the second kernel(in kdump mode). > > 2. SME off(first kernel), but SME on(second kernel) > Maybe this situation doesn't have significance in actual deployment, > furthermore, it > will also increase the complexity of the code. It's just for testing, maybe > it is > unnecessary to support it, because the old memory is unencrypted. Hmm, sorry, I don't get why it is unnecessary because the old memory is unencrypted. We developers should cover all cases unless a certain case is no way to fix, or fixing it is not cost-effective. Otherwise there's no execuse for us to not fix. When QE try their best to cover all test cases, they don't consider its significance because even corner case need be swept out, that is their job. There isn't a rule for developers to decide if it's significant. Or could you point out what complexity it will bring to defend your point? Thanks Baoquan > > And please run scripts/get_maintainer.pl and add maintainers of > > component which is affected in patch to CC list. > Great! I forgot CC maintainers, thanks for your reminder. > > Lianbo > > > >> |__|___|___| > >> > >> This patch is only for SME kdump, it is not support SEV kdump. > >> > >> Test tools: > >> makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile > >> commit e1de103eca8f (A draft for kdump vmcore about AMD SME) > >> Author: Lianbo Jiang > >> Date: Mon May 14 17:02:40 2018 +0800 > >> Note: This patch can only dump vmcore in the case of SME enabled. > >> > >> crash-7.2.1: https://github.com/crash-utility/crash.git > >> commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1) > >> Author: Dave Anderson > >> Date: Fri May 11 15:54:32 2018 -0400 > >> > >> Test environment: > >> HP ProLiant DL385Gen10 AMD EPYC 7251 > >> 8-Core Processor > >> 32768 MB memory > >> 600 GB disk space > >> > >> Linux 4.17-rc7: > >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > >> commit b04e217704b7 ("Linux 4.17-rc7") > >> Author: Linus Torvalds > >> Date: Sun May 27 13:01:47 2018 -0700 > >> > >> Reference: > >> AMD64 Architecture Programmer's Manual > >> https://support.amd.com/TechDocs/24593.pdf > >> > >> Some changes: > >> 1. remove the sme_active() check in __ioremap_caller(). > >> 2. remove the '#ifdef' stuff throughout this patch. > >> 3. put some logic into the early_memremap_pgprot_adjust() and clean the > >> previous unnecessary changes, for example: arch/x86/include/asm/dmi.h, > >> arch/x86/kernel/acpi/boot.c, drivers/acpi/tables.c. > >> 4. add a new file and modify Makefile. > >> 5. clean compile warning in copy_device_table() and some co