[PATCH 1/1] iommu/vt-d: Handle memory shortage on pasid table allocation
Pasid table memory allocation can fail due to memory shortage. Limit the
pasid table size to 1MiB because the current 8MiB contiguous physical
memory allocation can be hard to come by. Without a PASID table, the
device can continue to work, with only shared virtual memory impacted.
So let's go ahead with context mapping even if the memory allocation for
the pasid table failed.

Fixes: cc580e41260d ("iommu/vt-d: Per PCI device pasid table interfaces")
Cc: Ashok Raj
Cc: Jacob Pan
Cc: Mika Westerberg
Reported-and-tested-by: Pelton Kyle D
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-iommu.c | 6 +++---
 drivers/iommu/intel-pasid.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5f3f10cf9d9d..bedc801b06a0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2540,9 +2540,9 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 	if (dev && dev_is_pci(dev) && info->pasid_supported) {
 		ret = intel_pasid_alloc_table(dev);
 		if (ret) {
-			__dmar_remove_one_dev_info(info);
-			spin_unlock_irqrestore(&device_domain_lock, flags);
-			return NULL;
+			pr_warn("No pasid table for %s, pasid disabled\n",
+				dev_name(dev));
+			info->pasid_supported = 0;
 		}
 	}
 	spin_unlock_irqrestore(&device_domain_lock, flags);
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 1c05ed6fc5a5..1fb5e12b029a 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -11,7 +11,7 @@
 #define __INTEL_PASID_H

 #define PASID_MIN	0x1
-#define PASID_MAX	0x10
+#define PASID_MAX	0x2

 struct pasid_entry {
 	u64 val;
--
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
Hi Jean,

On 2018/8/31 21:34, Jean-Philippe Brucker wrote:
> On 27/08/18 09:06, Xu Zaibo wrote:
>>> +struct vfio_iommu_type1_bind_process {
>>> +	__u32	flags;
>>> +#define VFIO_IOMMU_BIND_PID	(1 << 0)
>>> +	__u32	pasid;
>> As I am doing some work on the SVA patch set, I wonder why user space
>> needs this pasid. Maybe it is more reasonable to set the pasid into
>> all devices under the vfio container through a callback function from
>> 'vfio_devices' while the 'VFIO_IOMMU_BIND_PROCESS' CMD is executed in
>> kernel land? I am not sure, because there exists no suitable callback
>> in 'vfio_device' at present.
>
> When using vfio-pci, the kernel doesn't know how to program the PASID
> into the device because the only kernel driver for the device is the
> generic vfio-pci module. The PCI specification doesn't describe a way
> of setting up the PASID, it's vendor-specific. Only the userspace
> application owning the device (and calling VFIO_IOMMU_BIND) knows how
> to do it, so we return the allocated PASID.
>
> Note that unlike vfio-mdev where applications share slices of a
> function, with vfio-pci one application owns the whole function so
> it's safe to let userspace set the PASID in hardware. With vfio-mdev
> it's the kernel driver that should be in charge of setting the PASID
> as you described, and we wouldn't have a reason to return the PASID in
> the vfio_iommu_type1_bind_process structure.

Understood. But I still have the following confusion: as one application
takes a whole function while using VFIO-PCI, why do the application and
the function need to enable the PASID capability? (Since just one I/O
page table is enough for them.) Maybe I misunderstood; hope you can help
me clear it up. Thank you very much.

Thanks,
Zaibo
Re: your mail
Hi Christoph,

On Mon, Aug 27, 2018 at 04:50:27PM +0200, Christoph Hellwig wrote:
> Subject: [RFC] merge dma_direct_ops and dma_noncoherent_ops
>
> While most architectures are either always or never dma coherent for a
> given build, the arm, arm64, mips and soon arc architectures can have
> different dma coherent settings on a per-device basis. Additionally
> some mips builds can decide at boot time if dma is coherent or not.
>
> I've started to look into handling noncoherent dma in swiotlb, and
> moving the dma-iommu ops into common code [1], and for that we need a
> generic way to check if a given device is coherent or not. Moving
> this flag into struct device also simplifies the conditionally coherent
> architecture implementations.
>
> These patches are also available in a git tree given that they have
> a few previous posted dependencies:
>
>     git://git.infradead.org/users/hch/misc.git dma-direct-noncoherent-merge
>
> Gitweb:
>
>     http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-noncoherent-merge

Apart from the nits in patch 2, these look sane to me from a MIPS
perspective, so for patches 1-4:

    Acked-by: Paul Burton # MIPS parts

Thanks,
    Paul
Re: [PATCH 2/5] dma-mapping: move the dma_coherent flag to struct device
Hi Christoph,

On Mon, Aug 27, 2018 at 04:50:29PM +0200, Christoph Hellwig wrote:
> Various architectures support both coherent and non-coherent dma on
> a per-device basis. Move the dma_noncoherent flag from mips the
> mips archdata field to struct device proper to prepare the
> infrastructure for reuse on other architectures.

"mips the mips"

> diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
> index a0aa00cc909d..f99748e9c08e 100644
> --- a/include/linux/dma-noncoherent.h
> +++ b/include/linux/dma-noncoherent.h
> @@ -4,6 +4,24 @@
>
>  #include
>
> +#ifdef CONFIG_ARCH_HAS_DMA_COHERENCE_H
> +#include
> +#else
> +#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
> +    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
> +    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)

#elif ?

Thanks,
    Paul
[PATCH v3] of/platform: initialise AMBA default DMA masks
This addresses a v4.19-rc1 regression in the PL111 DRM driver in
drivers/gpu/pl111/*

The driver uses the CMA KMS helpers and will thus at some point call
down to dma_alloc_attrs() to allocate a chunk of contiguous DMA memory
for the framebuffer.

It appears that in v4.18, it was OK that this (and other DMA mastering
AMBA devices) left dev->coherent_dma_mask blank (zero). In v4.19-rc1
the WARN_ON_ONCE(dev && !dev->coherent_dma_mask) in dma_alloc_attrs()
in include/linux/dma-mapping.h is triggered. The allocation later fails
when get_coherent_dma_mask() is called from __dma_alloc() and
__dma_alloc() returns NULL:

drm-clcd-pl111 dev:20: coherent DMA mask is unset
drm-clcd-pl111 dev:20: [drm:drm_fb_helper_fbdev_setup] *ERROR* Failed to set fbdev configuration

It turns out that in commit 4d8bde883bfb ("OF: Don't set default
coherent DMA mask") the OF core stops setting the default DMA mask on
new devices, especially these lines of the patch:

-	if (!dev->coherent_dma_mask)
-		dev->coherent_dma_mask = DMA_BIT_MASK(32);

Robin Murphy solved a similar problem in a5516219b102 ("of/platform:
Initialise default DMA masks") by simply assigning dev.coherent_dma_mask
and the dev.dma_mask to point to the same mask when creating devices
from the device tree, and introducing the same code into the code path
creating AMBA/PrimeCell devices solved my problem: graphics now come up.

The code simply assumes that the device can access all of the system
memory by setting the coherent DMA mask to 0x when creating a device
from the device tree, which is crude, but seems to be what kernel v4.18
assumed.

The AMBA PrimeCells do not differentiate between coherent and streaming
DMA, so we can just assign the same mask to both. Possibly drivers
should augment their coherent DMA mask in accordance with "dma-ranges"
from the device tree if more fine-grained masking is needed.
Reported-by: Russell King
Fixes: 4d8bde883bfb ("OF: Don't set default coherent DMA mask")
Cc: Russell King
Cc: Christoph Hellwig
Cc: Robin Murphy
Signed-off-by: Linus Walleij
---
ChangeLog v2->v3:
- Provide proper root cause analysis, point to the right offending
  commit with Fixes:
- Make an even more elaborate description of what is causing this.
ChangeLog v1->v2:
- Provide a better description for the change.
---
 drivers/of/platform.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 7ba90c290a42..6c59673933e9 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -241,6 +241,10 @@ static struct amba_device *of_amba_device_create(struct device_node *node,
 	if (!dev)
 		goto err_clear_flag;

+	/* AMBA devices only support a single DMA mask */
+	dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+	dev->dev.dma_mask = &dev->dev.coherent_dma_mask;
+
 	/* setup generic device info */
 	dev->dev.of_node = of_node_get(node);
 	dev->dev.fwnode = &node->fwnode;
--
2.17.1
Re: [RFC 09/13] iommu/smmuv3: Get prepared for nested stage support
Hi Jean-Philippe,

On 08/31/2018 03:20 PM, Jean-Philippe Brucker wrote:
> On 23/08/18 13:17, Eric Auger wrote:
>> 	if (ste->s1_cfg) {
>> -		BUG_ON(ste_live);
>
> Scary! :) The current code assumes that it can make modifications to
> the STE in any order and enable translation after a sync. So far I
> haven't been able to find anything that violates this rule in the
> spec: "If software modifies the structure while it is valid, it must
> not allow the structure to enter an invalid intermediate state." So
> maybe it's fine, though to be safe I would have started with disabling
> the STE
> (http://www.linux-arm.org/git?p=linux-jpb.git;a=commitdiff;h=936e49f923e101c061269eadd5fa43fef819d2e9)

Yep, this works with my setup, but I was waiting for this kind of
comment to turn the prototype into something more "production level" ;-)
Did you send anything upstream related to this branch?

Thanks,

Eric

> Thanks,
> Jean
Re: [RFC 02/13] iommu: Introduce tlb_invalidate API
Hi Jean-Philippe,

On 08/31/2018 03:17 PM, Jean-Philippe Brucker wrote:
> On 23/08/18 13:17, Eric Auger wrote:
>> +/**
>> + * Translation cache invalidation information, contains generic IOMMU
>> + * data which can be parsed based on model ID by model specific drivers.
>> + * Since the invalidation of second level page tables are included in the
>> + * unmap operation, this info is only applicable to the first level
>> + * translation caches, i.e. DMA request with PASID.
>> + *
>> + * @granularity:	requested invalidation granularity, type dependent
>> + * @size:	2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
>> + * @nr_pages:	number of pages to invalidate
>> + * @pasid:	processor address space ID value per PCI spec.
>> + * @addr:	page address to be invalidated
>> + * @flags	IOMMU_INVALIDATE_ADDR_LEAF: leaf paging entries
>> + *		IOMMU_INVALIDATE_GLOBAL_PAGE: global pages
>> + *
>> + */
>> +struct iommu_tlb_invalidate_info {
>> +	struct iommu_tlb_invalidate_hdr	hdr;
>> +	enum iommu_inv_granularity	granularity;
>> +	__u32		flags;
>> +#define IOMMU_INVALIDATE_ADDR_LEAF	(1 << 0)
>> +#define IOMMU_INVALIDATE_GLOBAL_PAGE	(1 << 1)
>> +	__u8		size;
>> +	__u64		nr_pages;
>> +	__u32		pasid;
>> +	__u64		addr;
>> +};
>> #endif /* _UAPI_IOMMU_H */
>
> Since the ioctl will be used to combine invalidations (invalidate both
> ATC and TLB with a single call), we need an additional ASID field for
> the SMMU - ATC is invalidated by PASID, TLB by ASID. I used to call it
> "tag", but I'm leaning towards "arch_id" now
> (http://www.linux-arm.org/git?p=linux-jpb.git;a=commitdiff;h=40fdef74816dd8d8d113100b9e0162fab4cec28d)

I acknowledge I am not crystal clear about that. For a given
iommu_domain, don't you have a single asid? Can't you retrieve the asid
from the
iommu_domain/arm_smmu_domain/arm_smmu_s1_cfg/arm_smmu_ctx_desc.asid?
Here again I am confused about the dual iommu_domain/struct device
parameters.

I have another issue while doing the QEMU integration. When the guest
does an NH_ALL, this propagates an invalidation on the whole IPA range
and we must discriminate that from regular NH_VA calls. How would you
encode the NH_ALL with this API?

Besides, I discovered you work on virtio-iommu/stage2 enablement ;-)

Thanks,

Eric

> Thanks,
> Jean
Re: [RFC 01/13] iommu: Introduce bind_guest_stage API
Hi Jean-Philippe,

On 08/31/2018 03:11 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
>
> On 23/08/18 16:25, Auger Eric wrote:
>>> +int iommu_bind_guest_stage(struct iommu_domain *domain, struct device *dev,
>>> +			   struct iommu_guest_stage_config *cfg)
>
> About the name change from iommu_bind_pasid_table: is the intent to
> reuse this API for SMMUv2, which supports nested but not PASID? Seems
> like a good idea but "iommu_bind_table" may be better since "stage" is
> only used by Arm.

At the moment I don't target SMMUv2, just SMMUv3. My focus was on
nested stage enablement without enabling the multi-CD feature (PASID),
which is not supported by the QEMU vSMMUv3. Afterwards I realized that
basically we are pointing to a CD or PASID table and that's about the
same. I don't have a strong opinion on the name;
iommu_bind_guest_table or iommu_bind_pasid_table would be fine with me.
Indeed "stage" is Arm vocabulary ("level" for Intel?).

>>> +/**
>>> + * PASID table data used to bind guest PASID table to the host IOMMU. This will
>>> + * enable guest managed first level page tables.
>>> + * @version:	for future extensions and identification of the data format
>>> + * @bytes:	size of this structure
>>> + * @base_ptr:	PASID table pointer
>>> + * @pasid_bits:	number of bits supported in the guest PASID table, must be less
>>> + *		or equal than the host supported PASID size.
>>> + */
>>> +struct iommu_pasid_table_config {
>>> +	__u32 version;
>>> +#define PASID_TABLE_CFG_VERSION_1 1
>>> +	__u32 bytes;
>>> +	__u64 base_ptr;
>>> +	__u8 pasid_bits;
>>> +};
>>> +
>>> +/**
>>> + * Stream Table Entry S1 info
>>> + * @disabled:	the smmu is disabled
>>> + * @bypassed:	stage 1 is bypassed
>>> + * @aborted:	aborted
>>> + * @cdptr_dma:	GPA of the Context Descriptor
>>> + * @asid:	address space identifier associated to the CD
>>> + */
>>> +struct iommu_smmu_s1_config {
>>> +	__u8 disabled;
>>> +	__u8 bypassed;
>>> +	__u8 aborted;
>>> +	__u64 cdptr_dma;
>>> +	__u16 asid;
>> This asid field is a leftover. asid is part of the CD (owned by the
>> guest) and cannot be conveyed through this API. What is relevant to
>> pass is the guest asid_bits (vSMMU IDR1 info), to be compared with the
>> host one. So we end up having something similar to the base_ptr and
>> pasid_bits of iommu_pasid_table_config.
>
> That part seems strange. Userspace wouldn't try arbitrary ASID sizes
> until one fits, especially since ASID size isn't the only parameter:
> there will be SSID, IOVA, IPA sizes, HTTU features and I guess any
> other SMMU_IDRx bit that corresponds to a field in the CD or STE.
> Doesn't vSMMU need to tailor its IDR registers to whatever is
> supported by the host SMMU, in order to use nested translation?

Yes, I agree this is needed anyway.

> Before creating the virtual ID registers, userspace will likely want
> to know more about the physical IOMMU, by querying the kernel. I wrote
> something about that interface recently:
> https://www.spinics.net/lists/iommu/msg28039.html

Ah OK, I missed that part of the discussion. About the sysfs API, we
devised one in the past for reserved_regions. I thought it would be
useful for QEMU to identify them but eventually Alex pushed to create a
VFIO API instead to make things more self-contained. We would need to
double check with him.

> And if you provide this interface, checking the parameters again in
> BIND_GUEST_STAGE isn't very useful, it only tells you that userspace
> read your values. Once the guest writes the CD, it could still use the
> wrong ASID size or some other invalid config. That would be caught by
> the SMMU walker and cause a C_BAD_CD error report.

Yes, actually if there is some mismatch we might get BAD_STE/BAD_CD
events. This might be another way to handle things. I did not integrate
the fault API yet. This exercise is needed to understand how we can
catch things at QEMU level.

>> Otherwise, the other fields (disable, bypassed, aborted) can be packed
>> as masks in a _u8 flag.
>
> What's the difference between aborted and disabled? I'm also not sure
> we should give such fine tweaks to userspace, since the STE code is
> already pretty complicated and has to deal with several state
> transitions. Existing behavior (1) (2) (5) probably needs to be
> preserved:
>
> (1) Initially no container is set, s1-translate s2-bypass with the
>     default domain (non-VFIO)
> (2) A VFIO_TYPE1_NESTING_IOMMU container is set, s1-bypass s2-translate
> (3) BIND_GUEST_STAGE, s1-translate s2-translate
> (4) UNBIND_GUEST_STAGE, s1-bypass s2-translate
> (5) Remove container, s1-translate s2-bypass with the default domain
>
> That said, when instantiating a vSMMU, QEMU may want to switch from
> (2) to an intermediate "s1-abort s2-translate" state. At the moment
> with virtio-iommu I only create the stage-2 mappings at (3). This
> ensures that transactions abort until the guest configures the vIOMMU
> and it keeps the host driver simple,
Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
Hi Zaibo,

On 27/08/18 09:06, Xu Zaibo wrote:
>> +struct vfio_iommu_type1_bind_process {
>> +	__u32	flags;
>> +#define VFIO_IOMMU_BIND_PID	(1 << 0)
>> +	__u32	pasid;
> As I am doing some works on the SVA patch set. I just consider why the
> user space need this pasid.
> Maybe, is it much more reasonable to set the pasid into all devices
> under the vfio container by
> a call back function from 'vfio_devices' while
> 'VFIO_IOMMU_BIND_PROCESS' CMD is executed
> in kernel land? I am not sure because there exists no suitable call
> back in 'vfio_device' at present.

When using vfio-pci, the kernel doesn't know how to program the PASID
into the device because the only kernel driver for the device is the
generic vfio-pci module. The PCI specification doesn't describe a way of
setting up the PASID, it's vendor-specific. Only the userspace
application owning the device (and calling VFIO_IOMMU_BIND) knows how to
do it, so we return the allocated PASID.

Note that unlike vfio-mdev where applications share slices of a
function, with vfio-pci one application owns the whole function so it's
safe to let userspace set the PASID in hardware. With vfio-mdev it's the
kernel driver that should be in charge of setting the PASID as you
described, and we wouldn't have a reason to return the PASID in the
vfio_iommu_type1_bind_process structure.

Thanks,
Jean
Re: [RFC 09/13] iommu/smmuv3: Get prepared for nested stage support
On 23/08/18 13:17, Eric Auger wrote:
> 	if (ste->s1_cfg) {
> -		BUG_ON(ste_live);

Scary! :) The current code assumes that it can make modifications to the
STE in any order and enable translation after a sync. So far I haven't
been able to find anything that violates this rule in the spec: "If
software modifies the structure while it is valid, it must not allow the
structure to enter an invalid intermediate state." So maybe it's fine,
though to be safe I would have started with disabling the STE
(http://www.linux-arm.org/git?p=linux-jpb.git;a=commitdiff;h=936e49f923e101c061269eadd5fa43fef819d2e9)

Thanks,
Jean
Re: [RFC 02/13] iommu: Introduce tlb_invalidate API
On 23/08/18 13:17, Eric Auger wrote:
> +/**
> + * Translation cache invalidation information, contains generic IOMMU
> + * data which can be parsed based on model ID by model specific drivers.
> + * Since the invalidation of second level page tables are included in the
> + * unmap operation, this info is only applicable to the first level
> + * translation caches, i.e. DMA request with PASID.
> + *
> + * @granularity:	requested invalidation granularity, type dependent
> + * @size:	2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> + * @nr_pages:	number of pages to invalidate
> + * @pasid:	processor address space ID value per PCI spec.
> + * @addr:	page address to be invalidated
> + * @flags	IOMMU_INVALIDATE_ADDR_LEAF: leaf paging entries
> + *		IOMMU_INVALIDATE_GLOBAL_PAGE: global pages
> + *
> + */
> +struct iommu_tlb_invalidate_info {
> +	struct iommu_tlb_invalidate_hdr	hdr;
> +	enum iommu_inv_granularity	granularity;
> +	__u32		flags;
> +#define IOMMU_INVALIDATE_ADDR_LEAF	(1 << 0)
> +#define IOMMU_INVALIDATE_GLOBAL_PAGE	(1 << 1)
> +	__u8		size;
> +	__u64		nr_pages;
> +	__u32		pasid;
> +	__u64		addr;
> +};
> #endif /* _UAPI_IOMMU_H */

Since the ioctl will be used to combine invalidations (invalidate both
ATC and TLB with a single call), we need an additional ASID field for
the SMMU - ATC is invalidated by PASID, TLB by ASID. I used to call it
"tag", but I'm leaning towards "arch_id" now
(http://www.linux-arm.org/git?p=linux-jpb.git;a=commitdiff;h=40fdef74816dd8d8d113100b9e0162fab4cec28d)

Thanks,
Jean
Re: [RFC 01/13] iommu: Introduce bind_guest_stage API
Hi Eric,

On 23/08/18 16:25, Auger Eric wrote:
>> +int iommu_bind_guest_stage(struct iommu_domain *domain, struct device *dev,
>> +			   struct iommu_guest_stage_config *cfg)

About the name change from iommu_bind_pasid_table: is the intent to
reuse this API for SMMUv2, which supports nested but not PASID? Seems
like a good idea but "iommu_bind_table" may be better since "stage" is
only used by Arm.

>> +/**
>> + * PASID table data used to bind guest PASID table to the host IOMMU. This will
>> + * enable guest managed first level page tables.
>> + * @version:	for future extensions and identification of the data format
>> + * @bytes:	size of this structure
>> + * @base_ptr:	PASID table pointer
>> + * @pasid_bits:	number of bits supported in the guest PASID table, must be less
>> + *		or equal than the host supported PASID size.
>> + */
>> +struct iommu_pasid_table_config {
>> +	__u32 version;
>> +#define PASID_TABLE_CFG_VERSION_1 1
>> +	__u32 bytes;
>> +	__u64 base_ptr;
>> +	__u8 pasid_bits;
>> +};
>> +
>> +/**
>> + * Stream Table Entry S1 info
>> + * @disabled:	the smmu is disabled
>> + * @bypassed:	stage 1 is bypassed
>> + * @aborted:	aborted
>> + * @cdptr_dma:	GPA of the Context Descriptor
>> + * @asid:	address space identifier associated to the CD
>> + */
>> +struct iommu_smmu_s1_config {
>> +	__u8 disabled;
>> +	__u8 bypassed;
>> +	__u8 aborted;
>> +	__u64 cdptr_dma;
>> +	__u16 asid;
> This asid field is a leftover. asid is part of the CD (owned by the
> guest) and cannot be conveyed through this API. What is relevant to
> pass is the guest asid_bits (vSMMU IDR1 info), to be compared with the
> host one. So we end up having something similar to the base_ptr and
> pasid_bits of iommu_pasid_table_config.

That part seems strange. Userspace wouldn't try arbitrary ASID sizes
until one fits, especially since ASID size isn't the only parameter:
there will be SSID, IOVA, IPA sizes, HTTU features and I guess any other
SMMU_IDRx bit that corresponds to a field in the CD or STE. Doesn't
vSMMU need to tailor its IDR registers to whatever is supported by the
host SMMU, in order to use nested translation?

Before creating the virtual ID registers, userspace will likely want to
know more about the physical IOMMU, by querying the kernel. I wrote
something about that interface recently:
https://www.spinics.net/lists/iommu/msg28039.html

And if you provide this interface, checking the parameters again in
BIND_GUEST_STAGE isn't very useful, it only tells you that userspace
read your values. Once the guest writes the CD, it could still use the
wrong ASID size or some other invalid config. That would be caught by
the SMMU walker and cause a C_BAD_CD error report.

> Otherwise, the other fields (disable, bypassed, aborted) can be packed
> as masks in a _u8 flag.

What's the difference between aborted and disabled? I'm also not sure we
should give such fine tweaks to userspace, since the STE code is already
pretty complicated and has to deal with several state transitions.
Existing behavior (1) (2) (5) probably needs to be preserved:

(1) Initially no container is set, s1-translate s2-bypass with the
    default domain (non-VFIO)
(2) A VFIO_TYPE1_NESTING_IOMMU container is set, s1-bypass s2-translate
(3) BIND_GUEST_STAGE, s1-translate s2-translate
(4) UNBIND_GUEST_STAGE, s1-bypass s2-translate
(5) Remove container, s1-translate s2-bypass with the default domain

That said, when instantiating a vSMMU, QEMU may want to switch from (2)
to an intermediate "s1-abort s2-translate" state. At the moment with
virtio-iommu I only create the stage-2 mappings at (3). This ensures
that transactions abort until the guest configures the vIOMMU and it
keeps the host driver simple, but it has the downside of pinning guest
memory lazily (necessary with virtio-iommu since the guest falls back to
the map/unmap interface, which doesn't pin all memory upfront, if it
can't support nested).

> Anyway this API still is at the stage of proof of concept. At least it
> shows the similarities between that use case and the PASID one and
> should encourage to discuss about the relevance to have a unified API
> to pass the guest stage info.

Other required fields in the BIND_GUEST_STAGE ioctl are at least s1dss,
s1cdmax and s1fmt. It's not for sanity-check, they describe the guest
configuration. The guest selects a PASID table format (1-level, 2-level
4k/64k) which is written into STE.s1fmt. It also selects the behavior of
requests-without-pasid, which is written into STE.s1dss. s1cdmax
corresponds to pasid_bits.

Thanks,
Jean
RE: [PATCH 0/7 v6] Support for fsl-mc bus and its devices in SMMU
Hi Joerg/Robin,

Can you please let me know when these patches will be applied onto the
tree? Is there anything else pending from my side?

Thanks,
Nipun

> -----Original Message-----
> From: Nipun Gupta
> Sent: Monday, July 9, 2018 4:48 PM
> To: robin.mur...@arm.com; will.dea...@arm.com; robh...@kernel.org;
> r...@kernel.org; mark.rutl...@arm.com; catalin.mari...@arm.com;
> gre...@linuxfoundation.org; Laurentiu Tudor ; bhelg...@google.com;
> h...@lst.de
> Cc: j...@8bytes.org; m.szyprow...@samsung.com; shawn...@kernel.org;
> frowand.l...@gmail.com; iommu@lists.linux-foundation.org;
> linux-ker...@vger.kernel.org; devicet...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-...@lists.ozlabs.org;
> linux-p...@vger.kernel.org; Bharat Bhushan ; stuyo...@gmail.com;
> Leo Li ; Nipun Gupta
> Subject: [PATCH 0/7 v6] Support for fsl-mc bus and its devices in SMMU
>
> This patchset defines IOMMU DT binding for fsl-mc bus and adds
> support in SMMU for fsl-mc bus.
>
> The patch series is based on top of dma-mapping tree (for-next branch):
> http://git.infradead.org/users/hch/dma-mapping.git
>
> These patches
> - Define property 'iommu-map' for fsl-mc bus (patch 1)
> - Integrates the fsl-mc bus with the SMMU using this
>   IOMMU binding (patch 2,3,4)
> - Adds the dma configuration support for fsl-mc bus (patch 5, 6)
> - Updates the fsl-mc device node with iommu/dma related changes (patch 7)
>
> Changes in v2:
> - use iommu-map property for fsl-mc bus
> - rebase over patchset https://patchwork.kernel.org/patch/10317337/
>   and make corresponding changes for dma configuration of devices on
>   fsl-mc bus
>
> Changes in v3:
> - move of_map_rid in drivers/of/address.c
>
> Changes in v4:
> - move of_map_rid in drivers/of/base.c
>
> Changes in v5:
> - break patch 5 in two separate patches (now patch 5/7 and patch 6/7)
> - add changelog text in patch 3/7 and patch 5/7
> - typo fix
>
> Changes in v6:
> - Updated fsl_mc_device_group() API to be more rational
> - Added dma-coherent property in the LS2 smmu device node
> - Minor fixes in the device-tree documentation
>
> Nipun Gupta (7):
>   Documentation: fsl-mc: add iommu-map device-tree binding for fsl-mc bus
>   iommu/of: make of_pci_map_rid() available for other devices too
>   iommu/of: support iommu configuration for fsl-mc devices
>   iommu/arm-smmu: Add support for the fsl-mc bus
>   bus: fsl-mc: support dma configure for devices on fsl-mc bus
>   bus: fsl-mc: set coherent dma mask for devices on fsl-mc bus
>   arm64: dts: ls208xa: comply with the iommu map binding for fsl_mc
>
>  .../devicetree/bindings/misc/fsl,qoriq-mc.txt  |  39
>  arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi |   7 +-
>  drivers/bus/fsl-mc/fsl-mc-bus.c                |  16 +++-
>  drivers/iommu/arm-smmu.c                       |   7 ++
>  drivers/iommu/iommu.c                          |  13 +++
>  drivers/iommu/of_iommu.c                       |  25 -
>  drivers/of/base.c                              | 102 +
>  drivers/of/irq.c                               |   5 +-
>  drivers/pci/of.c                               | 101
>  include/linux/fsl/mc.h                         |   8 ++
>  include/linux/iommu.h                          |   2 +
>  include/linux/of.h                             |  11 +++
>  include/linux/of_pci.h                         |  10 --
>  13 files changed, 224 insertions(+), 122 deletions(-)
>
> --
> 1.9.1
Re: [PATCH v2] of/platform: initialise AMBA default DMA masks
On Fri, Aug 31, 2018 at 10:26:14AM +0200, Linus Walleij wrote:
> This addresses a v4.19-rc1 regression in the PL111 DRM driver
> in drivers/gpu/pl111/*
>
> The driver uses the CMA KMS helpers and will thus at some
> point call down to dma_alloc_attrs() to allocate a chunk
> of contiguous DMA memory for the framebuffer.
>
> It appears that in v4.18, it was OK that this (and other
> DMA mastering AMBA devices) left dev->coherent_dma_mask
> blank (zero).
>
> In v4.19-rc1 the WARN_ON_ONCE(dev && !dev->coherent_dma_mask)
> in dma_alloc_attrs() in include/linux/dma-mapping.h is
> triggered. The allocation later fails when get_coherent_dma_mask()
> is called from __dma_alloc() and __dma_alloc() returns
> NULL:
>
> drm-clcd-pl111 dev:20: coherent DMA mask is unset
> drm-clcd-pl111 dev:20: [drm:drm_fb_helper_fbdev_setup] *ERROR*
> Failed to set fbdev configuration
>
> I have not been able to properly bisect down to the actual
> commit causing it because the kernel simply fails to build
> at too many bisection points, pushing the bisect back to places
> like the merge of entire subsystem trees, where things have
> likely been resolved while merging them.
>
> I found that Robin Murphy solved a similar problem in
> a5516219b102 ("of/platform: Initialise default DMA masks")
> by simply assigning dev.coherent_dma_mask and the
> dev.dma_mask to point to the same when creating devices
> from the device tree, and introducing the same code into
> the code path creating AMBA/PrimeCell devices solved my
> problem, graphics now come up.
>
> The code simply assumes that the device can access all
> of the system memory by setting the coherent DMA mask
> to 0x when creating a device from the device
> tree, which is crude, but seems to be what kernel v4.18
> assumed.
>
> Cc: Russell King
> Cc: Christoph Hellwig
> Cc: Robin Murphy
> Signed-off-by: Linus Walleij
> ---
> Russell, Christoph: this is probably not the last patch.
> But it has a more accurate description of the problem.
> I still do not know what change to the kernel made
> it start triggering this, whether in the DMA API or
> in DRM :(

Doing some research, it seems the warning was added in v4.16-rc1, and it
is only a warning - it doesn't otherwise change the behaviour, so it's
not the actual warning that's the problem.

I can't see anything in the DMA API nor DRM which would cause this
either, but there are changes in the DT code. However, there are changes
to of_dma_configure() which will be called just before the driver's
probe function is called - and I guess the culprit is 4d8bde883bfb
("OF: Don't set default coherent DMA mask").

So I guess that's the root cause of the problem, and indeed this change
was made in 4.19-rc1.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up
Re: [Patch v15 4/5] dt-bindings: arm-smmu: Add bindings for qcom,smmu-v2
Hi Rob,

On 8/30/2018 6:13 AM, Rob Herring wrote:
On Wed, Aug 29, 2018 at 6:23 AM Vivek Gautam wrote:
On Wed, Aug 29, 2018 at 2:05 PM Vivek Gautam wrote:
Hi Rob,
On 8/29/2018 2:04 AM, Rob Herring wrote:
On Mon, Aug 27, 2018 at 04:25:50PM +0530, Vivek Gautam wrote:

Add bindings doc for Qcom's smmu-v2 implementation.

Signed-off-by: Vivek Gautam
Reviewed-by: Tomasz Figa
Tested-by: Srinivas Kandagatla
---
Changes since v14:
- This is a new patch added in v15 after noticing the new checkpatch warning for separate dt-bindings doc.
- This patch also addresses comments given by Rob and Robin to add a list of valid values of '<soc>' in the "qcom,<soc>-smmu-v2" compatible string.

 .../devicetree/bindings/iommu/arm,smmu.txt | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 8a6ffce12af5..52198a539606 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -17,10 +17,24 @@ conditions.
 "arm,mmu-401"
 "arm,mmu-500"
 "cavium,smmu-v2"
+"qcom,<soc>-smmu-v2", "qcom,smmu-v2"

The v2 in the compatible string is kind of redundant unless the SoC has other SMMU types.

sdm845 has smmu-v2, and smmu-500 [1].

 depending on the particular implementation and/or
 the version of the architecture implemented.
+ A number of Qcom SoCs use qcom,smmu-v2 version of the IP.
+ "qcom,<soc>-smmu-v2" represents a soc specific compatible
+ string that should be present along with the "qcom,smmu-v2"
+ to facilitate SoC specific clocks/power connections and to
+ address specific bug fixes.
+ '<soc>' string in "qcom,<soc>-smmu-v2" should be one of the
+ following:
+ msm8996 - for msm8996 Qcom SoC.
+ sdm845 - for sdm845 Qcom Soc.

Rather than all this prose, it would be simpler to just add 2 lines with the full compatibles rather than <soc>. The thing is not going to work when/if we move bindings to json-schema also.
then we keep adding
"qcom,msm8996-smmu-v2", "qcom,smmu-v2"
"qcom,msm8998-smmu-v2", "qcom,smmu-v2"
"qcom,sdm845-smmu-v2", "qcom,smmu-v2",
and from [1] "qcom,sdm845-smmu-500", "arm,mmu-500", etc. for each SoC?

How about the following diff on top of this patch?

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 52198a539606..5e6c04876533 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -17,23 +17,18 @@ conditions.
 "arm,mmu-401"
 "arm,mmu-500"
 "cavium,smmu-v2"
-"qcom,<soc>-smmu-v2", "qcom,smmu-v2"
+"qcom,smmu-v2"

 depending on the particular implementation and/or
 the version of the architecture implemented.
- A number of Qcom SoCs use qcom,smmu-v2 version of the IP.
- "qcom,<soc>-smmu-v2" represents a soc specific compatible
- string that should be present along with the "qcom,smmu-v2"
- to facilitate SoC specific clocks/power connections and to
- address specific bug fixes.
- '<soc>' string in "qcom,<soc>-smmu-v2" should be one of the
- following:
- msm8996 - for msm8996 Qcom SoC.
- sdm845 - for sdm845 Qcom Soc.
-
- An example string would be -
- "qcom,msm8996-smmu-v2", "qcom,smmu-v2".
+ Qcom SoCs using qcom,smmu-v2 must have soc specific
+ compatible string attached to "qcom,smmu-v2" to take care
+ of SoC specific clocks/power connections and to address
+ specific bug fixes.
+ Precisely, it should be one of the following:
+ "qcom,msm8996-smmu-v2", "qcom,smmu-v2",
+ "qcom,sdm845-smmu-v2", "qcom,smmu-v2".

We don't need an explanation of why we need specific compatibles in each binding document (though maybe we need a better explanation somewhere). We just need to know what are valid values for compatibles and this includes any combinations. Generally, this is just a list of combinations.

[snip]

Fixed this in v16. Thanks.
Best regards Vivek
[PATCH v2] of/platform: initialise AMBA default DMA masks
This addresses a v4.19-rc1 regression in the PL111 DRM driver in drivers/gpu/pl111/*

The driver uses the CMA KMS helpers and will thus at some point call down to dma_alloc_attrs() to allocate a chunk of contiguous DMA memory for the framebuffer.

It appears that in v4.18, it was OK that this (and other DMA mastering AMBA devices) left dev->coherent_dma_mask blank (zero).

In v4.19-rc1 the WARN_ON_ONCE(dev && !dev->coherent_dma_mask) in dma_alloc_attrs() in include/linux/dma-mapping.h is triggered. The allocation later fails when get_coherent_dma_mask() is called from __dma_alloc() and __dma_alloc() returns NULL:

drm-clcd-pl111 dev:20: coherent DMA mask is unset
drm-clcd-pl111 dev:20: [drm:drm_fb_helper_fbdev_setup] *ERROR* Failed to set fbdev configuration

I have not been able to properly bisect down to the actual commit causing it because the kernel simply fails to build at too many bisection points, pushing the bisect back to places like the merge of entire subsystem trees, where things have likely been resolved while merging them.

I found that Robin Murphy solved a similar problem in a5516219b102 ("of/platform: Initialise default DMA masks") by simply assigning dev.coherent_dma_mask and the dev.dma_mask to point to the same when creating devices from the device tree, and introducing the same code into the code path creating AMBA/PrimeCell devices solved my problem; graphics now come up.

The code simply assumes that the device can access all of the system memory by setting the coherent DMA mask to 0xffffffff when creating a device from the device tree, which is crude, but seems to be what kernel v4.18 assumed.

Cc: Russell King
Cc: Christoph Hellwig
Cc: Robin Murphy
Signed-off-by: Linus Walleij
---
Russell, Christoph: this is probably not the last patch. But it has a more accurate description of the problem.
I still do not know what change to the kernel made it start triggering this, whether in the DMA API or in DRM :(

I am looking into Russell's suggestion to use the DMA ranges, and thusly patch all device trees with DMA mastering devices adding DMA ranges to them and parsing that in the device tree AMBA/PrimeCell population code. Possibly that is the right path forward.
---
 drivers/of/platform.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 7ba90c290a42..6c59673933e9 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -241,6 +241,10 @@ static struct amba_device *of_amba_device_create(struct device_node *node,
 	if (!dev)
 		goto err_clear_flag;
 
+	/* AMBA devices only support a single DMA mask */
+	dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+	dev->dev.dma_mask = &dev->dev.coherent_dma_mask;
+
 	/* setup generic device info */
 	dev->dev.of_node = of_node_get(node);
 	dev->dev.fwnode = &node->fwnode;
-- 
2.17.1
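The idea behind the patch above can be sketched in userspace C. This is a hypothetical model, not kernel code: `struct fake_device`, `amba_init_dma_masks()`, and `fake_dma_alloc_ok()` are illustrative stand-ins for `struct device`, the new code in `of_amba_device_create()`, and the failing check in the DMA allocation path.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Same shape as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Illustrative stand-in for struct device's two DMA mask fields. */
struct fake_device {
	uint64_t coherent_dma_mask;
	uint64_t *dma_mask;
};

/* Mimics the patched of_amba_device_create(): default both masks to
 * 32 bits, with dma_mask pointing at the same storage. */
static void amba_init_dma_masks(struct fake_device *dev)
{
	dev->coherent_dma_mask = DMA_BIT_MASK(32);
	dev->dma_mask = &dev->coherent_dma_mask;
}

/* Mimics the failure mode described in the commit message: a coherent
 * allocation cannot succeed while the coherent mask is still zero. */
static int fake_dma_alloc_ok(const struct fake_device *dev)
{
	return dev->coherent_dma_mask != 0;
}
```

With a zeroed device the model "allocation" fails, and after the init call it succeeds with both masks sharing storage, which is the whole point of the one-hunk fix.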
[PATCH 2/5 V6] x86/ioremap: strengthen the logic in early_memremap_pgprot_adjust() to adjust encryption mask
In the kdump kernel, when SME is enabled, the ACPI and DMI tables need to be remapped without the memory encryption mask. So strengthen the logic in early_memremap_pgprot_adjust() to give us an opportunity to adjust the memory encryption mask.

Signed-off-by: Lianbo Jiang
---
 arch/x86/mm/ioremap.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index e01e6c695add..f9d9a39955f3 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -689,8 +689,15 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
 	encrypted_prot = true;
 
 	if (sme_active()) {
+		/*
+		 * In the kdump kernel, the ACPI and DMI tables need to be
+		 * remapped without the memory encryption mask. Here we
+		 * have to strengthen the logic to adjust the memory
+		 * encryption mask.
+		 */
 		if (early_memremap_is_setup_data(phys_addr, size) ||
-		    memremap_is_efi_data(phys_addr, size))
+		    memremap_is_efi_data(phys_addr, size) ||
+		    is_kdump_kernel())
 			encrypted_prot = false;
 	}
-- 
2.17.1
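The decision the patch changes can be modelled as a small pure function. This is an illustrative sketch, not the kernel implementation: plain bools stand in for `sme_active()`, the setup-data/EFI-data helpers, and `is_kdump_kernel()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the decision made by the patched
 * early_memremap_pgprot_adjust(): with SME active, the encryption
 * attribute is dropped for setup data, EFI data, or (new in this
 * patch) any early remap done inside a kdump kernel. */
static bool map_encrypted(bool sme_active, bool is_setup_data,
			  bool is_efi_data, bool is_kdump_kernel)
{
	bool encrypted_prot = true;

	if (sme_active) {
		if (is_setup_data || is_efi_data || is_kdump_kernel)
			encrypted_prot = false;
	}
	return encrypted_prot;
}
```

The only behavioural change is the third disjunct: under SME, a kdump kernel now always gets an unencrypted early mapping for these tables.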
[PATCH 3/5 V6] kexec: allocate unencrypted control pages for kdump in case SME is enabled
When SME is enabled in the first kernel, allocate unencrypted pages for kdump so that the kdump kernel can be booted the same way as with kexec.

Signed-off-by: Lianbo Jiang
---
 kernel/kexec_core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 23a83a4da38a..e7efcd1a977b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
 		}
 	}
 
+	if (pages) {
+		/*
+		 * For kdump, we need to ensure that these pages are
+		 * unencrypted pages if SME is enabled. By the way, it is
+		 * unnecessary to call arch_kexec_pre_free_pages(), which
+		 * will make the code simpler.
+		 */
+		arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+	}
 	return pages;
 }
 
@@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image,
 			result = -ENOMEM;
 			goto out;
 		}
+		arch_kexec_post_alloc_pages(page_address(page), 1, 0);
 		ptr = kmap(page);
 		ptr += maddr & ~PAGE_MASK;
 		mchunk = min_t(size_t, mbytes,
@@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
 			result = copy_from_user(ptr, buf, uchunk);
 		kexec_flush_icache_page(page);
 		kunmap(page);
+		arch_kexec_pre_free_pages(page_address(page), 1);
 		if (result) {
 			result = -EFAULT;
 			goto out;
-- 
2.17.1
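The intent of the patch - every crash control page must end up unencrypted right after allocation - can be modelled with a toy page pool. All names here (`fake_page`, `arch_post_alloc`, `alloc_crash_pages`) are hypothetical stand-ins; a bool stands in for the C-bit in the page tables.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for a page whose mapping may carry the SME
 * encryption attribute. */
struct fake_page { bool encrypted; };

/* Mimics arch_kexec_post_alloc_pages(): clear the encryption
 * attribute on a freshly allocated range. */
static void arch_post_alloc(struct fake_page *p, int n)
{
	for (int i = 0; i < n; i++)
		p[i].encrypted = false;
}

/* Mimics the patched kimage_alloc_crash_control_pages(): under SME a
 * fresh allocation starts out encrypted, and the added call makes it
 * unencrypted before it is handed to the kdump kernel. */
static struct fake_page *alloc_crash_pages(struct fake_page *pool, int n)
{
	for (int i = 0; i < n; i++)
		pool[i].encrypted = true;
	arch_post_alloc(pool, n);	/* the call added by this patch */
	return pool;
}
```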
[PATCH 5/5 V6] kdump/vmcore: support encrypted old memory with SME enabled
In the kdump kernel, we need to dump the old memory into the vmcore file. If SME is enabled in the first kernel, we have to remap the old memory with the memory encryption mask, so that it is automatically decrypted when read from DRAM.

For SME kdump, there are two cases that are not supported:

 ---------------------------------------------------------------
| first-kernel         | second-kernel        | kdump support   |
| (mem_encrypt=on|off) | (mem_encrypt=on|off) | (yes|no)        |
|----------------------+----------------------+-----------------|
| on                   | on                   | yes             |
| off                  | off                  | yes             |
| on                   | off                  | no              |
| off                  | on                   | no              |
|______________________|______________________|_________________|

1. SME is enabled in the first kernel, but disabled in the kdump kernel.
   In this case, because the old memory is encrypted, we can't decrypt it.

2. SME is disabled in the first kernel, but enabled in the kdump kernel.
   On the one hand, the old memory is unencrypted and can be dumped as usual, so we don't need to enable SME in the kdump kernel; on the other hand, supporting this would increase the complexity of the code, since we would have to consider how to pass the SME flag from the first kernel to the kdump kernel, and that is really too expensive.

These patches are only for SME kdump; they do not support SEV kdump.
Signed-off-by: Lianbo Jiang --- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/crash_dump_encrypt.c | 53 fs/proc/vmcore.c | 21 +++ include/linux/crash_dump.h | 12 +++ 4 files changed, 81 insertions(+), 6 deletions(-) create mode 100644 arch/x86/kernel/crash_dump_encrypt.c diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 8824d01c0c35..dfbeae0e35ce 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -97,6 +97,7 @@ obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o obj-$(CONFIG_KEXEC_FILE) += kexec-bzimage64.o obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o +obj-$(CONFIG_AMD_MEM_ENCRYPT) += crash_dump_encrypt.o obj-y += kprobes/ obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_DOUBLEFAULT) += doublefault.o diff --git a/arch/x86/kernel/crash_dump_encrypt.c b/arch/x86/kernel/crash_dump_encrypt.c new file mode 100644 index ..e1b1a577f197 --- /dev/null +++ b/arch/x86/kernel/crash_dump_encrypt.c @@ -0,0 +1,53 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Memory preserving reboot related code. + * + * Created by: Lianbo Jiang (liji...@redhat.com) + * Copyright (C) RedHat Corporation, 2018. All rights reserved + */ + +#include +#include +#include +#include + +/** + * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted" + * @pfn: page frame number to be copied + * @buf: target memory address for the copy; this can be in kernel address + * space or user address space (see @userbuf) + * @csize: number of bytes to copy + * @offset: offset in bytes into the page (based on pfn) to begin the copy + * @userbuf: if set, @buf is in user address space, use copy_to_user(), + * otherwise @buf is in kernel address space, use memcpy(). + * + * Copy a page from "oldmem encrypted". For this page, there is no pte + * mapped in the current kernel. We stitch up a pte, similar to + * kmap_atomic. 
+ */ + +ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, + size_t csize, unsigned long offset, int userbuf) +{ + void *vaddr; + + if (!csize) + return 0; + + vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT, + PAGE_SIZE); + if (!vaddr) + return -ENOMEM; + + if (userbuf) { + if (copy_to_user((void __user *)buf, vaddr + offset, csize)) { + iounmap((void __iomem *)vaddr); + return -EFAULT; + } + } else + memcpy(buf, vaddr + offset, csize); + + set_iounmap_nonlazy(); + iounmap((void __iomem *)vaddr); + return csize; +} diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index cbde728f8ac6..3065c8bada6a 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -25,6 +25,9 @@ #include #include #include +#include +#include +#include #include "internal.h" /* List representing chunks of contiguous memory areas and their offsets in @@ -98,7 +101,8 @@ static int pfn_is_ram(unsigned long pfn) /* Reads a page from the oldmem device from given offset. */ static ssize_t read_from_oldmem(char *buf, size_t count, - u64 *ppos, int userbuf) + u64 *ppos, int userbuf, + bool encrypted) { unsigned long pfn, offset; size_t nr_bytes; @@ -120,8
[PATCH 4/5 V6] iommu/amd_iommu: remap the device table of IOMMU with the memory encryption mask for kdump
In the kdump kernel, the IOMMU device table is copied from the old device table, which is encrypted when SME is enabled in the first kernel. So we have to remap the old device table with the memory encryption mask.

Signed-off-by: Lianbo Jiang
---
 drivers/iommu/amd_iommu_init.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 84b3e4445d46..3931c7de7c69 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -902,12 +902,22 @@ static bool copy_device_table(void)
 		}
 	}
 
-	old_devtb_phys = entry & PAGE_MASK;
+	/*
+	 * When SME is enabled in the first kernel, the entry includes the
+	 * memory encryption mask (sme_me_mask); we must remove the memory
+	 * encryption mask to obtain the true physical address in the
+	 * kdump kernel.
+	 */
+	old_devtb_phys = __sme_clr(entry) & PAGE_MASK;
+
 	if (old_devtb_phys >= 0x100000000ULL) {
 		pr_err("The address of old device table is above 4G, not trustworthy!\n");
 		return false;
 	}
-	old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+	old_devtb = (sme_active() && is_kdump_kernel())
+		    ? (__force void *)ioremap_encrypted(old_devtb_phys,
+							dev_table_size)
+		    : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+
 	if (!old_devtb)
 		return false;
-- 
2.17.1
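The address fix above boils down to one bit operation, which can be sketched in userspace. This is illustrative only: bit 47 is used purely as an example `sme_me_mask` value (the real mask position is determined at boot from CPUID), and the 4 KiB page mask is hard-coded.

```c
#include <assert.h>
#include <stdint.h>

/* Example SME encryption bit and page mask; both are assumptions for
 * this sketch, not the kernel's runtime values. */
#define FAKE_SME_ME_MASK	(1ULL << 47)
#define FAKE_PAGE_MASK		(~0xfffULL)

/* Mimics __sme_clr(): strip the encryption bit from a value. */
static uint64_t sme_clr(uint64_t val)
{
	return val & ~FAKE_SME_ME_MASK;
}

/* Mimics the fixed line in copy_device_table(): the DTE base pointer
 * read from the old kernel carries the SME bit, which must be cleared
 * before the value can be used as a physical address. */
static uint64_t old_devtb_phys(uint64_t entry)
{
	return sme_clr(entry) & FAKE_PAGE_MASK;
}
```

Without the `sme_clr()` step, the bogus high bit would make the ">= 4G" sanity check in the patch reject every encrypted first-kernel device table.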
[PATCH 1/5 V6] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory
When SME is enabled on an AMD machine, the memory is encrypted in the first kernel. In this case, SME also needs to be enabled in the kdump kernel, and we have to remap the old memory with the memory encryption mask.

Signed-off-by: Lianbo Jiang
---
 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c     | 25 +++++++++++++++++--------
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de64840dd22..f8795f9581c7 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
 #define ioremap_cache ioremap_cache
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+				       unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap - map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545ec199..e01e6c695add 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-		unsigned long size, enum page_cache_mode pcm, void *caller)
+		unsigned long size, enum page_cache_mode pcm,
+		void *caller, bool encrypted)
 {
 	unsigned long offset, vaddr;
 	resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	 * resulting mapping.
 	 */
 	prot = PAGE_KERNEL_IO;
-	if (sev_active() && mem_flags.desc_other)
+	if ((sev_active() && mem_flags.desc_other) || encrypted)
 		prot = pgprot_encrypted(prot);
 
 	switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 	enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
 	return __ioremap_caller(phys_addr, size, pcm,
-				__builtin_return_address(0));
+				__builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size)
 	enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
 	return __ioremap_caller(phys_addr, size, pcm,
-				__builtin_return_address(0));
+				__builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
 	return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-					__builtin_return_address(0));
+					__builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
 	return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-				__builtin_return_address(0));
+				__builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wt);
 
+void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
+{
+	return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
+				__builtin_return_address(0), true);
+}
+EXPORT_SYMBOL(ioremap_encrypted);
+
 void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
 {
 	return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
-				__builtin_return_address(0));
+				__builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_cache);
 
@@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, unsigned long size,
 {
 	return __ioremap_caller(phys_addr, size,
 				pgprot2cachemode(__pgprot(prot_val)),
-				__builtin_return_address(0));
+				__builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_prot);
-- 
2.17.1
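The protection choice that patch 1/5 threads through `__ioremap_caller()` can be reduced to one predicate. This is a sketch with illustrative names; the real code selects between `PAGE_KERNEL_IO` and its `pgprot_encrypted()` variant rather than an enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the two possible page protections. */
enum prot { PROT_PLAIN, PROT_ENCRYPTED };

/* Sketch of the condition in the patched __ioremap_caller(): the
 * mapping becomes encrypted either when SEV is active for certain
 * memory descriptor types (pre-existing behaviour) or when the new
 * 'encrypted' argument is true - which only ioremap_encrypted()
 * passes. */
static enum prot pick_prot(bool sev_active, bool desc_other, bool encrypted)
{
	if ((sev_active && desc_other) || encrypted)
		return PROT_ENCRYPTED;
	return PROT_PLAIN;
}
```

All pre-existing callers pass `encrypted = false`, so their behaviour is unchanged; `ioremap_encrypted()` is the only path that forces an encrypted mapping.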
[PATCH 0/5 V6] Support kdump for AMD secure memory encryption(SME)
When SME is enabled on an AMD machine, we also need to support kdump. Because the memory is encrypted in the first kernel, we remap the old memory into the kdump kernel for dumping data, and SME must also be enabled in the kdump kernel, otherwise the old memory can not be decrypted.

For kdump, it is necessary to distinguish whether the memory is encrypted, and also to know which part of the memory is encrypted or decrypted. We then remap the memory appropriately according to the specific situation, in order to tell the CPU how to access it. A page of memory that is marked as encrypted is automatically decrypted when read from DRAM, and automatically encrypted when written to DRAM. So if the old memory is encrypted, we have to remap it with the memory encryption mask, which automatically decrypts the data as we read it.

For kdump (SME), there are two cases that are not supported:

 ---------------------------------------------------------------
| first-kernel         | second-kernel        | kdump support   |
| (mem_encrypt=on|off) | (mem_encrypt=on|off) | (yes|no)        |
|----------------------+----------------------+-----------------|
| on                   | on                   | yes             |
| off                  | off                  | yes             |
| on                   | off                  | no              |
| off                  | on                   | no              |
|______________________|______________________|_________________|

1. SME is enabled in the first kernel, but disabled in the kdump kernel.
   In this case, because the old memory is encrypted, we can't decrypt it.

2. SME is disabled in the first kernel, but enabled in the kdump kernel.
   It is unnecessary to support this case, because the old memory is unencrypted and can be dumped as usual; we don't need to enable SME in the kdump kernel. Moreover, if we had to support this scenario, it would increase the complexity of the code: we would have to consider how to pass the SME flag from the first kernel to the kdump kernel, in order to let the kdump kernel know whether the old memory is encrypted.

There are two methods to pass the SME flag to the kdump kernel. The first method is to modify the assembly code, which includes some common code and the path is too long.
The second method is to use the kexec tool. That would require the SME flag to be exported in the first kernel via "proc" or "sysfs": kexec-tools would read the SME flag from "proc" or "sysfs" when loading the image, the flag would then be saved in boot_params, and we could properly remap the old memory according to the previously saved SME flag. But it is too expensive to do this.

These patches are only for SME kdump; they do not support SEV kdump.

Test tools:
makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
Note: This patch can only dump vmcore in the case of SME enabled.

crash-7.2.3: https://github.com/crash-utility/crash.git
commit 001f77a05585 (Fix for Linux 4.19-rc1 and later kernels that contain kernel commit 7290d58095712a89f845e1bca05334796dd49ed2)

Test environment:
HP ProLiant DL385 Gen10
AMD EPYC 7251 8-Core Processor
32768 MB memory
600 GB disk space

Linux 4.19-rc1:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit 5b394b2ddf0347bef56e50c69a58773c94343ff3

Reference:
AMD64 Architecture Programmer's Manual
https://support.amd.com/TechDocs/24593.pdf

Changes since v5:
1. Modified the patch logs.

Some known issues:
1. About SME: the upstream kernel will hang on the HP machine (DL385 Gen10, AMD EPYC 7251) when we execute the kexec command as follows:
# kexec -l /boot/vmlinuz-4.19.0-rc1+ --initrd=/boot/initramfs-4.19.0-rc1+.img --command-line="root=/dev/mapper/rhel_hp--dl385g10--03-root ro mem_encrypt=on rd.lvm.lv=rhel_hp-dl385g10-03/root rd.lvm.lv=rhel_hp-dl385g10-03/swap console=ttyS0,115200n81 LANG=en_US.UTF-8 earlyprintk=serial debug nokaslr"
# kexec -e (or reboot)
But this issue can not be reproduced on the speedway machine, and it is irrelevant to my posted patches.
The kernel log:
[ 1248.932239] kexec_core: Starting new kernel
early console in extract_kernel
input_data: 0x00087e91c3b4
input_len: 0x0067fcbd
output: 0x00087d40
output_len: 0x01b6fa90
kernel_total_size: 0x01a9d000
trampoline_32bit: 0x00099000
Decompressing Linux...
Parsing ELF...[---Here the system will hang]

2. About SEV: the upstream kernel (host OS) doesn't work on the host side; some SEV-related drivers always go wrong there. We can't boot an SEV guest OS to test the kdump patch. Maybe it is more reasonable to improve SEV in another patch: when those drivers work on the host side and a virtual machine (SEV guest OS) can be booted, it will be the right time to fix SEV for kdump.

[ 369.426131] INFO: task systemd-udevd:865 blocked for more than 120 seconds.
[ 369.433177] Not tainted 4.17.0-rc5+ #60
[ 369.437585] "echo 0 > /proc/sys/kernel/hung_t