Re: [RFC v2] /dev/iommu uAPI proposal
On Mon, Aug 09, 2021 at 08:34:06AM +, Tian, Kevin wrote:
> > From: David Gibson
> > Sent: Friday, August 6, 2021 12:45 PM
> >
> > > > > In concept I feel the purpose of DMA endpoint is equivalent to the
> > > > > routing info in this proposal.
> > > >
> > > > Maybe? I'm afraid I never quite managed to understand the role of the
> > > > routing info in your proposal.
> > >
> > > the IOMMU routes incoming DMA packets to a specific I/O page table,
> > > according to RID or RID+PASID carried in the packet. RID or RID+PASID
> > > is the routing information (represented by device cookie + PASID in
> > > the proposed uAPI) and what the iommu driver really cares about when
> > > activating the I/O page table in the iommu.
> >
> > Ok, so yes, endpoint is roughly equivalent to that. But my point is
> > that the IOMMU layer really only cares about that (device+routing)
> > combination, not other aspects of what the device is. So that's the
> > concept we should give a name and put front and center in the API.
>
> This is how this proposal works, centered around the routing info. The
> uAPI doesn't care what the device is. It just requires the user to
> specify the user view of the routing info (device fd + optional pasid)
> to tag an IOAS.

Which works as long as (just device fd) and (device fd + PASID) cover
all the options. If we have new possibilities, we need new interfaces.
And it can't even handle the case of one endpoint for multiple devices
(e.g. behind an ACS-incapable bridge).

> The user view is then converted to the kernel view of the routing (rid
> or rid+pasid) by the vfio driver and then passed to the iommu fd in the
> attach operation. A GET_INFO interface is provided for the user to
> check whether a device supports multiple IOASes and whether the pasid
> space is delegated to the user. We just need a better name if pasid is
> considered too pci specific...
>
> But creating an endpoint per ioasid and making it central in the uAPI
> is not what the IOMMU layer cares about.
It's not an endpoint per ioasid. You can have multiple endpoints per
ioasid, just not the other way around. As it is, multiple IOASes per
device means *some* sort of disambiguation (generally by PASID), which
is hard to describe generally. Having endpoints as a first-class
concept makes that simpler.

--
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[Patch V3 2/2] iommu/arm-smmu: Fix race condition during iommu_group creation
From: Krishna Reddy

When two devices with the same SID are probed concurrently through
iommu_probe_device(), the iommu_group is sometimes allocated more than
once because the call to arm_smmu_device_group() is not protected
against concurrency. Furthermore, this leads to each device holding a
different iommu_group and domain pointer with a separate IOVA space,
while only one of the devices' domains is used for translations by the
IOMMU. This causes accesses from the other device to fault or to see
incorrect translations. Fix this by protecting the iommu_group
allocation against concurrency in arm_smmu_device_group().

Signed-off-by: Krishna Reddy
Signed-off-by: Ashish Mhetre
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dba15f3..f9c6648 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1457,6 +1457,7 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
 	struct iommu_group *group = NULL;
 	int i, idx;
 
+	mutex_lock(&smmu->stream_map_mutex);
 	for_each_cfg_sme(cfg, fwspec, i, idx) {
 		if (group && smmu->s2crs[idx].group &&
 		    group != smmu->s2crs[idx].group)
@@ -1465,8 +1466,10 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
 		group = smmu->s2crs[idx].group;
 	}
 
-	if (group)
+	if (group) {
+		mutex_unlock(&smmu->stream_map_mutex);
 		return iommu_group_ref_get(group);
+	}
 
 	if (dev_is_pci(dev))
 		group = pci_device_group(dev);
@@ -1480,6 +1483,7 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
 	for_each_cfg_sme(cfg, fwspec, i, idx)
 		smmu->s2crs[idx].group = group;
 
+	mutex_unlock(&smmu->stream_map_mutex);
 	return group;
 }
-- 
2.7.4
[Patch V3 1/2] iommu: Fix race condition during default domain allocation
When two devices with the same SID are probed concurrently through
iommu_probe_device(), the iommu_domain is sometimes allocated more than
once because the call to iommu_alloc_default_domain() is not protected
against concurrency. Furthermore, this leads to each device holding a
different iommu_domain pointer with a separate IOVA space, while only
one of the devices' domains is used for translations by the IOMMU. This
causes accesses from the other device to fault or to see incorrect
translations. Fix this by protecting the iommu_alloc_default_domain()
call with group->mutex and letting all devices with the same SID share
the same iommu_domain.

Signed-off-by: Ashish Mhetre
---
 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5419c4b..80c5a1c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -273,7 +273,9 @@ int iommu_probe_device(struct device *dev)
 	 * support default domains, so the return value is not yet
 	 * checked.
 	 */
+	mutex_lock(&group->mutex);
 	iommu_alloc_default_domain(group, dev);
+	mutex_unlock(&group->mutex);
 
 	if (group->default_domain) {
 		ret = __iommu_attach_device(group->default_domain, dev);
-- 
2.7.4
[Patch V3 0/2] iommu/arm-smmu: Fix races in iommu domain/group creation
When two devices with the same SID are probed concurrently through
iommu_probe_device(), the iommu_group and iommu_domain are allocated
more than once because they are not protected against concurrency. This
leads to context faults when one device accesses an IOVA from the other
device. Fix this by protecting iommu_domain and iommu_group creation
with mutexes.

Changes in v3:
* Updated commit messages.
* Added Signed-off-by in patch 2.

Ashish Mhetre (1):
  iommu: Fix race condition during default domain allocation

Krishna Reddy (1):
  iommu/arm-smmu: Fix race condition during iommu_group creation

 drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 +++++-
 drivers/iommu/iommu.c                 | 2 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

-- 
2.7.4
Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast()
On 2021/8/9 1:56 PM, Yongji Xie wrote:
> On Thu, Aug 5, 2021 at 9:31 PM Jason Wang wrote:
>> On 2021/8/5 8:34 PM, Yongji Xie wrote:
>>>> My main point, though, is that if you've already got something else
>>>> keeping track of the actual addresses, then the way you're using an
>>>> iova_domain appears to be something you could do with a trivial
>>>> bitmap allocator. That's why I don't buy the efficiency argument.
>>>> The main design points of the IOVA allocator are to manage large
>>>> address spaces while trying to maximise spatial locality to minimise
>>>> the underlying pagetable usage, and allocating with a flexible limit
>>>> to support multiple devices with different addressing capabilities
>>>> in the same address space. If none of those aspects are relevant to
>>>> the use-case - which AFAICS appears to be true here - then as a
>>>> general-purpose resource allocator it's rubbish and has an
>>>> unreasonably massive memory overhead and there are many, many
>>>> better choices.
>>>
>>> OK, I get your point. Actually we used the genpool allocator in the
>>> early version. Maybe we can fall back to using it.
>>
>> I think maybe you can share some perf numbers to see how much
>> alloc_iova_fast() can help.
>
> I did some fio tests [1] with a ram-backed vduse block device [2].
> Following are some performance data:
>
>                   numjobs=1   numjobs=2   numjobs=4   numjobs=8
> iova_alloc_fast   145k iops   265k iops   514k iops   758k iops
> iova_alloc        137k iops   170k iops   128k iops   113k iops
> gen_pool_alloc    143k iops   270k iops   458k iops   521k iops
>
> The iova_alloc_fast() has the best performance since we always hit the
> per-cpu cache. Regardless of the per-cpu cache, the genpool allocator
> should be better than the iova allocator.

I think we see convincing numbers for using alloc_iova_fast() over
gen_pool_alloc() (45% improvement at numjobs=8).

Thanks

> [1] fio jobfile:
>
> [global]
> rw=randread
> direct=1
> ioengine=libaio
> iodepth=16
> time_based=1
> runtime=60s
> group_reporting
> bs=4k
> filename=/dev/vda
> [job]
> numjobs=..
[2] $ qemu-storage-daemon \
      --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
      --monitor chardev=charmonitor \
      --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
      --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128

The qemu-storage-daemon can be built from this repo:
https://github.com/bytedance/qemu/tree/vduse-test.

Thanks,
Yongji
Re: [PATCH v2 3/9] dma-iommu: bounce buffers for untrusted devices
On Tue, Aug 10, 2021 at 10:19 AM Mi, Dapeng1 wrote:
>
> Hi David,
>
> I like this patch set and this is crucial for reducing the significant
> vIOMMU performance overhead. It looks like you totally rewrite the
> IOMMU mapping/unmapping part and use dynamically allocated memory from
> the buddy system as the bounce buffer instead of using the legacy
> SWIOTLB bounce buffer. As far as I know, some legacy devices' DMA
> cannot access memory above the 32-bit memory space, and the
> dynamically allocated memory address could exceed the 32-bit memory
> space. Is it a problem?

My understanding is that when devices with that sort of limitation sit
behind an IOMMU, the IOVA is what matters, not the physical address.
The bounce buffers use the same limits for IOVA allocation as the
regular dma-iommu path, so compatible IOVAs will be allocated for the
bounce buffers.

-David

> Thx,
> Dapeng Mi
RE: [PATCH v2 3/9] dma-iommu: bounce buffers for untrusted devices
Hi David,

I like this patch set and this is crucial for reducing the significant
vIOMMU performance overhead. It looks like you totally rewrite the
IOMMU mapping/unmapping part and use dynamically allocated memory from
the buddy system as the bounce buffer instead of using the legacy
SWIOTLB bounce buffer. As far as I know, some legacy devices' DMA
cannot access memory above the 32-bit memory space, and the dynamically
allocated memory address could exceed the 32-bit memory space. Is it a
problem?

Thx,
Dapeng Mi

-----Original Message-----
From: iommu On Behalf Of David Stevens
Sent: Friday, August 6, 2021 6:34 PM
To: Robin Murphy
Cc: linux-ker...@vger.kernel.org; Sergey Senozhatsky;
iommu@lists.linux-foundation.org; David Stevens; Will Deacon;
Christoph Hellwig
Subject: [PATCH v2 3/9] dma-iommu: bounce buffers for untrusted devices

From: David Stevens

Add support for dynamic bounce buffers to the dma-api for use with
subgranule IOMMU mappings with untrusted devices.

Bounce buffer management is split into two parts. First, there is a
buffer manager that is responsible for allocating and tracking buffers.
Second, there is a layer that uses the managed buffers as bounce
buffers. It is responsible for managing the IOMMU mapping and for
syncing between the original and bounce buffers.

For now, buffer management is very simple - every mapping allocates a
new bounce buffer.
Signed-off-by: David Stevens
---
 drivers/iommu/Makefile            |   2 +-
 drivers/iommu/dma-iommu.c         |  70 +++++-
 drivers/iommu/io-bounce-buffers.c | 358 ++++++++++++++++++++++++++++++
 drivers/iommu/io-bounce-buffers.h |  46 ++++
 drivers/iommu/io-buffer-manager.c | 212 ++++++++++++++++++
 drivers/iommu/io-buffer-manager.h |  43 ++++
 6 files changed, 728 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/io-bounce-buffers.c
 create mode 100644 drivers/iommu/io-bounce-buffers.h
 create mode 100644 drivers/iommu/io-buffer-manager.c
 create mode 100644 drivers/iommu/io-buffer-manager.h

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index c0fb0ba88143..4edaf7adc082 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -4,7 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
-obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o io-bounce-buffers.o io-buffer-manager.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 055ccda5eba1..908eb6fb7dc3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -24,6 +24,8 @@
 #include
 #include
 
+#include "io-bounce-buffers.h"
+
 struct iommu_dma_msi_page {
 	struct list_head	list;
 	dma_addr_t		iova;
@@ -44,6 +46,7 @@ struct iommu_dma_cookie {
 		dma_addr_t	msi_iova;
 	};
 	struct list_head	msi_page_list;
+	struct io_bounce_buffers	*bounce_buffers;
 
 	/* Domain for flush queue callback; NULL if flush queue not in use */
 	struct iommu_domain	*fq_domain;
@@ -81,6 +84,14 @@ static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
 	return PAGE_SIZE;
 }
 
+static struct io_bounce_buffers *dev_to_io_bounce_buffers(struct device *dev)
+{
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+
+	return cookie->bounce_buffers;
+}
+
 static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
 {
 	struct iommu_dma_cookie *cookie;
@@ -160,6 +171,9 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 	if (!cookie)
 		return;
 
+	if (cookie->bounce_buffers)
+		io_bounce_buffers_destroy(cookie->bounce_buffers);
+
 	if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
 		put_iova_domain(&cookie->iovad);
 
@@ -333,6 +347,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	unsigned long order, base_pfn;
 	struct iova_domain *iovad;
+	int ret;
 
 	if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
 		return -EINVAL;
@@ -380,7 +395,16 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 	if (!dev)
 		return 0;
 
-	return iova_reserve_iommu_regions(dev, domain);
+	ret = iova_reserve_iommu_regions(dev, domain);
+
+	if (ret == 0 && dev_is_untrusted(dev)) {
+		cookie->bounce_buffers =
+			io_bounce_buffers_init(dev, domain, iovad);
+
[iommu:apple/dart 3/3] drivers/iommu/apple-dart.c:730:17: error: initialization of 'size_t (*)(struct iommu_domain *, long unsigned int, size_t, struct iommu_iotlb_gather *)' {aka 'long unsigned int
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git apple/dart
head:   05ce9d20d699b093dec985192a7db63b48f26ca2
commit: 05ce9d20d699b093dec985192a7db63b48f26ca2 [3/3] iommu/dart: Add DART iommu driver
config: sparc-allyesconfig (attached as .config)
compiler: sparc64-linux-gcc (GCC) 10.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?id=05ce9d20d699b093dec985192a7db63b48f26ca2
        git remote add iommu https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git
        git fetch --no-tags iommu apple/dart
        git checkout 05ce9d20d699b093dec985192a7db63b48f26ca2
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross ARCH=sparc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

   drivers/iommu/apple-dart.c: In function 'apple_dart_map_pages':
   drivers/iommu/apple-dart.c:380:12: error: 'struct io_pgtable_ops' has no member named 'map_pages'
     380 |  return ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp,
         |            ^~
   drivers/iommu/apple-dart.c: In function 'apple_dart_unmap_pages':
   drivers/iommu/apple-dart.c:392:12: error: 'struct io_pgtable_ops' has no member named 'unmap_pages'
     392 |  return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
         |            ^~
   drivers/iommu/apple-dart.c: At top level:
   drivers/iommu/apple-dart.c:729:3: error: 'const struct iommu_ops' has no member named 'map_pages'
     729 |  .map_pages = apple_dart_map_pages,
         |   ^
   drivers/iommu/apple-dart.c:729:15: error: initialization of 'int (*)(struct iommu_domain *, long unsigned int, phys_addr_t, size_t, int, gfp_t)' {aka 'int (*)(struct iommu_domain *, long unsigned int, long long unsigned int, long unsigned int, int, unsigned int)'} from incompatible pointer type 'int (*)(struct iommu_domain *, long unsigned int, phys_addr_t, size_t, size_t, int, gfp_t, size_t *)' {aka 'int (*)(struct iommu_domain *, long unsigned int, long long unsigned int, long unsigned int, long unsigned int, int, unsigned int, long unsigned int *)'} [-Werror=incompatible-pointer-types]
     729 |  .map_pages = apple_dart_map_pages,
         |               ^~~~
   drivers/iommu/apple-dart.c:729:15: note: (near initialization for 'apple_dart_iommu_ops.map')
   drivers/iommu/apple-dart.c:730:3: error: 'const struct iommu_ops' has no member named 'unmap_pages'
     730 |  .unmap_pages = apple_dart_unmap_pages,
         |   ^~~
>> drivers/iommu/apple-dart.c:730:17: error: initialization of 'size_t (*)(struct iommu_domain *, long unsigned int, size_t, struct iommu_iotlb_gather *)' {aka 'long unsigned int (*)(struct iommu_domain *, long unsigned int, long unsigned int, struct iommu_iotlb_gather *)'} from incompatible pointer type 'size_t (*)(struct iommu_domain *, long unsigned int, size_t, size_t, struct iommu_iotlb_gather *)' {aka 'long unsigned int (*)(struct iommu_domain *, long unsigned int, long unsigned int, long unsigned int, struct iommu_iotlb_gather *)'} [-Werror=incompatible-pointer-types]
     730 |  .unmap_pages = apple_dart_unmap_pages,
         |                 ^~
   drivers/iommu/apple-dart.c:730:17: note: (near initialization for 'apple_dart_iommu_ops.unmap')
   drivers/iommu/apple-dart.c: In function 'apple_dart_unmap_pages':
   drivers/iommu/apple-dart.c:393:1: error: control reaches end of non-void function [-Werror=return-type]
     393 | }
         | ^
   drivers/iommu/apple-dart.c: In function 'apple_dart_map_pages':
   drivers/iommu/apple-dart.c:382:1: error: control reaches end of non-void function [-Werror=return-type]
     382 | }
         | ^
   cc1: some warnings being treated as errors

vim +730 drivers/iommu/apple-dart.c

   723	
   724	static const struct iommu_ops apple_dart_iommu_ops = {
   725		.domain_alloc = apple_dart_domain_alloc,
   726		.domain_free = apple_dart_domain_free,
   727		.attach_dev = apple_dart_attach_dev,
   728		.detach_dev = apple_dart_detach_dev,
 > 729		.map_pages = apple_dart_map_pages,
 > 730		.unmap_pages = apple_dart_unmap_pages,
   731		.flush_iotlb_all = apple_dart_flush_iotlb_all,
   732		.iotlb_sync = apple_dart_iotlb_sync,
   733		.iotlb_sync_map = apple_dart_iotlb_sync_map,
   734		.iova_to_phys = apple_dart_iova_to_phys,
   735		.probe_device = apple_dart_probe_device,
   736		.release_device = apple_dart_release_device,
   737		.device_group = apple_dart_device_group,
   738		.of_xlate =
Re: [PATCH 00/11] Implement generic prot_guest_has() helper function
On 8/8/21 8:41 PM, Kuppuswamy, Sathyanarayanan wrote:
> Hi Tom,
>
> On 7/27/21 3:26 PM, Tom Lendacky wrote:
>> This patch series provides a generic helper function, prot_guest_has(),
>> to replace the sme_active(), sev_active(), sev_es_active() and
>> mem_encrypt_active() functions.
>>
>> It is expected that as new protected virtualization technologies are
>> added to the kernel, they can all be covered by a single function call
>> instead of a collection of specific function calls all called from the
>> same locations.
>>
>> The powerpc and s390 patches have been compile tested only. Can the
>> folks copied on this series verify that nothing breaks for them.
>
> With this patch set, select ARCH_HAS_PROTECTED_GUEST and set
> CONFIG_AMD_MEM_ENCRYPT=n, creates following error.
>
> ld: arch/x86/mm/ioremap.o: in function `early_memremap_is_setup_data':
> arch/x86/mm/ioremap.c:672: undefined reference to `early_memremap_decrypted'
>
> It looks like early_memremap_is_setup_data() is not protected with
> appropriate config.

Ok, thanks for finding that. I'll fix that.
Thanks,
Tom

>>
>> Cc: Andi Kleen
>> Cc: Andy Lutomirski
>> Cc: Ard Biesheuvel
>> Cc: Baoquan He
>> Cc: Benjamin Herrenschmidt
>> Cc: Borislav Petkov
>> Cc: Christian Borntraeger
>> Cc: Daniel Vetter
>> Cc: Dave Hansen
>> Cc: Dave Young
>> Cc: David Airlie
>> Cc: Heiko Carstens
>> Cc: Ingo Molnar
>> Cc: Joerg Roedel
>> Cc: Maarten Lankhorst
>> Cc: Maxime Ripard
>> Cc: Michael Ellerman
>> Cc: Paul Mackerras
>> Cc: Peter Zijlstra
>> Cc: Thomas Gleixner
>> Cc: Thomas Zimmermann
>> Cc: Vasily Gorbik
>> Cc: VMware Graphics
>> Cc: Will Deacon
>>
>> ---
>>
>> Patches based on:
>> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
>> commit 79e920060fa7 ("Merge branch 'WIP/fixes'")
>>
>> Tom Lendacky (11):
>>   mm: Introduce a function to check for virtualization protection
>>     features
>>   x86/sev: Add an x86 version of prot_guest_has()
>>   powerpc/pseries/svm: Add a powerpc version of prot_guest_has()
>>   x86/sme: Replace occurrences of sme_active() with prot_guest_has()
>>   x86/sev: Replace occurrences of sev_active() with prot_guest_has()
>>   x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()
>>   treewide: Replace the use of mem_encrypt_active() with
>>     prot_guest_has()
>>   mm: Remove the now unused mem_encrypt_active() function
>>   x86/sev: Remove the now unused mem_encrypt_active() function
>>   powerpc/pseries/svm: Remove the now unused mem_encrypt_active()
>>     function
>>   s390/mm: Remove the now unused mem_encrypt_active() function
>>
>>  arch/Kconfig                               |  3 ++
>>  arch/powerpc/include/asm/mem_encrypt.h     |  5 --
>>  arch/powerpc/include/asm/protected_guest.h | 30 +++
>>  arch/powerpc/platforms/pseries/Kconfig     |  1 +
>>  arch/s390/include/asm/mem_encrypt.h        |  2 -
>>  arch/x86/Kconfig                           |  1 +
>>  arch/x86/include/asm/kexec.h               |  2 +-
>>  arch/x86/include/asm/mem_encrypt.h         | 13 +
>>  arch/x86/include/asm/protected_guest.h     | 27 ++
>>  arch/x86/kernel/crash_dump_64.c            |  4 +-
>>  arch/x86/kernel/head64.c                   |  4 +-
>>  arch/x86/kernel/kvm.c                      |  3 +-
>>  arch/x86/kernel/kvmclock.c                 |  4 +-
>>  arch/x86/kernel/machine_kexec_64.c         | 19 +++
>>  arch/x86/kernel/pci-swiotlb.c              |  9 ++--
>>  arch/x86/kernel/relocate_kernel_64.S       |  2 +-
>>  arch/x86/kernel/sev.c                      |  6 +--
>>  arch/x86/kvm/svm/svm.c                     |  3 +-
>>  arch/x86/mm/ioremap.c                      | 16 +++---
>>  arch/x86/mm/mem_encrypt.c                  | 60 +++---
>>  arch/x86/mm/mem_encrypt_identity.c         |  3 +-
>>  arch/x86/mm/pat/set_memory.c               |  3 +-
>>  arch/x86/platform/efi/efi_64.c             |  9 ++--
>>  arch/x86/realmode/init.c                   |  8 +--
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 +-
>>  drivers/gpu/drm/drm_cache.c                |  4 +-
>>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c        |  4 +-
>>  drivers/gpu/drm/vmwgfx/vmwgfx_msg.c        |  6 +--
>>  drivers/iommu/amd/init.c                   |  7 +--
>>  drivers/iommu/amd/iommu.c                  |  3 +-
>>  drivers/iommu/amd/iommu_v2.c               |  3 +-
>>  drivers/iommu/iommu.c                      |  3 +-
>>  fs/proc/vmcore.c                           |  6 +--
>>  include/linux/mem_encrypt.h                |  4 --
>>
Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support
On 8/9/21 10:56 AM, Tianyu Lan wrote:
> From: Tianyu Lan
>
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.

From an x86/mm perspective:

Acked-by: Dave Hansen

A tiny nit:

> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0bb4d9ca7a55..b3683083208a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>
>  bool hv_is_isolation_supported(void)
>  {
> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> +		return 0;
> +
> +	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> +		return 0;
> +
>  	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }

This might be worthwhile to move to a header. That ensures that
hv_is_isolation_supported() use can avoid even a function call. But, I
see this is used in modules and its use here is also in a slow path, so
it's not a big deal.
Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()
On 8/9/21 2:59 PM, Tom Lendacky wrote:
>> Not sure how TDX will handle AP booting, are you sure it needs this
>> special setup as well? Otherwise a check for SEV-ES would be better
>> instead of the generic PATTR_GUEST_PROT_STATE.
>
> Yes, I'm not sure either. I figure that change can be made, if needed,
> as part of the TDX support.

We don't plan to set PROT_STATE. So it does not affect TDX. For SMP, we
use the MADT ACPI table for AP booting.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()
On 8/2/21 7:42 AM, Christophe Leroy wrote:
>
> On 28/07/2021 at 00:26, Tom Lendacky wrote:
>> Replace occurrences of mem_encrypt_active() with calls to prot_guest_has()
>> with the PATTR_MEM_ENCRYPT attribute.
>
> What about
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210730114231.23445-1-will@kernel.org/
> ?

Ah, looks like that just went into the PPC tree and isn't part of the
tip tree. I'll have to look into how to handle that one.

Thanks,
Tom

> Christophe
>
>> Cc: Thomas Gleixner
>> Cc: Ingo Molnar
>> Cc: Borislav Petkov
>> Cc: Dave Hansen
>> Cc: Andy Lutomirski
>> Cc: Peter Zijlstra
>> Cc: David Airlie
>> Cc: Daniel Vetter
>> Cc: Maarten Lankhorst
>> Cc: Maxime Ripard
>> Cc: Thomas Zimmermann
>> Cc: VMware Graphics
>> Cc: Joerg Roedel
>> Cc: Will Deacon
>> Cc: Dave Young
>> Cc: Baoquan He
>> Signed-off-by: Tom Lendacky
>> ---
>>  arch/x86/kernel/head64.c                |  4 ++--
>>  arch/x86/mm/ioremap.c                   |  4 ++--
>>  arch/x86/mm/mem_encrypt.c               |  5 ++---
>>  arch/x86/mm/pat/set_memory.c            |  3 ++-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  4 +++-
>>  drivers/gpu/drm/drm_cache.c             |  4 ++--
>>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c     |  4 ++--
>>  drivers/gpu/drm/vmwgfx/vmwgfx_msg.c     |  6 +++---
>>  drivers/iommu/amd/iommu.c               |  3 ++-
>>  drivers/iommu/amd/iommu_v2.c            |  3 ++-
>>  drivers/iommu/iommu.c                   |  3 ++-
>>  fs/proc/vmcore.c                        |  6 +++---
>>  kernel/dma/swiotlb.c                    |  4 ++--
>>  13 files changed, 29 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>> index de01903c3735..cafed6456d45 100644
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -19,7 +19,7 @@
>>  #include
>>  #include
>>  #include
>> -#include
>> +#include
>>  #include
>>  #include
>> @@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>>  	 * there is no need to zero it after changing the memory encryption
>>  	 * attribute.
>>  	 */
>> -	if (mem_encrypt_active()) {
>> +	if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
>>  		vaddr = (unsigned long)__start_bss_decrypted;
>>  		vaddr_end = (unsigned long)__end_bss_decrypted;
>>  		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 0f2d5ace5986..5e1c1f5cbbe8 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -693,7 +693,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
>>  bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
>>  				 unsigned long flags)
>>  {
>> -	if (!mem_encrypt_active())
>> +	if (!prot_guest_has(PATTR_MEM_ENCRYPT))
>>  		return true;
>>  	if (flags & MEMREMAP_ENC)
>> @@ -723,7 +723,7 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
>>  {
>>  	bool encrypted_prot;
>> -	if (!mem_encrypt_active())
>> +	if (!prot_guest_has(PATTR_MEM_ENCRYPT))
>>  		return prot;
>>  	encrypted_prot = true;
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 451de8e84fce..0f1533dbe81c 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -364,8 +364,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size)
>>  /*
>>   * SME and SEV are very similar but they are not the same, so there are
>>   * times that the kernel will need to distinguish between SME and SEV. The
>> - * sme_active() and sev_active() functions are used for this. When a
>> - * distinction isn't needed, the mem_encrypt_active() function can be used.
>> + * sme_active() and sev_active() functions are used for this.
>>   *
>>   * The trampoline code is a good example for this requirement. Before
>>   * paging is activated, SME will access all memory as decrypted, but SEV
>> @@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
>>  	 * The unused memory range was mapped decrypted, change the encryption
>>  	 * attribute from decrypted to encrypted before freeing it.
>>  	 */
>> -	if (mem_encrypt_active()) {
>> +	if (sme_me_mask) {
>>  		r = set_memory_encrypted(vaddr, npages);
>>  		if (r) {
>>  			pr_warn("failed to free unused decrypted pages\n");
>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()
On 8/2/21 5:45 AM, Joerg Roedel wrote:
> On Tue, Jul 27, 2021 at 05:26:09PM -0500, Tom Lendacky wrote:
>> @@ -48,7 +47,7 @@ static void sme_sev_setup_real_mode(struct trampoline_header *th)
>>  	if (prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
>>  		th->flags |= TH_FLAGS_SME_ACTIVE;
>>
>> -	if (sev_es_active()) {
>> +	if (prot_guest_has(PATTR_GUEST_PROT_STATE)) {
>>  		/*
>>  		 * Skip the call to verify_cpu() in secondary_startup_64 as it
>>  		 * will cause #VC exceptions when the AP can't handle them yet.
>
> Not sure how TDX will handle AP booting, are you sure it needs this
> special setup as well? Otherwise a check for SEV-ES would be better
> instead of the generic PATTR_GUEST_PROT_STATE.

Yes, I'm not sure either. I figure that change can be made, if needed,
as part of the TDX support.

Thanks,
Tom

> Regards,
>
> Joerg
Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()
On 7/30/21 5:34 PM, Sean Christopherson wrote:
> On Tue, Jul 27, 2021, Tom Lendacky wrote:
>> @@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
>>  	 * The unused memory range was mapped decrypted, change the encryption
>>  	 * attribute from decrypted to encrypted before freeing it.
>>  	 */
>> -	if (mem_encrypt_active()) {
>> +	if (sme_me_mask) {
>
> Any reason this uses sme_me_mask? The helper it calls,
> __set_memory_enc_dec(), uses prot_guest_has(PATTR_MEM_ENCRYPT) so I
> assume it's available?

Probably just a slip on my part. I was debating at one point calling the
helper vs. referencing the variables/functions directly in the
mem_encrypt.c file.

Thanks,
Tom

>>  		r = set_memory_encrypted(vaddr, npages);
>>  		if (r) {
>>  			pr_warn("failed to free unused decrypted pages\n");
Re: [RFC PATCH 2/3] iommu/dma-iommu: Support iovad->granule > PAGE_SIZE
On Mon, Aug 9, 2021, at 19:41, Robin Murphy wrote: > On 2021-08-07 12:47, Sven Peter via iommu wrote: > > > > > > On Fri, Aug 6, 2021, at 20:04, Robin Murphy wrote: > >> On 2021-08-06 16:55, Sven Peter via iommu wrote: > >>> @@ -1006,6 +1019,31 @@ static int iommu_dma_map_sg(struct device *dev, > >>> struct scatterlist *sg, > >>> if (dev_is_untrusted(dev)) > >>> return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, > >>> attrs); > >>> > >>> + /* > >>> + * If the IOMMU pagesize is larger than the CPU pagesize we will > >>> + * very likely run into sgs with a physical address that is not aligned > >>> + * to an IOMMU page boundary. Fall back to just mapping every entry > >>> + * independently with __iommu_dma_map then. > >> > >> Scatterlist segments often don't have nicely aligned ends, which is why > >> we already align things to IOVA granules in main loop here. I think in > >> principle we'd just need to move the non-IOVA-aligned part of the > >> address from sg->page to sg->offset in the temporary transformation for > >> the rest of the assumptions to hold. I don't blame you for being timid > >> about touching that, though - it took me 3 tries to get right when I > >> first wrote it... > >> > > > > > > I've spent some time with that code now and I think we cannot use it > > but have to fall back to iommu_dma_map_sg_swiotlb (even though that swiotlb > > part is a lie then): > > > > When we have sg_phys(s) = 0x802e65000 with s->offset = 0 the paddr > > is aligned to PAGE_SIZE but has an offset of 0x1000 from something > > the IOMMU can map. > > Now this would result in s->offset = -0x1000 which is already weird > > enough. > > Offset is unsigned (and 32bit) so this will actually look like > > s->offset = 0xf000 then, which isn't much better. > > And then sg_phys(s) = 0x902e64000 (instead of 0x802e64000) and > > we'll map some random memory in iommu_map_sg_atomic and a little bit later > > everything explodes. 
> > > > Now I could probably adjust the phys addr backwards and make sure offset is > > always positive (and possibly larger than PAGE_SIZE) and later restore it > > in __finalise_sg then but I feel like that's pushing this a little bit too > > far. > > Yes, that's what I meant. At a quick guess, something like the > completely untested diff below. That unfortunately results in unaligned mappings [9.630334] iommu: unaligned: iova 0xbff4 pa 0x000801a3b000 size 0x4000 min_pagesz 0x4000 I'll take a closer look later this week and see if I can fix it. > It really comes down to what we want to > achieve here - if it's just to make this thing work at all, then I'd > favour bolting on the absolute minimum changes, possibly even cheating > by tainting the kernel and saying all bets are off instead of trying to > handle the more involved corners really properly. However if you want to > work towards this being a properly-supported thing, then I think it's > worth generalising the existing assumptions of page alignment from the > beginning. I'd like to try and see if we can make this a properly-supported thing. That will likely take a few iterations but realistically the rest of the drivers required to make this platform actually useful (and especially the display controller and GPU drivers) won't be ready for a few more months anyway. And even on 4KB PAGE_SIZE kernels half the USB ports and NVMe will work fine, which should be enough to install a distro and some third-party package that just ships the distro kernel with 16KB pages. Sven ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
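The wrap-around Sven describes is easy to reproduce in userspace: aligning the physical address down to a 16 KiB granule leaves a residue of 0x1000, and storing -0x1000 in a 32-bit unsigned sg->offset wraps to 0xfffff000 (the "0xf000" in the quote looks like a truncation of that value), so rebuilding the address from page + offset lands almost 4 GiB away. The addresses are taken from the quote; GRANULE and the helpers below are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define GRANULE 0x4000ULL	/* 16 KiB IOMMU page, as on this platform */

/* Round a physical address down to an IOMMU page boundary. */
static uint64_t granule_align_down(uint64_t paddr)
{
	return paddr & ~(GRANULE - 1);
}

/*
 * Model what happens if the negative adjustment is stored in a 32-bit
 * unsigned sg->offset and the address is then rebuilt as page + offset.
 */
static uint64_t rebuilt_sg_phys(uint64_t paddr)
{
	uint32_t offset = (uint32_t)(granule_align_down(paddr) - paddr);

	return paddr + offset;	/* no sign extension: lands ~4 GiB away */
}
```

With paddr = 0x802e65000 this reproduces exactly the bogus 0x902e64000 from the report.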
Re: [PATCH v3 24/25] iommu/dma: Factor out flush queue init
On Mon, Aug 9, 2021 at 12:59 PM Robin Murphy wrote: > > On 2021-08-09 20:05, Rajat Jain wrote: > > On Wed, Aug 4, 2021 at 10:16 AM Robin Murphy wrote: > >> > >> Factor out flush queue setup from the initial domain init so that we > >> can potentially trigger it from sysfs later on in a domain's lifetime. > >> > >> Reviewed-by: Lu Baolu > >> Reviewed-by: John Garry > >> Signed-off-by: Robin Murphy > >> --- > >> drivers/iommu/dma-iommu.c | 30 -- > >> include/linux/dma-iommu.h | 9 ++--- > >> 2 files changed, 26 insertions(+), 13 deletions(-) > >> > >> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > >> index 2e19505dddf9..f51b8dc99ac6 100644 > >> --- a/drivers/iommu/dma-iommu.c > >> +++ b/drivers/iommu/dma-iommu.c > >> @@ -310,6 +310,25 @@ static bool dev_is_untrusted(struct device *dev) > >> return dev_is_pci(dev) && to_pci_dev(dev)->untrusted; > >> } > >> > >> +int iommu_dma_init_fq(struct iommu_domain *domain) > >> +{ > >> + struct iommu_dma_cookie *cookie = domain->iova_cookie; > >> + > >> + if (domain->type != IOMMU_DOMAIN_DMA_FQ) > >> + return -EINVAL; > >> + if (cookie->fq_domain) > >> + return 0; > >> + > >> + if (init_iova_flush_queue(>iovad, > >> iommu_dma_flush_iotlb_all, > >> + iommu_dma_entry_dtor)) { > >> + pr_warn("iova flush queue initialization failed\n"); > >> + domain->type = IOMMU_DOMAIN_DMA; > >> + return -ENODEV; > >> + } > >> + cookie->fq_domain = domain; > >> + return 0; > >> +} > >> + > >> /** > >>* iommu_dma_init_domain - Initialise a DMA mapping domain > >>* @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() > >> @@ -362,16 +381,7 @@ static int iommu_dma_init_domain(struct iommu_domain > >> *domain, dma_addr_t base, > >> } > >> > >> init_iova_domain(iovad, 1UL << order, base_pfn); > >> - > >> - if (domain->type == IOMMU_DOMAIN_DMA_FQ && !cookie->fq_domain) { > >> - if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, > >> - iommu_dma_entry_dtor)) { > >> - pr_warn("iova flush queue initialization > 
>> failed\n"); > >> - domain->type = IOMMU_DOMAIN_DMA; > >> - } else { > >> - cookie->fq_domain = domain; > >> - } > >> - } > >> + iommu_dma_init_fq(domain); > >> > >> return iova_reserve_iommu_regions(dev, domain); > >> } > >> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h > >> index 758ca4694257..81ab647f1618 100644 > >> --- a/include/linux/dma-iommu.h > >> +++ b/include/linux/dma-iommu.h > >> @@ -20,6 +20,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain); > >> > >> /* Setup call for arch DMA mapping code */ > >> void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 > >> dma_limit); > >> +int iommu_dma_init_fq(struct iommu_domain *domain); > >> > >> /* The DMA API isn't _quite_ the whole story, though... */ > >> /* > >> @@ -37,9 +38,6 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc, > >> > >> void iommu_dma_get_resv_regions(struct device *dev, struct list_head > >> *list); > >> > >> -void iommu_dma_free_cpu_cached_iovas(unsigned int cpu, > >> - struct iommu_domain *domain); > >> - > > > > This looks like an unrelated code cleanup. Should this be a separate patch? > > Ha, busted! Much of this was done in the "stream of consciousness" style > where I made a big sprawling mess then split it up into patches and > branches afterwards. TBH it was already feeling pretty tenuous having a > separate patch just to move this one function, and it only gets more so > with the simplification Will pointed out earlier. I think I'll squash > iommu_dma_init_fq() into the next patch then do a thorough header sweep, > since I've now spotted some things in iova.h which could probably go as > well. Thank you. I chanced upon this only because I've backported your patchset (and some other changes that it depends on) to 5.10 which is the kernel we currently use for our Intel platforms, and this cleanup hunk was creating a problem (since 5.10 still uses the symbol you removed). 
I'll be giving your v3 patchset a spin in my setup and update you in case I see any issue. Thanks, Rajat > > Thanks for the poke! > > Robin. > > > > > Thanks, > > > > Rajat > > > > > >> extern bool iommu_dma_forcedac; > >> > >> #else /* CONFIG_IOMMU_DMA */ > >> @@ -54,6 +52,11 @@ static inline void iommu_setup_dma_ops(struct device > >> *dev, u64 dma_base, > >> { > >> } > >> > >> +static inline int iommu_dma_init_fq(struct iommu_domain *domain) > >> +{ > >> + return -EINVAL; > >> +} > >> + > >> static inline int iommu_get_dma_cookie(struct iommu_domain *domain) > >> { > >> return -ENODEV; >
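The control flow of the factored-out iommu_dma_init_fq() in the quoted diff can be mimicked in a standalone sketch, which makes its three outcomes easy to see: reject non-FQ domains, return early if the flush queue already exists, and fall back to strict (non-FQ) DMA when flush-queue init fails. The types and the failure-injection parameter are stand-ins, not the kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum domain_type { DOMAIN_DMA, DOMAIN_DMA_FQ };

struct cookie { void *fq_domain; };
struct domain { enum domain_type type; struct cookie *cookie; };

/* Mirrors the control flow of the factored-out function; the extra
 * fq_init_fails parameter injects an init_iova_flush_queue() failure. */
static int dma_init_fq(struct domain *d, int fq_init_fails)
{
	if (d->type != DOMAIN_DMA_FQ)
		return -EINVAL;		/* caller didn't ask for an FQ domain */
	if (d->cookie->fq_domain)
		return 0;		/* already initialised: idempotent */
	if (fq_init_fails) {
		d->type = DOMAIN_DMA;	/* degrade to strict invalidation */
		return -ENODEV;
	}
	d->cookie->fq_domain = d;
	return 0;
}

/* Convenience wrapper so each scenario is a single expression. */
static int try_init(enum domain_type type, int already, int fq_init_fails)
{
	struct cookie c = { .fq_domain = NULL };
	struct domain d = { .type = type, .cookie = &c };

	if (already)
		c.fq_domain = &d;
	return dma_init_fq(&d, fq_init_fails);
}
```

The idempotent early return is what makes it safe to call both from initial domain setup and later from a sysfs trigger, which is the stated motivation for the refactor.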
Re: [RFC PATCH 2/3] iommu/dma-iommu: Support iovad->granule > PAGE_SIZE
Hi, On Mon, Aug 9, 2021, at 20:37, Robin Murphy wrote: > On 2021-08-07 09:41, Sven Peter wrote: > > Hi, > > > > Thanks a lot for quick reply! > > > > On Fri, Aug 6, 2021, at 20:04, Robin Murphy wrote: > >> On 2021-08-06 16:55, Sven Peter via iommu wrote: > >>> DMA IOMMU domains can support hardware where the IOMMU page size is > >>> larger than the CPU page size. > >>> Alignments need to be done with respect to both PAGE_SIZE and > >>> iovad->granule. Additionally, the sg list optimization to use a single > >>> IOVA allocation cannot be used in those cases since the physical > >>> addresses will very likely not be aligned to the larger IOMMU page size. > >>> > >>> Signed-off-by: Sven Peter > >>> --- > >>>drivers/iommu/dma-iommu.c | 87 ++- > >>>1 file changed, 77 insertions(+), 10 deletions(-) > >>> > >>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > >>> index 6f0df629353f..e072d9030d9f 100644 > >>> --- a/drivers/iommu/dma-iommu.c > >>> +++ b/drivers/iommu/dma-iommu.c > >>> @@ -8,6 +8,7 @@ > >>> * Copyright (C) 2000-2004 Russell King > >>> */ > >>> > >>> +#include > >>>#include > >>>#include > >>>#include > >>> @@ -51,6 +52,15 @@ struct iommu_dma_cookie { > >>> struct iommu_domain *fq_domain; > >>>}; > >>> > >>> +/* aligns size to CPU and IOMMU page size */ > >>> +static inline size_t iommu_page_align(struct device *dev, size_t size) > >>> +{ > >>> + struct iommu_domain *domain = iommu_get_dma_domain(dev); > >>> + struct iommu_dma_cookie *cookie = domain->iova_cookie; > >>> + > >>> + return iova_align(>iovad, PAGE_ALIGN(size)); > >>> +} > >>> + > >>>static DEFINE_STATIC_KEY_FALSE(iommu_deferred_attach_enabled); > >>>bool iommu_dma_forcedac __read_mostly; > >>> > >>> @@ -647,6 +657,8 @@ static struct page **__iommu_dma_alloc_pages(struct > >>> device *dev, > >>>/* > >>> * If size is less than PAGE_SIZE, then a full CPU page will be > >>> allocated, > >>> * but an IOMMU which supports smaller pages might not map the whole > >>> thing. 
> >>> + * If the IOMMU page size is larger than the CPU page size, then the size > >>> + * will be aligned to that granularity and some memory will be left > >>> unused. > >> > >> Why do we need to increase the actual memory allocation? The point here > >> is that we allocate the smallest thing we can allocate and map the > >> smallest thing we can map - I think that still works the "wrong" way > >> round too, we should just need to start taking an IOVA offset into > >> account as in dma_map_page() if we can no longer assume it's 0 for a CPU > >> page. Sure we may expose some unrelated adjacent pages, but we'll > >> already be doing that to excess for streaming DMA so whoop de do. > > > > I agree for trusted devices, but untrusted ones (Thunderbolt, and depending > > on your > > risk tolerance possibly even the broadcom wifi) might also end up calling > > this. > > Oh, right, I hadn't considered actual untrusted device support at this > stage. Me neither :-) I did the alignment at first without thinking too much about it, then read your reply, and only *then* realized that there are untrusted devices for which this just happens to do the right thing (at the cost of wasting memory for everyone else, but I'll fix that). > > > For streaming DMA swiotlb will make sure that these won't see memory > > they're not supposed to access. > > I was slightly surprised to see that that does appear to work out OK, > but I guess SWIOTLB slots are already smaller than typical IOMMU pages, > so it falls out of that. Neat. > > > But, at least as far as I understand it, no swiotlb is in the way to catch > > devices > > who end up calling this function. That wasn't required because we used to > > get > > PAGE_SIZE aligned allocation here and every IOMMU so far would be able to > > easily > > map them without any spill overs. > > But now we'll end up exposing three more unrelated pages if the allocation > > is not increased. 
> > Fair enough, but then why still waste memory in the (usual) cases where > it logically *isn't* necessary? See above, didn't even consider this either. I should be able to fix this by first allocating <= 3 pages to reach a iovad->granule boundary (or just satisfy the full allocation length already), adjust the DMA offset forward for this first allocation and then just keep allocating iovad->granule blocks for the rest of the requested size. The current code already does the second part and I think I'll just need to add the first alignment part. > > >>> */ > >>>static struct page **__iommu_dma_alloc_noncontiguous(struct device > >>> *dev, > >>> size_t size, struct sg_table *sgt, gfp_t gfp, pgprot_t > >>> prot, > >>> @@ -736,7 +748,7 @@ static void *iommu_dma_alloc_remap(struct device > >>> *dev, size_t size, > >>> > >>>out_unmap: > >>>
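The head-then-granules allocation scheme Sven outlines can be sketched as pure arithmetic: allocate CPU pages until the next iovad->granule boundary (at most three 4 KiB pages when the granule is 16 KiB), then whole granules, leaving any remainder as a tail. The 4 KiB/16 KiB sizes match the platform under discussion; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CPU_PAGE 0x1000UL	/* 4 KiB CPU page */
#define GRANULE  0x4000UL	/* 16 KiB IOMMU page */

struct alloc_plan {
	size_t head;		/* bytes of CPU pages up to the next granule boundary */
	size_t granules;	/* whole iovad->granule blocks */
	size_t tail;		/* leftover bytes at the end */
};

/*
 * start_mod_granule is where the mapping starts within its IOMMU page;
 * at most GRANULE - CPU_PAGE (three pages here) of head is ever needed.
 */
static struct alloc_plan split_alloc(size_t size, size_t start_mod_granule)
{
	struct alloc_plan p;
	size_t head = (GRANULE - start_mod_granule) % GRANULE;

	if (head > size)
		head = size;	/* request ends before the first boundary */
	p.head = head;
	p.granules = (size - head) / GRANULE;
	p.tail = (size - head) % GRANULE;
	return p;
}
```

Only the head (and tail) pages are individually CPU-page-sized, so the memory waste for trusted devices disappears while granule-aligned bulk allocations remain possible.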
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On 2021-08-10 00:00, Rob Clark wrote: On Mon, Aug 9, 2021 at 11:11 AM Sai Prakash Ranjan wrote: On 2021-08-09 23:37, Rob Clark wrote: > On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan > wrote: >> >> On 2021-08-09 23:10, Will Deacon wrote: >> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote: >> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon wrote: >> >> > >> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote: >> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon wrote: >> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote: >> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon wrote: >> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote: >> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon wrote: >> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote: >> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote: >> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote: >> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag") >> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went >> >> > > > > > > > > > > the memory type setting required for the non-coherent masters to use >> >> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will >> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached. >> >> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU. >> >> > > > > > > > > > > >> >> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC, >> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC >> >> > > > > > > > > > > and makes GPU the user of this protection flag. >> >> > > > > > > > > > >> >> > > > > > > > > > Thank you for the patchset! 
Are you planning to refresh it, as it does >> >> > > > > > > > > > not apply anymore? >> >> > > > > > > > > > >> >> > > > > > > > > >> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then >> >> > > > > > > > > I can repost the patch. >> >> > > > > > > > >> >> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding >> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That >> >> > > > > > > > can't be right. >> >> > > > > > > > >> >> > > > > > > >> >> > > > > > > Just curious, and maybe this is a dumb question, but what is your >> >> > > > > > > concern about mismatched aliases? I mean the cache hierarchy on the >> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different and >> >> > > > > > > doesn't really care about the smmu pgtable attributes.. >> >> > > > > > >> >> > > > > > If the CPU accesses a shared buffer with different attributes to those which >> >> > > > > > the device is using then you fall into the "mismatched memory attributes" >> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and >> >> > > > > > read it) and in some cases can apply to speculative accesses as well, but >> >> > > > > > the end result is typically loss of coherency. >> >> > > > > >> >> > > > > Ok, I might have a few other sections to read first to decipher the >> >> > > > > terminology.. >> >> > > > > >> >> > > > > But my understanding of LLC is that it looks just like system memory >> >> > > > > to the CPU and GPU (I think that would make it "the point of >> >> > > > > coherence" between the GPU and CPU?) If that is true, shouldn't it be >> >> > > > > invisible from the point of view of different CPU mapping options? 
>> >> > > > >> >> > > > You could certainly build a system where mismatched attributes don't cause >> >> > > > loss of coherence, but as it's not guaranteed by the architecture and the >> >> > > > changes proposed here affect APIs which are exposed across SoCs, then I >> >> > > > don't think it helps much. >> >> > > > >> >> > > >> >> > > Hmm, the description of the new mapping flag is that it applies only >> >> > > to transparent outer level cache: >> >> > > >> >> > > +/* >> >> > > + * Non-coherent masters can use this page protection flag to set cacheable >> >> > > + * memory attributes for only a transparent outer level of cache, also known as >> >> > > + * the last-level or system cache. >> >> > > + */ >> >> > > +#define IOMMU_LLC (1 << 6) >> >> > > >> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something >> >> > > like that to make it more clear that it is not necessarily something >> >> > > that would work with a different outer level cache implementation? >> >> > >> >> > ... or we could just deal with the problem so that other people can reuse >> >> > the code. I haven't really understood the reluctance to solve this properly. >> >> > >> >> > Am I missing some
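As a rough picture of the attribute selection being debated, the prot flags can be folded into a page-table memory type roughly like this. IOMMU_LLC's bit value is taken from the quoted hunk; everything else (the other flag values, the memory-type names, the precedence order) is illustrative, not the actual io-pgtable implementation.

```c
#include <assert.h>

/* IOMMU_LLC's bit is from the quoted hunk; the others are listed only
 * so the sketch compiles. */
#define IOMMU_READ	(1 << 0)
#define IOMMU_WRITE	(1 << 1)
#define IOMMU_CACHE	(1 << 2)
#define IOMMU_LLC	(1 << 6)

/* Illustrative page-table memory types. */
enum memtype {
	MT_NORMAL_NC,		/* non-cacheable */
	MT_NORMAL,		/* inner+outer write-back (coherent master) */
	MT_NORMAL_INC_OCACHE,	/* inner non-cacheable, outer cacheable: LLC only */
};

/* Fully-cacheable wins over LLC-only, which wins over non-cacheable. */
static enum memtype select_memtype(int prot)
{
	if (prot & IOMMU_CACHE)
		return MT_NORMAL;
	if (prot & IOMMU_LLC)
		return MT_NORMAL_INC_OCACHE;
	return MT_NORMAL_NC;
}
```

Will's objection is about the middle case: the CPU side has no matching "outer-cacheable only" mapping for such a buffer, so CPU and device attributes are mismatched, which the Arm architecture only loosely tolerates.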
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On 2021-08-09 23:37, Rob Clark wrote: On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan wrote: On 2021-08-09 23:10, Will Deacon wrote: > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote: >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon wrote: >> > >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote: >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon wrote: >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote: >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon wrote: >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote: >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon wrote: >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote: >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote: >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote: >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag") >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went >> > > > > > > > > > > the memory type setting required for the non-coherent masters to use >> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached. >> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU. >> > > > > > > > > > > >> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC, >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC >> > > > > > > > > > > and makes GPU the user of this protection flag. >> > > > > > > > > > >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does >> > > > > > > > > > not apply anymore? >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > I was waiting on Will's reply [1]. 
If there are no changes needed, then >> > > > > > > > > I can repost the patch. >> > > > > > > > >> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That >> > > > > > > > can't be right. >> > > > > > > > >> > > > > > > >> > > > > > > Just curious, and maybe this is a dumb question, but what is your >> > > > > > > concern about mismatched aliases? I mean the cache hierarchy on the >> > > > > > > GPU device side (anything beyond the LLC) is pretty different and >> > > > > > > doesn't really care about the smmu pgtable attributes.. >> > > > > > >> > > > > > If the CPU accesses a shared buffer with different attributes to those which >> > > > > > the device is using then you fall into the "mismatched memory attributes" >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and >> > > > > > read it) and in some cases can apply to speculative accesses as well, but >> > > > > > the end result is typically loss of coherency. >> > > > > >> > > > > Ok, I might have a few other sections to read first to decipher the >> > > > > terminology.. >> > > > > >> > > > > But my understanding of LLC is that it looks just like system memory >> > > > > to the CPU and GPU (I think that would make it "the point of >> > > > > coherence" between the GPU and CPU?) If that is true, shouldn't it be >> > > > > invisible from the point of view of different CPU mapping options? >> > > > >> > > > You could certainly build a system where mismatched attributes don't cause >> > > > loss of coherence, but as it's not guaranteed by the architecture and the >> > > > changes proposed here affect APIs which are exposed across SoCs, then I >> > > > don't think it helps much. 
>> > > > >> > > >> > > Hmm, the description of the new mapping flag is that it applies only >> > > to transparent outer level cache: >> > > >> > > +/* >> > > + * Non-coherent masters can use this page protection flag to set cacheable >> > > + * memory attributes for only a transparent outer level of cache, also known as >> > > + * the last-level or system cache. >> > > + */ >> > > +#define IOMMU_LLC (1 << 6) >> > > >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something >> > > like that to make it more clear that it is not necessarily something >> > > that would work with a different outer level cache implementation? >> > >> > ... or we could just deal with the problem so that other people can reuse >> > the code. I haven't really understood the reluctance to solve this properly. >> > >> > Am I missing some reason this isn't solvable? >> >> Oh, was there another way to solve it (other than foregoing setting >> INC_OCACHE in the pgtables)? Maybe I misunderstood, is there a >> corresponding setting on the MMU pgtables side of things? > > Right -- we just need to program the CPU's MMU with the matching memory > attributes! It's a bit more fiddly if you're
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan wrote: > > On 2021-08-09 23:10, Will Deacon wrote: > > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote: > >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon wrote: > >> > > >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote: > >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon wrote: > >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote: > >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon wrote: > >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote: > >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon > >> > > > > > > wrote: > >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan > >> > > > > > > > wrote: > >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote: > >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash > >> > > > > > > > > > Ranjan wrote: > >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused > >> > > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag") > >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and > >> > > > > > > > > > > along with it went > >> > > > > > > > > > > the memory type setting required for the non-coherent > >> > > > > > > > > > > masters to use > >> > > > > > > > > > > system cache. Now that system cache support for GPU is > >> > > > > > > > > > > added, we will > >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to > >> > > > > > > > > > > be sys cached. > >> > > > > > > > > > > Without this, the system cache lines are not allocated > >> > > > > > > > > > > for GPU. > >> > > > > > > > > > > > >> > > > > > > > > > > So the patches in this series introduces a new prot > >> > > > > > > > > > > flag IOMMU_LLC, > >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to > >> > > > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC > >> > > > > > > > > > > and makes GPU the user of this protection flag. 
> >> > > > > > > > > > > >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh > >> > > > > > > > > > it, as it does > >> > > > > > > > > > not apply anymore? > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes > >> > > > > > > > > needed, then > >> > > > > > > > > I can repost the patch. > >> > > > > > > > > >> > > > > > > > I still think you need to handle the mismatched alias, no? > >> > > > > > > > You're adding > >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU > >> > > > > > > > side. That > >> > > > > > > > can't be right. > >> > > > > > > > > >> > > > > > > > >> > > > > > > Just curious, and maybe this is a dumb question, but what is > >> > > > > > > your > >> > > > > > > concern about mismatched aliases? I mean the cache hierarchy > >> > > > > > > on the > >> > > > > > > GPU device side (anything beyond the LLC) is pretty different > >> > > > > > > and > >> > > > > > > doesn't really care about the smmu pgtable attributes.. > >> > > > > > > >> > > > > > If the CPU accesses a shared buffer with different attributes to > >> > > > > > those which > >> > > > > > the device is using then you fall into the "mismatched memory > >> > > > > > attributes" > >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you > >> > > > > > should go and > >> > > > > > read it) and in some cases can apply to speculative accesses as > >> > > > > > well, but > >> > > > > > the end result is typically loss of coherency. > >> > > > > > >> > > > > Ok, I might have a few other sections to read first to decipher the > >> > > > > terminology.. > >> > > > > > >> > > > > But my understanding of LLC is that it looks just like system > >> > > > > memory > >> > > > > to the CPU and GPU (I think that would make it "the point of > >> > > > > coherence" between the GPU and CPU?) 
If that is true, shouldn't > >> > > > > it be > >> > > > > invisible from the point of view of different CPU mapping options? > >> > > > > >> > > > You could certainly build a system where mismatched attributes don't > >> > > > cause > >> > > > loss of coherence, but as it's not guaranteed by the architecture > >> > > > and the > >> > > > changes proposed here affect APIs which are exposed across SoCs, > >> > > > then I > >> > > > don't think it helps much. > >> > > > > >> > > > >> > > Hmm, the description of the new mapping flag is that it applies only > >> > > to transparent outer level cache: > >> > > > >> > > +/* > >> > > + * Non-coherent masters can use this page protection flag to set > >> > > cacheable > >> > > + * memory attributes for only a transparent outer level of cache, > >> > > also known as > >> > > + * the last-level or system cache. > >> > > + */ > >> > > +#define IOMMU_LLC (1 << 6) > >> > > > >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something > >> > > like that to make it more clear that it is not
[PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver
From: Tianyu Lan In an Isolation VM, all memory shared with the host needs to be marked visible to the host via a hvcall. vmbus_establish_gpadl() has already done this for the netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_pagebuffer() still needs to be handled. Use the DMA API to map/unmap this memory when sending/receiving packets, and the Hyper-V DMA ops callback will use swiotlb to allocate a bounce buffer and copy data from/to the bounce buffer. Signed-off-by: Tianyu Lan --- drivers/net/hyperv/hyperv_net.h | 6 ++ drivers/net/hyperv/netvsc.c | 144 +- drivers/net/hyperv/rndis_filter.c | 2 + include/linux/hyperv.h| 5 ++ 4 files changed, 154 insertions(+), 3 deletions(-) diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index bc48855dff10..862419912bfb 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -164,6 +164,7 @@ struct hv_netvsc_packet { u32 total_bytes; u32 send_buf_index; u32 total_data_buflen; + struct hv_dma_range *dma_range; }; #define NETVSC_HASH_KEYLEN 40 @@ -1074,6 +1075,7 @@ struct netvsc_device { /* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; + void *recv_original_buf; u32 recv_buf_size; /* allocated bytes */ u32 recv_buf_gpadl_handle; u32 recv_section_cnt; @@ -1082,6 +1084,8 @@ struct netvsc_device { /* Send buffer allocated by us */ void *send_buf; + void *send_original_buf; + u32 send_buf_size; u32 send_buf_gpadl_handle; u32 send_section_cnt; u32 send_section_size; @@ -1730,4 +1734,6 @@ struct rndis_message { #define RETRY_US_HI1 #define RETRY_MAX 2000/* >10 sec */ +void netvsc_dma_unmap(struct hv_device *hv_dev, + struct hv_netvsc_packet *packet); #endif /* _HYPERV_NET_H */ diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 7bd935412853..fc312e5db4d5 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head) int i; kfree(nvdev->extension); -
vfree(nvdev->recv_buf); - vfree(nvdev->send_buf); + + if (nvdev->recv_original_buf) { + vunmap(nvdev->recv_buf); + vfree(nvdev->recv_original_buf); + } else { + vfree(nvdev->recv_buf); + } + + if (nvdev->send_original_buf) { + vunmap(nvdev->send_buf); + vfree(nvdev->send_original_buf); + } else { + vfree(nvdev->send_buf); + } + kfree(nvdev->send_section_map); for (i = 0; i < VRSS_CHANNEL_MAX; i++) { @@ -330,6 +343,27 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx) return nvchan->mrc.slots ? 0 : -ENOMEM; } +static void *netvsc_remap_buf(void *buf, unsigned long size) +{ + unsigned long *pfns; + void *vaddr; + int i; + + pfns = kcalloc(size / HV_HYP_PAGE_SIZE, sizeof(unsigned long), + GFP_KERNEL); + if (!pfns) + return NULL; + + for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++) + pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE) + + (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT); + + vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO); + kfree(pfns); + + return vaddr; +} + static int netvsc_init_buf(struct hv_device *device, struct netvsc_device *net_device, const struct netvsc_device_info *device_info) @@ -340,6 +374,7 @@ static int netvsc_init_buf(struct hv_device *device, unsigned int buf_size; size_t map_words; int i, ret = 0; + void *vaddr; /* Get receive buffer area. 
*/ buf_size = device_info->recv_sections * device_info->recv_section_size; @@ -375,6 +410,15 @@ static int netvsc_init_buf(struct hv_device *device, goto cleanup; } + if (hv_isolation_type_snp()) { + vaddr = netvsc_remap_buf(net_device->recv_buf, buf_size); + if (!vaddr) + goto cleanup; + + net_device->recv_original_buf = net_device->recv_buf; + net_device->recv_buf = vaddr; + } + /* Notify the NetVsp of the gpadl handle */ init_packet = &net_device->channel_init_pkt; memset(init_packet, 0, sizeof(struct nvsp_message)); @@ -477,6 +521,15 @@ static int netvsc_init_buf(struct hv_device *device, goto cleanup; } + if (hv_isolation_type_snp()) { + vaddr = netvsc_remap_buf(net_device->send_buf, buf_size); + if (!vaddr) + goto cleanup; + + net_device->send_original_buf =
[PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
From: Tianyu Lan In an Isolation VM, all memory shared with the host needs to be marked visible to the host via a hvcall. vmbus_establish_gpadl() has already done this for the storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_mpb_desc() still needs to be handled. Use the DMA API to map/unmap this memory when sending/receiving packets, and the Hyper-V DMA ops callback will use swiotlb to allocate a bounce buffer and copy data from/to the bounce buffer. Signed-off-by: Tianyu Lan --- drivers/scsi/storvsc_drv.c | 68 +++--- 1 file changed, 63 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 328bb961c281..78320719bdd8 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -21,6 +21,8 @@ #include #include #include +#include +#include #include #include #include @@ -427,6 +429,8 @@ struct storvsc_cmd_request { u32 payload_sz; struct vstor_packet vstor_packet; + u32 hvpg_count; + struct hv_dma_range *dma_range; }; @@ -509,6 +513,14 @@ struct storvsc_scan_work { u8 tgt_id; }; +#define storvsc_dma_map(dev, page, offset, size, dir) \ + dma_map_page(dev, page, offset, size, dir) + +#define storvsc_dma_unmap(dev, dma_range, dir) \ + dma_unmap_page(dev, dma_range.dma, \ + dma_range.mapping_size, \ + dir ?
DMA_FROM_DEVICE : DMA_TO_DEVICE) + static void storvsc_device_scan(struct work_struct *work) { struct storvsc_scan_work *wrk; @@ -1260,6 +1272,7 @@ static void storvsc_on_channel_callback(void *context) struct hv_device *device; struct storvsc_device *stor_device; struct Scsi_Host *shost; + int i; if (channel->primary_channel != NULL) device = channel->primary_channel->device_obj; @@ -1314,6 +1327,15 @@ static void storvsc_on_channel_callback(void *context) request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd); } + if (request->dma_range) { + for (i = 0; i < request->hvpg_count; i++) + storvsc_dma_unmap(&device->device, + request->dma_range[i], + request->vstor_packet.vm_srb.data_in == READ_TYPE); + + kfree(request->dma_range); + } + storvsc_on_receive(stor_device, packet, request); continue; } @@ -1810,7 +1832,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) unsigned int hvpgoff, hvpfns_to_add; unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset); unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length); + dma_addr_t dma; u64 hvpfn; + u32 size; if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { @@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload->range.len = length; payload->range.offset = offset_in_hvpg; + cmd_request->dma_range = kcalloc(hvpg_count, +sizeof(*cmd_request->dma_range), +GFP_ATOMIC); + if (!cmd_request->dma_range) { + ret = -ENOMEM; + goto free_payload; + } for (i = 0; sgl != NULL; sgl = sg_next(sgl)) { /* @@ -1847,9 +1878,29 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) * last sgl should be reached at the same time that * the PFN array is filled.
*/ - while (hvpfns_to_add--) - payload->range.pfn_array[i++] = hvpfn++; + while (hvpfns_to_add--) { + size = min(HV_HYP_PAGE_SIZE - offset_in_hvpg, + (unsigned long)length); + dma = storvsc_dma_map(>device, pfn_to_page(hvpfn++), + offset_in_hvpg, size, + scmnd->sc_data_direction); + if (dma_mapping_error(>device, dma)) { + ret = -ENOMEM; + goto free_dma_range; + } + + if (offset_in_hvpg) { + payload->range.offset = dma & ~HV_HYP_PAGE_MASK; +
[PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan A Hyper-V Isolation VM requires bounce buffer support to copy data from/to encrypted memory, so enable swiotlb force mode to use the swiotlb bounce buffer for DMA transactions. In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed via an extra address space above shared_gpa_boundary (e.g. a 39-bit address line) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf. The accessed physical address is the original physical address + shared_gpa_boundary. The shared_gpa_boundary is called the virtual top of memory (vTOM) in the AMD SEV-SNP spec. Memory addresses below vTOM are automatically treated as private, while memory above vTOM is treated as shared. The swiotlb bounce buffer code calls dma_map_decrypted() to mark the bounce buffer visible to the host and map it in the extra address space. Populate the dma memory decrypted ops with the hv map/unmap functions. Hyper-V initializes the swiotlb bounce buffer, so the default swiotlb needs to be disabled. pci_swiotlb_detect_override() and pci_swiotlb_detect_4gb() enable the default one. To override the setting, hyperv_swiotlb_detect() needs to run before these detect functions, which depend on pci_xen_swiotlb_init(). Make pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect() to keep the order. The map function vmap_pfn() can't work at the early stage of hyperv_iommu_swiotlb_init(), so initialize the swiotlb bounce buffer in hyperv_iommu_swiotlb_later_init().
Signed-off-by: Tianyu Lan --- arch/x86/hyperv/ivm.c | 28 ++ arch/x86/include/asm/mshyperv.h | 2 + arch/x86/xen/pci-swiotlb-xen.c | 3 +- drivers/hv/vmbus_drv.c | 3 ++ drivers/iommu/hyperv-iommu.c| 65 + include/linux/hyperv.h | 1 + 6 files changed, 101 insertions(+), 1 deletion(-) diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c index c13ec5560d73..0f05e4d6fc62 100644 --- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -265,3 +265,31 @@ int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible) return __hv_set_mem_host_visibility((void *)addr, numpages, visibility); } + +/* + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM. + */ +void *hv_map_memory(void *addr, unsigned long size) +{ + unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE, + sizeof(unsigned long), GFP_KERNEL); + void *vaddr; + int i; + + if (!pfns) + return NULL; + + for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++) + pfns[i] = virt_to_hvpfn(addr + i * HV_HYP_PAGE_SIZE) + + (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT); + + vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO); + kfree(pfns); + + return vaddr; +} + +void hv_unmap_memory(void *addr) +{ + vunmap(addr); +} diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index a30c60f189a3..b247739f57ac 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -250,6 +250,8 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry); int hv_mark_gpa_visibility(u16 count, const u64 pfn[], enum hv_mem_host_visibility visibility); int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible); +void *hv_map_memory(void *addr, unsigned long size); +void hv_unmap_memory(void *addr); void hv_sint_wrmsrl_ghcb(u64 msr, u64 value); void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value); void hv_signal_eom_ghcb(void); diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c index 
54f9aa7e8457..43bd031aa332 100644 --- a/arch/x86/xen/pci-swiotlb-xen.c +++ b/arch/x86/xen/pci-swiotlb-xen.c @@ -4,6 +4,7 @@ #include #include +#include #include #include @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void) EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late); IOMMU_INIT_FINISH(pci_xen_swiotlb_detect, - NULL, + hyperv_swiotlb_detect, pci_xen_swiotlb_init, NULL); diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 57bbbaa4e8f7..f068e22a5636 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -23,6 +23,7 @@ #include #include +#include #include #include #include @@ -2081,6 +2082,7 @@ struct hv_device *vmbus_device_create(const guid_t *type, return child_device_obj; } +static u64 vmbus_dma_mask = DMA_BIT_MASK(64); /* * vmbus_device_register - Register the child device */ @@ -2121,6 +2123,7 @@ int vmbus_device_register(struct hv_device *child_device_obj) } hv_debug_add_dev_dir(child_device_obj); + child_device_obj->device.dma_mask = &vmbus_dma_mask; return 0; err_kset_unregister: diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index e285a220c913..01e874b3b43a 100644 ---
[PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
From: Tianyu Lan In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed via an extra address space above shared_gpa_boundary (e.g. a 39-bit address line) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf. The accessed physical address is the original physical address + shared_gpa_boundary. The shared_gpa_boundary is called the virtual top of memory (vTOM) in the AMD SEV-SNP spec. Memory addresses below vTOM are automatically treated as private, while memory above vTOM is treated as shared. Use dma_map_decrypted() in the swiotlb code, store the returned remap address and use it to copy data from/to the swiotlb bounce buffer. Signed-off-by: Tianyu Lan --- Change since v1: * Make swiotlb_init_io_tlb_mem() return an error code and return an error when dma_map_decrypted() fails. --- include/linux/swiotlb.h | 4 kernel/dma/swiotlb.c| 32 2 files changed, 28 insertions(+), 8 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index f507e3eacbea..584560ecaa8e 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force; * @end: The end address of the swiotlb memory pool. Used to do a quick * range check to see if the memory was in fact allocated by this * API. + * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb + * memory pool may be remapped in the memory encrypted case and store + * virtual address for bounce buffer operation. * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and * @end. For default swiotlb, this is command line adjustable via * setup_io_tlb_npages.
@@ -89,6 +92,7 @@ extern enum swiotlb_force swiotlb_force; struct io_tlb_mem { phys_addr_t start; phys_addr_t end; + void *vaddr; unsigned long nslabs; unsigned long used; unsigned int index; diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 1fa81c096c1d..29b6d888ef3b 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -176,7 +176,7 @@ void __init swiotlb_update_mem_attributes(void) memset(vaddr, 0, bytes); } -static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, +static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, unsigned long nslabs, bool late_alloc) { void *vaddr = phys_to_virt(start); @@ -194,14 +194,21 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->slots[i].alloc_size = 0; } - set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT); - memset(vaddr, 0, bytes); + mem->vaddr = dma_map_decrypted(vaddr, bytes); + if (!mem->vaddr) { + pr_err("Failed to decrypt memory.\n"); + return -ENOMEM; + } + + memset(mem->vaddr, 0, bytes); + return 0; } int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose) { struct io_tlb_mem *mem; size_t alloc_size; + int ret; if (swiotlb_force == SWIOTLB_NO_FORCE) return 0; @@ -216,7 +223,11 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose) panic("%s: Failed to allocate %zu bytes align=0x%lx\n", __func__, alloc_size, PAGE_SIZE); - swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false); + ret = swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false); + if (ret) { + memblock_free(__pa(mem), alloc_size); + return ret; + } io_tlb_default_mem = mem; if (verbose) @@ -304,6 +315,8 @@ int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs) { struct io_tlb_mem *mem; + int size = get_order(struct_size(mem, slots, nslabs)); + int ret; if (swiotlb_force == SWIOTLB_NO_FORCE) return 0; @@ -312,12 +325,15 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs) if 
(WARN_ON_ONCE(io_tlb_default_mem)) return -ENOMEM; - mem = (void *)__get_free_pages(GFP_KERNEL, - get_order(struct_size(mem, slots, nslabs))); + mem = (void *)__get_free_pages(GFP_KERNEL, size); if (!mem) return -ENOMEM; - swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true); + ret = swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true); + if (ret) { + free_pages((unsigned long)mem, size); + return ret; + } io_tlb_default_mem = mem; swiotlb_print_info(); @@ -360,7 +376,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size phys_addr_t orig_addr = mem->slots[index].orig_addr; size_t alloc_size =
[PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
From: Tianyu Lan In a Hyper-V Isolation VM with AMD SEV, the swiotlb bounce buffer needs to be mapped into the address space above vTOM, so introduce dma_map_decrypted()/dma_unmap_encrypted() to map/unmap bounce buffer memory. The platform can populate the map/unmap callbacks in the dma memory decrypted ops. --- include/linux/dma-map-ops.h | 9 + kernel/dma/mapping.c| 22 ++ 2 files changed, 31 insertions(+) diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 0d53a96a3d64..01d60a024e45 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -71,6 +71,11 @@ struct dma_map_ops { unsigned long (*get_merge_boundary)(struct device *dev); }; +struct dma_memory_decrypted_ops { + void *(*map)(void *addr, unsigned long size); + void (*unmap)(void *addr); +}; + #ifdef CONFIG_DMA_OPS #include @@ -374,6 +379,10 @@ static inline void debug_dma_dump_mappings(struct device *dev) } #endif /* CONFIG_DMA_API_DEBUG */ +void *dma_map_decrypted(void *addr, unsigned long size); +int dma_unmap_decrypted(void *addr, unsigned long size); + extern const struct dma_map_ops dma_dummy_ops; +extern struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops; #endif /* _LINUX_DMA_MAP_OPS_H */ diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 2b06a809d0b9..6fb150dc1750 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -13,11 +13,13 @@ #include #include #include +#include #include "debug.h" #include "direct.h" bool dma_default_coherent; +struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops; /* * Managed DMA API */ @@ -736,3 +738,23 @@ unsigned long dma_get_merge_boundary(struct device *dev) return ops->get_merge_boundary(dev); } EXPORT_SYMBOL_GPL(dma_get_merge_boundary); + +void *dma_map_decrypted(void *addr, unsigned long size) +{ + if (set_memory_decrypted((unsigned long)addr, +size / PAGE_SIZE)) + return NULL; + + if (dma_memory_generic_decrypted_ops.map) + return dma_memory_generic_decrypted_ops.map(addr, size); + else
+ return addr; +} + +int dma_unmap_encrypted(void *addr, unsigned long size) +{ + if (dma_memory_generic_decrypted_ops.unmap) + dma_memory_generic_decrypted_ops.unmap(addr); + + return set_memory_encrypted((unsigned long)addr, size / PAGE_SIZE); +} -- 2.25.1
[PATCH V3 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
From: Tianyu Lan The VMbus ring buffers are shared with the host and need to be accessed via the extra address space of an Isolation VM with SNP support. This patch maps the ring buffer address into the extra address space. The HV host visibility hvcall smears data in the ring buffer, so reset the ring buffer memory to zero after calling the visibility hvcall. Signed-off-by: Tianyu Lan --- drivers/hv/Kconfig| 1 + drivers/hv/channel.c | 10 + drivers/hv/hyperv_vmbus.h | 2 + drivers/hv/ring_buffer.c | 84 ++- 4 files changed, 79 insertions(+), 18 deletions(-) diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig index d1123ceb38f3..dd12af20e467 100644 --- a/drivers/hv/Kconfig +++ b/drivers/hv/Kconfig @@ -8,6 +8,7 @@ config HYPERV || (ARM64 && !CPU_BIG_ENDIAN)) select PARAVIRT select X86_HV_CALLBACK_VECTOR if X86 + select VMAP_PFN help Select this option to run Linux as a Hyper-V client operating system. diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c index 4c4717c26240..60ef881a700c 100644 --- a/drivers/hv/channel.c +++ b/drivers/hv/channel.c @@ -712,6 +712,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel, if (err) goto error_clean_ring; + err = hv_ringbuffer_post_init(&newchannel->outbound, + page, send_pages); + if (err) + goto error_free_gpadl; + + err = hv_ringbuffer_post_init(&newchannel->inbound, + &page[send_pages], recv_pages); + if (err) + goto error_free_gpadl; + /* Create and init the channel open message */ open_info = kzalloc(sizeof(*open_info) + sizeof(struct vmbus_channel_open_channel), diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h index 40bc0eff6665..15cd23a561f3 100644 --- a/drivers/hv/hyperv_vmbus.h +++ b/drivers/hv/hyperv_vmbus.h @@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu); /* Interface */ void hv_ringbuffer_pre_init(struct vmbus_channel *channel); +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info, + struct page *pages, u32 page_cnt); int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
struct page *pages, u32 pagecnt, u32 max_pkt_size); diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c index 2aee356840a2..d4f93fca1108 100644 --- a/drivers/hv/ring_buffer.c +++ b/drivers/hv/ring_buffer.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #include "hyperv_vmbus.h" @@ -179,43 +181,89 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel) mutex_init(>outbound.ring_buffer_mutex); } -/* Initialize the ring buffer. */ -int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info, - struct page *pages, u32 page_cnt, u32 max_pkt_size) +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info, + struct page *pages, u32 page_cnt) { + u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT; + unsigned long *pfns_wraparound; + void *vaddr; int i; - struct page **pages_wraparound; - BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE)); + if (!hv_isolation_type_snp()) + return 0; + + physic_addr += ms_hyperv.shared_gpa_boundary; /* * First page holds struct hv_ring_buffer, do wraparound mapping for * the rest. */ - pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *), + pfns_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(unsigned long), GFP_KERNEL); - if (!pages_wraparound) + if (!pfns_wraparound) return -ENOMEM; - pages_wraparound[0] = pages; + pfns_wraparound[0] = physic_addr >> PAGE_SHIFT; for (i = 0; i < 2 * (page_cnt - 1); i++) - pages_wraparound[i + 1] = [i % (page_cnt - 1) + 1]; - - ring_info->ring_buffer = (struct hv_ring_buffer *) - vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL); - - kfree(pages_wraparound); + pfns_wraparound[i + 1] = (physic_addr >> PAGE_SHIFT) + + i % (page_cnt - 1) + 1; - - if (!ring_info->ring_buffer) + vaddr = vmap_pfn(pfns_wraparound, page_cnt * 2 - 1, PAGE_KERNEL_IO); + kfree(pfns_wraparound); + if (!vaddr) return -ENOMEM; - ring_info->ring_buffer->read_index = - ring_info->ring_buffer->write_index = 0; + /* Clean memory after setting host visibility. 
*/ + memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE); + + ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr; + ring_info->ring_buffer->read_index = 0; + ring_info->ring_buffer->write_index = 0;
[PATCH V3 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message
From: Tianyu Lan The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared with host in Isolation VM and so it's necessary to use hvcall to set them visible to host. In Isolation VM with AMD SEV SNP, the access address should be in the extra space which is above shared gpa boundary. So remap these pages into the extra address(pa + shared_gpa_boundary). Introduce monitor_pages_va to store the remap address and unmap these va when disconnect vmbus. Signed-off-by: Tianyu Lan --- Change since v1: * Not remap monitor pages in the non-SNP isolation VM. --- drivers/hv/connection.c | 65 +++ drivers/hv/hyperv_vmbus.h | 1 + 2 files changed, 66 insertions(+) diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c index 6d315c1465e0..bf0ac3167bd2 100644 --- a/drivers/hv/connection.c +++ b/drivers/hv/connection.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include "hyperv_vmbus.h" @@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version) msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]); msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]); + + if (hv_isolation_type_snp()) { + msg->monitor_page1 += ms_hyperv.shared_gpa_boundary; + msg->monitor_page2 += ms_hyperv.shared_gpa_boundary; + } + msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU); /* @@ -148,6 +155,31 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version) return -ECONNREFUSED; } + if (hv_isolation_type_snp()) { + vmbus_connection.monitor_pages_va[0] + = vmbus_connection.monitor_pages[0]; + vmbus_connection.monitor_pages[0] + = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE, + MEMREMAP_WB); + if (!vmbus_connection.monitor_pages[0]) + return -ENOMEM; + + vmbus_connection.monitor_pages_va[1] + = vmbus_connection.monitor_pages[1]; + vmbus_connection.monitor_pages[1] + = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE, + MEMREMAP_WB); + if 
(!vmbus_connection.monitor_pages[1]) { + memunmap(vmbus_connection.monitor_pages[0]); + return -ENOMEM; + } + + memset(vmbus_connection.monitor_pages[0], 0x00, + HV_HYP_PAGE_SIZE); + memset(vmbus_connection.monitor_pages[1], 0x00, + HV_HYP_PAGE_SIZE); + } + return ret; } @@ -159,6 +191,7 @@ int vmbus_connect(void) struct vmbus_channel_msginfo *msginfo = NULL; int i, ret = 0; __u32 version; + u64 pfn[2]; /* Initialize the vmbus connection */ vmbus_connection.conn_state = CONNECTING; @@ -216,6 +249,16 @@ int vmbus_connect(void) goto cleanup; } + if (hv_is_isolation_supported()) { + pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]); + pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]); + if (hv_mark_gpa_visibility(2, pfn, + VMBUS_PAGE_VISIBLE_READ_WRITE)) { + ret = -EFAULT; + goto cleanup; + } + } + msginfo = kzalloc(sizeof(*msginfo) + sizeof(struct vmbus_channel_initiate_contact), GFP_KERNEL); @@ -284,6 +327,8 @@ int vmbus_connect(void) void vmbus_disconnect(void) { + u64 pfn[2]; + /* * First send the unload request to the host. */ @@ -303,6 +348,26 @@ void vmbus_disconnect(void) vmbus_connection.int_page = NULL; } + if (hv_is_isolation_supported()) { + if (vmbus_connection.monitor_pages_va[0]) { + memunmap(vmbus_connection.monitor_pages[0]); + vmbus_connection.monitor_pages[0] + = vmbus_connection.monitor_pages_va[0]; + vmbus_connection.monitor_pages_va[0] = NULL; + } + + if (vmbus_connection.monitor_pages_va[1]) { + memunmap(vmbus_connection.monitor_pages[1]); + vmbus_connection.monitor_pages[1] + = vmbus_connection.monitor_pages_va[1]; + vmbus_connection.monitor_pages_va[1] = NULL; + } + + pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]); + pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]); + hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE); +
[PATCH V3 06/13] HV: Add ghcb hvcall support for SNP VM
From: Tianyu Lan Hyper-V provides ghcb hvcall to handle VMBus HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE msg in SNP Isolation VM. Add such support. Signed-off-by: Tianyu Lan --- arch/x86/hyperv/ivm.c | 43 + arch/x86/include/asm/mshyperv.h | 1 + drivers/hv/connection.c | 6 - drivers/hv/hv.c | 8 +- include/asm-generic/mshyperv.h | 29 ++ 5 files changed, 85 insertions(+), 2 deletions(-) diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c index ec0e5c259740..c13ec5560d73 100644 --- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -15,6 +15,49 @@ #include #include +#define GHCB_USAGE_HYPERV_CALL 1 + +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size) +{ + union hv_ghcb *hv_ghcb; + void **ghcb_base; + unsigned long flags; + + if (!ms_hyperv.ghcb_base) + return -EFAULT; + + WARN_ON(in_nmi()); + + local_irq_save(flags); + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base); + hv_ghcb = (union hv_ghcb *)*ghcb_base; + if (!hv_ghcb) { + local_irq_restore(flags); + return -EFAULT; + } + + hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX; + hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL; + + hv_ghcb->hypercall.outputgpa = (u64)output; + hv_ghcb->hypercall.hypercallinput.asuint64 = 0; + hv_ghcb->hypercall.hypercallinput.callcode = control; + + if (input_size) + memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size); + + VMGEXIT(); + + hv_ghcb->ghcb.ghcb_usage = 0x; + memset(hv_ghcb->ghcb.save.valid_bitmap, 0, + sizeof(hv_ghcb->ghcb.save.valid_bitmap)); + + local_irq_restore(flags); + + return hv_ghcb->hypercall.hypercalloutput.callstatus; +} +EXPORT_SYMBOL_GPL(hv_ghcb_hypercall); + void hv_ghcb_msr_write(u64 msr, u64 value) { union hv_ghcb *hv_ghcb; diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index 730985676ea3..a30c60f189a3 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -255,6 +255,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value); void 
hv_signal_eom_ghcb(void); void hv_ghcb_msr_write(u64 msr, u64 value); void hv_ghcb_msr_read(u64 msr, u64 *value); +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size); #define hv_get_synint_state_ghcb(int_num, val) \ hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val) diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c index 5e479d54918c..6d315c1465e0 100644 --- a/drivers/hv/connection.c +++ b/drivers/hv/connection.c @@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel) ++channel->sig_events; - hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event); + if (hv_isolation_type_snp()) + hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, >sig_event, + NULL, sizeof(u64)); + else + hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event); } EXPORT_SYMBOL_GPL(vmbus_set_event); diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c index 59f7173c4d9f..e5c9fc467893 100644 --- a/drivers/hv/hv.c +++ b/drivers/hv/hv.c @@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id, aligned_msg->payload_size = payload_size; memcpy((void *)aligned_msg->payload, payload, payload_size); - status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL); + if (hv_isolation_type_snp()) + status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE, + (void *)aligned_msg, NULL, + sizeof(struct hv_input_post_message)); + else + status = hv_do_hypercall(HVCALL_POST_MESSAGE, + aligned_msg, NULL); /* Preemption must remain disabled until after the hypercall * so some other thread can't get scheduled onto this cpu and diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h index 90dac369a2dc..400181b855c1 100644 --- a/include/asm-generic/mshyperv.h +++ b/include/asm-generic/mshyperv.h @@ -31,6 +31,35 @@ union hv_ghcb { struct ghcb ghcb; + struct { + u64 hypercalldata[509]; + u64 outputgpa; + union { + union { + struct { + u32 callcode: 16; + u32 isfast : 1; + u32 reserved1 : 14; + u32 isnested: 1; + u32 countofelements : 
12; + u32 reserved2
[PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page
From: Tianyu Lan Hyper-V provides GHCB protocol to write Synthetic Interrupt Controller MSR registers in Isolation VM with AMD SEV SNP and these registers are emulated by hypervisor directly. Hyper-V requires to write SINTx MSR registers twice. First writes MSR via GHCB page to communicate with hypervisor and then writes wrmsr instruction to talk with paravisor which runs in VMPL0. Guest OS ID MSR also needs to be set via GHCB. Signed-off-by: Tianyu Lan --- Change since v1: * Introduce sev_es_ghcb_hv_call_simple() and share code between SEV and Hyper-V code. --- arch/x86/hyperv/hv_init.c | 33 ++--- arch/x86/hyperv/ivm.c | 110 + arch/x86/include/asm/mshyperv.h | 78 +++- arch/x86/include/asm/sev.h | 3 + arch/x86/kernel/cpu/mshyperv.c | 3 + arch/x86/kernel/sev-shared.c| 63 ++--- drivers/hv/hv.c | 121 ++-- include/asm-generic/mshyperv.h | 12 +++- 8 files changed, 329 insertions(+), 94 deletions(-) diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index b3683083208a..ab0b33f621e7 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -423,7 +423,7 @@ void __init hyperv_init(void) goto clean_guest_os_id; if (hv_isolation_type_snp()) { - ms_hyperv.ghcb_base = alloc_percpu(void *); + ms_hyperv.ghcb_base = alloc_percpu(union hv_ghcb __percpu *); if (!ms_hyperv.ghcb_base) goto clean_guest_os_id; @@ -432,6 +432,9 @@ void __init hyperv_init(void) ms_hyperv.ghcb_base = NULL; goto clean_guest_os_id; } + + /* Hyper-V requires to write guest os id via ghcb in SNP IVM. 
*/ + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id); } rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); @@ -523,6 +526,7 @@ void hyperv_cleanup(void) /* Reset our OS id */ wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0); + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0); /* * Reset hypercall page reference before reset the page, @@ -596,30 +600,3 @@ bool hv_is_hyperv_initialized(void) return hypercall_msr.enable; } EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized); - -enum hv_isolation_type hv_get_isolation_type(void) -{ - if (!(ms_hyperv.priv_high & HV_ISOLATION)) - return HV_ISOLATION_TYPE_NONE; - return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b); -} -EXPORT_SYMBOL_GPL(hv_get_isolation_type); - -bool hv_is_isolation_supported(void) -{ - if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) - return 0; - - if (!hypervisor_is_type(X86_HYPER_MS_HYPERV)) - return 0; - - return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE; -} - -DEFINE_STATIC_KEY_FALSE(isolation_type_snp); - -bool hv_isolation_type_snp(void) -{ - return static_branch_unlikely(_type_snp); -} -EXPORT_SYMBOL_GPL(hv_isolation_type_snp); diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c index 8c905ffdba7f..ec0e5c259740 100644 --- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -6,6 +6,8 @@ * Tianyu Lan */ +#include +#include #include #include #include @@ -13,6 +15,114 @@ #include #include +void hv_ghcb_msr_write(u64 msr, u64 value) +{ + union hv_ghcb *hv_ghcb; + void **ghcb_base; + unsigned long flags; + + if (!ms_hyperv.ghcb_base) + return; + + WARN_ON(in_nmi()); + + local_irq_save(flags); + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base); + hv_ghcb = (union hv_ghcb *)*ghcb_base; + if (!hv_ghcb) { + local_irq_restore(flags); + return; + } + + ghcb_set_rcx(_ghcb->ghcb, msr); + ghcb_set_rax(_ghcb->ghcb, lower_32_bits(value)); + ghcb_set_rdx(_ghcb->ghcb, value >> 32); + + if (sev_es_ghcb_hv_call_simple(_ghcb->ghcb, SVM_EXIT_MSR, 1, 0)) + pr_warn("Fail to write msr via ghcb 
%llx.\n", msr); + + local_irq_restore(flags); +} + +void hv_ghcb_msr_read(u64 msr, u64 *value) +{ + union hv_ghcb *hv_ghcb; + void **ghcb_base; + unsigned long flags; + + if (!ms_hyperv.ghcb_base) + return; + + WARN_ON(in_nmi()); + + local_irq_save(flags); + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base); + hv_ghcb = (union hv_ghcb *)*ghcb_base; + if (!hv_ghcb) { + local_irq_restore(flags); + return; + } + + ghcb_set_rcx(_ghcb->ghcb, msr); + if (sev_es_ghcb_hv_call_simple(_ghcb->ghcb, SVM_EXIT_MSR, 0, 0)) + pr_warn("Fail to read msr via ghcb %llx.\n", msr); + else + *value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax) + |
[PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM
From: Tianyu Lan Mark vmbus ring buffer visible with set_memory_decrypted() when establish gpadl handle. Signed-off-by: Tianyu Lan --- drivers/hv/channel.c | 44 -- include/linux/hyperv.h | 11 +++ 2 files changed, 53 insertions(+), 2 deletions(-) diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c index f3761c73b074..4c4717c26240 100644 --- a/drivers/hv/channel.c +++ b/drivers/hv/channel.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -465,7 +466,14 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel, struct list_head *curr; u32 next_gpadl_handle; unsigned long flags; - int ret = 0; + int ret = 0, index; + + index = atomic_inc_return(>gpadl_index) - 1; + + if (index > VMBUS_GPADL_RANGE_COUNT - 1) { + pr_err("Gpadl handle position(%d) has been occupied.\n", index); + return -ENOSPC; + } next_gpadl_handle = (atomic_inc_return(_connection.next_gpadl_handle) - 1); @@ -474,6 +482,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel, if (ret) return ret; + ret = set_memory_decrypted((unsigned long)kbuffer, + HVPFN_UP(size)); + if (ret) { + pr_warn("Failed to set host visibility.\n"); + return ret; + } + init_completion(>waitevent); msginfo->waiting_channel = channel; @@ -539,6 +554,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel, /* At this point, we received the gpadl created msg */ *gpadl_handle = gpadlmsg->gpadl; + channel->gpadl_array[index].size = size; + channel->gpadl_array[index].buffer = kbuffer; + channel->gpadl_array[index].gpadlhandle = *gpadl_handle; + cleanup: spin_lock_irqsave(_connection.channelmsg_lock, flags); list_del(>msglistentry); @@ -549,6 +568,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel, } kfree(msginfo); + + if (ret) { + set_memory_encrypted((unsigned long)kbuffer, +HVPFN_UP(size)); + atomic_dec(>gpadl_index); + } + return ret; } @@ -676,6 +702,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel, /* Establish the gpadl 
for the ring buffer */ newchannel->ringbuffer_gpadlhandle = 0; + atomic_set(>gpadl_index, 0); err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING, page_address(newchannel->ringbuffer_page), @@ -811,7 +838,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle) struct vmbus_channel_gpadl_teardown *msg; struct vmbus_channel_msginfo *info; unsigned long flags; - int ret; + int ret, i; info = kzalloc(sizeof(*info) + sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL); @@ -859,6 +886,19 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle) spin_unlock_irqrestore(_connection.channelmsg_lock, flags); kfree(info); + + /* Find gpadl buffer virtual address and size. */ + for (i = 0; i < VMBUS_GPADL_RANGE_COUNT; i++) + if (channel->gpadl_array[i].gpadlhandle == gpadl_handle) + break; + + if (set_memory_encrypted((unsigned long)channel->gpadl_array[i].buffer, + HVPFN_UP(channel->gpadl_array[i].size))) + pr_warn("Fail to set mem host visibility.\n"); + + channel->gpadl_array[i].gpadlhandle = 0; + atomic_dec(>gpadl_index); + return ret; } EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl); diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index ddc8713ce57b..90b542597143 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -803,6 +803,14 @@ struct vmbus_device { #define VMBUS_DEFAULT_MAX_PKT_SIZE 4096 +struct vmbus_gpadl { + u32 gpadlhandle; + u32 size; + void *buffer; +}; + +#define VMBUS_GPADL_RANGE_COUNT3 + struct vmbus_channel { struct list_head listentry; @@ -823,6 +831,9 @@ struct vmbus_channel { struct completion rescind_event; u32 ringbuffer_gpadlhandle; + /* GPADL_RING and Send/Receive GPADL_BUFFER. */ + struct vmbus_gpadl gpadl_array[VMBUS_GPADL_RANGE_COUNT]; + atomic_t gpadl_index; /* Allocated memory for ring buffer */ struct page *ringbuffer_page; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support
From: Tianyu Lan Add new hvcall guest address host visibility support to mark memory visible to host. Call it inside set_memory_decrypted /encrypted(). Add HYPERVISOR feature check in the hv_is_isolation_supported() to optimize in non-virtualization environment. Signed-off-by: Tianyu Lan --- Change since v2: * Rework __set_memory_enc_dec() and call Hyper-V and AMD function according to platform check. Change since v1: * Use new staic call x86_set_memory_enc to avoid add Hyper-V specific check in the set_memory code. --- arch/x86/hyperv/Makefile | 2 +- arch/x86/hyperv/hv_init.c | 6 ++ arch/x86/hyperv/ivm.c | 114 + arch/x86/include/asm/hyperv-tlfs.h | 20 + arch/x86/include/asm/mshyperv.h| 4 +- arch/x86/mm/pat/set_memory.c | 19 +++-- include/asm-generic/hyperv-tlfs.h | 1 + include/asm-generic/mshyperv.h | 1 + 8 files changed, 160 insertions(+), 7 deletions(-) create mode 100644 arch/x86/hyperv/ivm.c diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile index 48e2c51464e8..5d2de10809ae 100644 --- a/arch/x86/hyperv/Makefile +++ b/arch/x86/hyperv/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-y := hv_init.o mmu.o nested.o irqdomain.o +obj-y := hv_init.o mmu.o nested.o irqdomain.o ivm.o obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o ifdef CONFIG_X86_64 diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index 0bb4d9ca7a55..b3683083208a 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type); bool hv_is_isolation_supported(void) { + if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) + return 0; + + if (!hypervisor_is_type(X86_HYPER_MS_HYPERV)) + return 0; + return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE; } diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c new file mode 100644 index ..8c905ffdba7f --- /dev/null +++ b/arch/x86/hyperv/ivm.c @@ -0,0 +1,114 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Hyper-V Isolation VM interface with 
paravisor and hypervisor + * + * Author: + * Tianyu Lan + */ + +#include +#include +#include +#include +#include +#include + +/* + * hv_mark_gpa_visibility - Set pages visible to host via hvcall. + * + * In Isolation VM, all guest memory is encripted from host and guest + * needs to set memory visible to host via hvcall before sharing memory + * with host. + */ +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], + enum hv_mem_host_visibility visibility) +{ + struct hv_gpa_range_for_visibility **input_pcpu, *input; + u16 pages_processed; + u64 hv_status; + unsigned long flags; + + /* no-op if partition isolation is not enabled */ + if (!hv_is_isolation_supported()) + return 0; + + if (count > HV_MAX_MODIFY_GPA_REP_COUNT) { + pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count, + HV_MAX_MODIFY_GPA_REP_COUNT); + return -EINVAL; + } + + local_irq_save(flags); + input_pcpu = (struct hv_gpa_range_for_visibility **) + this_cpu_ptr(hyperv_pcpu_input_arg); + input = *input_pcpu; + if (unlikely(!input)) { + local_irq_restore(flags); + return -EINVAL; + } + + input->partition_id = HV_PARTITION_ID_SELF; + input->host_visibility = visibility; + input->reserved0 = 0; + input->reserved1 = 0; + memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn)); + hv_status = hv_do_rep_hypercall( + HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count, + 0, input, _processed); + local_irq_restore(flags); + + if (!(hv_status & HV_HYPERCALL_RESULT_MASK)) + return 0; + + return hv_status & HV_HYPERCALL_RESULT_MASK; +} +EXPORT_SYMBOL(hv_mark_gpa_visibility); + +static int __hv_set_mem_host_visibility(void *kbuffer, int pagecount, + enum hv_mem_host_visibility visibility) +{ + u64 *pfn_array; + int ret = 0; + int i, pfn; + + if (!hv_is_isolation_supported() || !ms_hyperv.ghcb_base) + return 0; + + pfn_array = kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL); + if (!pfn_array) + return -ENOMEM; + + for (i = 0, pfn = 0; i < pagecount; i++) { + pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * 
HV_HYP_PAGE_SIZE); + pfn++; + + if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) { + ret |= hv_mark_gpa_visibility(pfn, pfn_array, + visibility); + pfn = 0; + +
[PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the Isolation VM.
From: Tianyu Lan

Hyper-V exposes the shared memory boundary via the HYPERV_CPUID_ISOLATION_CONFIG cpuid leaf and stores it in the shared_gpa_boundary field of the ms_hyperv struct. This prepares for sharing memory with the host in SNP guests.

Signed-off-by: Tianyu Lan
---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 12 +++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 6b5835a087a3..2b7f396ef1a5 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
 	if (ms_hyperv.priv_high & HV_ISOLATION) {
 		ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
 		ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+		ms_hyperv.shared_gpa_boundary =
+			(u64)1 << ms_hyperv.shared_gpa_boundary_bits;

 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);

diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 4269f3174e58..aa26d24a5ca9 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -35,8 +35,18 @@ struct ms_hyperv_info {
 	u32 max_vp_index;
 	u32 max_lp_index;
 	u32 isolation_config_a;
-	u32 isolation_config_b;
+	union {
+		u32 isolation_config_b;
+		struct {
+			u32 cvm_type : 4;
+			u32 Reserved11 : 1;
+			u32 shared_gpa_boundary_active : 1;
+			u32 shared_gpa_boundary_bits : 6;
+			u32 Reserved12 : 20;
+		};
+	};
 	void __percpu **ghcb_base;
+	u64 shared_gpa_boundary;
 };

 extern struct ms_hyperv_info ms_hyperv;
--
2.25.1
[PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM
From: Tianyu Lan Hyper-V exposes GHCB page via SEV ES GHCB MSR for SNP guest to communicate with hypervisor. Map GHCB page for all cpus to read/write MSR register and submit hvcall request via GHCB. Signed-off-by: Tianyu Lan --- arch/x86/hyperv/hv_init.c | 66 +++-- arch/x86/include/asm/mshyperv.h | 2 + include/asm-generic/mshyperv.h | 2 + 3 files changed, 66 insertions(+), 4 deletions(-) diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index 708a2712a516..0bb4d9ca7a55 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -42,6 +43,31 @@ static void *hv_hypercall_pg_saved; struct hv_vp_assist_page **hv_vp_assist_page; EXPORT_SYMBOL_GPL(hv_vp_assist_page); +static int hyperv_init_ghcb(void) +{ + u64 ghcb_gpa; + void *ghcb_va; + void **ghcb_base; + + if (!ms_hyperv.ghcb_base) + return -EINVAL; + + /* +* GHCB page is allocated by paravisor. The address +* returned by MSR_AMD64_SEV_ES_GHCB is above shared +* ghcb boundary and map it here. 
+*/ + rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa); + ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB); + if (!ghcb_va) + return -ENOMEM; + + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base); + *ghcb_base = ghcb_va; + + return 0; +} + static int hv_cpu_init(unsigned int cpu) { union hv_vp_assist_msr_contents msr = { 0 }; @@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu) } } + hyperv_init_ghcb(); + return 0; } @@ -177,6 +205,14 @@ static int hv_cpu_die(unsigned int cpu) { struct hv_reenlightenment_control re_ctrl; unsigned int new_cpu; + void **ghcb_va = NULL; + + if (ms_hyperv.ghcb_base) { + ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base); + if (*ghcb_va) + memunmap(*ghcb_va); + *ghcb_va = NULL; + } hv_common_cpu_die(cpu); @@ -383,9 +419,19 @@ void __init hyperv_init(void) VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS, NUMA_NO_NODE, __builtin_return_address(0)); - if (hv_hypercall_pg == NULL) { - wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0); - goto remove_cpuhp_state; + if (hv_hypercall_pg == NULL) + goto clean_guest_os_id; + + if (hv_isolation_type_snp()) { + ms_hyperv.ghcb_base = alloc_percpu(void *); + if (!ms_hyperv.ghcb_base) + goto clean_guest_os_id; + + if (hyperv_init_ghcb()) { + free_percpu(ms_hyperv.ghcb_base); + ms_hyperv.ghcb_base = NULL; + goto clean_guest_os_id; + } } rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); @@ -456,7 +502,8 @@ void __init hyperv_init(void) hv_query_ext_cap(0); return; -remove_cpuhp_state: +clean_guest_os_id: + wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0); cpuhp_remove_state(cpuhp); free_vp_assist_page: kfree(hv_vp_assist_page); @@ -484,6 +531,9 @@ void hyperv_cleanup(void) */ hv_hypercall_pg = NULL; + if (ms_hyperv.ghcb_base) + free_percpu(ms_hyperv.ghcb_base); + /* Reset the hypercall page */ hypercall_msr.as_uint64 = 0; wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); @@ -559,3 +609,11 @@ bool hv_is_isolation_supported(void) { return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE; } + 
+DEFINE_STATIC_KEY_FALSE(isolation_type_snp); + +bool hv_isolation_type_snp(void) +{ + return static_branch_unlikely(_type_snp); +} +EXPORT_SYMBOL_GPL(hv_isolation_type_snp); diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index adccbc209169..6627cfd2bfba 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -11,6 +11,8 @@ #include #include +DECLARE_STATIC_KEY_FALSE(isolation_type_snp); + typedef int (*hyperv_fill_flush_list_func)( struct hv_guest_mapping_flush_list *flush, void *data); diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h index c1ab6a6e72b5..4269f3174e58 100644 --- a/include/asm-generic/mshyperv.h +++ b/include/asm-generic/mshyperv.h @@ -36,6 +36,7 @@ struct ms_hyperv_info { u32 max_lp_index; u32 isolation_config_a; u32 isolation_config_b; + void __percpu **ghcb_base; }; extern struct ms_hyperv_info ms_hyperv; @@ -237,6 +238,7 @@ bool hv_is_hyperv_initialized(void); bool hv_is_hibernation_supported(void); enum hv_isolation_type hv_get_isolation_type(void); bool
[PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
From: Tianyu Lan

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest memory directly. Hyper-V provides a new host-visibility hvcall, and the guest needs to call it to mark memory visible to the host before sharing memory with the host. For security, network/storage stack memory should not be shared with the host, so bounce buffers are required. The VMbus channel ring buffer already plays the bounce-buffer role, because all data from/to the host is copied between the ring buffer and IO stack memory; so mark the VMbus channel ring buffer visible.

There are two exceptions: packets sent by vmbus_sendpacket_pagebuffer() and vmbus_sendpacket_mpb_desc(). These packets contain IO stack memory addresses that the host will access, so add bounce-buffer allocation support in VMbus for these packets.

For an SNP Isolation VM, the guest needs to access shared memory via the extra address space specified by the HYPERV_CPUID_ISOLATION_CONFIG cpuid leaf. The access physical address of the shared memory should be the bounce-buffer memory GPA plus the shared_gpa_boundary reported by CPUID.

Change since V2:
	- Drop the x86_set_memory_enc static call and use a platform check in __set_memory_enc_dec() to run the platform callback for setting memory encrypted or decrypted.

Change since V1:
	- Introduce the x86_set_memory_enc static call so platforms can override __set_memory_enc_dec() with their implementation.
	- Introduce sev_es_ghcb_hv_call_simple() and share code between SEV and Hyper-V code.
	- Do not remap monitor pages in the non-SNP Isolation VM.
	- Make swiotlb_init_io_tlb_mem() return an error code and return an error when dma_map_decrypted() fails.
Change since RFC V4: - Introduce dma map decrypted function to remap bounce buffer and provide dma map decrypted ops for platform to hook callback. - Split swiotlb and dma map decrypted change into two patches - Replace vstart with vaddr in swiotlb changes. Change since RFC v3: - Add interface set_memory_decrypted_map() to decrypt memory and map bounce buffer in extra address space - Remove swiotlb remap function and store the remap address returned by set_memory_decrypted_map() in swiotlb mem data structure. - Introduce hv_set_mem_enc() to make code more readable in the __set_memory_enc_dec(). Change since RFC v2: - Remove not UIO driver in Isolation VM patch - Use vmap_pfn() to replace ioremap_page_range function in order to avoid exposing symbol ioremap_page_range() and ioremap_page_range() - Call hv set mem host visibility hvcall in set_memory_encrypted/decrypted() - Enable swiotlb force mode instead of adding Hyper-V dma map/unmap hook - Fix code style Tianyu Lan (13): x86/HV: Initialize GHCB page in Isolation VM x86/HV: Initialize shared memory boundary in the Isolation VM. 
x86/HV: Add new hvcall guest address host visibility support HV: Mark vmbus ring buffer visible to host in Isolation VM HV: Add Write/Read MSR registers via ghcb page HV: Add ghcb hvcall support for SNP VM HV/Vmbus: Add SNP support for VMbus channel initiate message HV/Vmbus: Initialize VMbus ring buffer for Isolation VM DMA: Add dma_map_decrypted/dma_unmap_encrypted() function x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM HV/Netvsc: Add Isolation VM support for netvsc driver HV/Storvsc: Add Isolation VM support for storvsc driver arch/x86/hyperv/Makefile | 2 +- arch/x86/hyperv/hv_init.c | 75 ++-- arch/x86/hyperv/ivm.c | 295 + arch/x86/include/asm/hyperv-tlfs.h | 20 ++ arch/x86/include/asm/mshyperv.h| 87 - arch/x86/include/asm/sev.h | 3 + arch/x86/kernel/cpu/mshyperv.c | 5 + arch/x86/kernel/sev-shared.c | 63 +++--- arch/x86/mm/pat/set_memory.c | 19 +- arch/x86/xen/pci-swiotlb-xen.c | 3 +- drivers/hv/Kconfig | 1 + drivers/hv/channel.c | 54 +- drivers/hv/connection.c| 71 ++- drivers/hv/hv.c| 129 + drivers/hv/hyperv_vmbus.h | 3 + drivers/hv/ring_buffer.c | 84 ++-- drivers/hv/vmbus_drv.c | 3 + drivers/iommu/hyperv-iommu.c | 65 +++ drivers/net/hyperv/hyperv_net.h| 6 + drivers/net/hyperv/netvsc.c| 144 +- drivers/net/hyperv/rndis_filter.c | 2 + drivers/scsi/storvsc_drv.c | 68 ++- include/asm-generic/hyperv-tlfs.h | 1 + include/asm-generic/mshyperv.h | 54 +-
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On 2021-08-09 23:10, Will Deacon wrote: On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote: On Mon, Aug 9, 2021 at 10:05 AM Will Deacon wrote: > > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote: > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon wrote: > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote: > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon wrote: > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote: > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon wrote: > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote: > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote: > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote: > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag") > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went > > > > > > > > > > the memory type setting required for the non-coherent masters to use > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached. > > > > > > > > > > Without this, the system cache lines are not allocated for GPU. > > > > > > > > > > > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC, > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC > > > > > > > > > > and makes GPU the user of this protection flag. > > > > > > > > > > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does > > > > > > > > > not apply anymore? > > > > > > > > > > > > > > > > > > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then > > > > > > > > I can repost the patch. > > > > > > > > > > > > > > I still think you need to handle the mismatched alias, no? 
You're adding > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That > > > > > > > can't be right. > > > > > > > > > > > > > > > > > > > Just curious, and maybe this is a dumb question, but what is your > > > > > > concern about mismatched aliases? I mean the cache hierarchy on the > > > > > > GPU device side (anything beyond the LLC) is pretty different and > > > > > > doesn't really care about the smmu pgtable attributes.. > > > > > > > > > > If the CPU accesses a shared buffer with different attributes to those which > > > > > the device is using then you fall into the "mismatched memory attributes" > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and > > > > > read it) and in some cases can apply to speculative accesses as well, but > > > > > the end result is typically loss of coherency. > > > > > > > > Ok, I might have a few other sections to read first to decipher the > > > > terminology.. > > > > > > > > But my understanding of LLC is that it looks just like system memory > > > > to the CPU and GPU (I think that would make it "the point of > > > > coherence" between the GPU and CPU?) If that is true, shouldn't it be > > > > invisible from the point of view of different CPU mapping options? > > > > > > You could certainly build a system where mismatched attributes don't cause > > > loss of coherence, but as it's not guaranteed by the architecture and the > > > changes proposed here affect APIs which are exposed across SoCs, then I > > > don't think it helps much. > > > > > > > Hmm, the description of the new mapping flag is that it applies only > > to transparent outer level cache: > > > > +/* > > + * Non-coherent masters can use this page protection flag to set cacheable > > + * memory attributes for only a transparent outer level of cache, also known as > > + * the last-level or system cache. 
> > + */
> > +#define IOMMU_LLC (1 << 6)
> >
> > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > like that to make it more clear that it is not necessarily something
> > that would work with a different outer level cache implementation?
>
> ... or we could just deal with the problem so that other people can reuse
> the code. I haven't really understood the reluctance to solve this properly.
>
> Am I missing some reason this isn't solvable?

Oh, was there another way to solve it (other than foregoing setting
INC_OCACHE in the pgtables)? Maybe I misunderstood, is there a
corresponding setting on the MMU pgtables side of things?

Right -- we just need to program the CPU's MMU with the matching memory
attributes! It's a bit more fiddly if you're just using ioremap_wc()
though, as it's usually the DMA API which handles the attributes under
the hood. Anyway, sorry, I should've said that explicitly earlier on.
We've done this sort of thing in the Android tree so I assumed Sai knew
what needed to be done and then I didn't think to explain to you :(

Right I was aware of that but even in
Re: [RFC PATCH 2/3] iommu/dma-iommu: Support iovad->granule > PAGE_SIZE
On 2021-08-07 12:47, Sven Peter via iommu wrote:
> On Fri, Aug 6, 2021, at 20:04, Robin Murphy wrote:
>> On 2021-08-06 16:55, Sven Peter via iommu wrote:
>>> @@ -1006,6 +1019,31 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>>   	if (dev_is_untrusted(dev))
>>>   		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
>>>
>>> +	/*
>>> +	 * If the IOMMU pagesize is larger than the CPU pagesize we will
>>> +	 * very likely run into sgs with a physical address that is not aligned
>>> +	 * to an IOMMU page boundary. Fall back to just mapping every entry
>>> +	 * independently with __iommu_dma_map then.
>>
>> Scatterlist segments often don't have nicely aligned ends, which is why
>> we already align things to IOVA granules in main loop here. I think in
>> principle we'd just need to move the non-IOVA-aligned part of the
>> address from sg->page to sg->offset in the temporary transformation for
>> the rest of the assumptions to hold. I don't blame you for being timid
>> about touching that, though - it took me 3 tries to get right when I
>> first wrote it...
>
> I've spent some time with that code now and I think we cannot use it
> but have to fall back to iommu_dma_map_sg_swiotlb (even though that
> swiotlb part is a lie then):
>
> When we have sg_phys(s) = 0x802e65000 with s->offset = 0 the paddr
> is aligned to PAGE_SIZE but has an offset of 0x1000 from something
> the IOMMU can map.
> Now this would result in s->offset = -0x1000 which is already weird
> enough.
> Offset is unsigned (and 32bit) so this will actually look like
> s->offset = 0xf000 then, which isn't much better. And then
> sg_phys(s) = 0x902e64000 (instead of 0x802e64000) and we'll map some
> random memory in iommu_map_sg_atomic and a little bit later everything
> explodes.
>
> Now I could probably adjust the phys addr backwards and make sure offset
> is always positive (and possibly larger than PAGE_SIZE) and later
> restore it in __finalise_sg then but I feel like that's pushing this
> a little bit too far.

Yes, that's what I meant.
At a quick guess, something like the completely untested diff below. It
really comes down to what we want to achieve here - if it's just to make
this thing work at all, then I'd favour bolting on the absolute minimum
changes, possibly even cheating by tainting the kernel and saying all
bets are off instead of trying to handle the more involved corners
really properly. However if you want to work towards this being a
properly-supported thing, then I think it's worth generalising the
existing assumptions of page alignment from the beginning.

Robin.

----->8-----
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 58cc7297aab5..73aeaa8bfc73 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -895,8 +895,8 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
 		unsigned int s_length = sg_dma_len(s);
 		unsigned int s_iova_len = s->length;

-		s->offset += s_iova_off;
-		s->length = s_length;
+		sg_set_page(s, phys_to_page(sg_phys(s)), s_length,
+			    s_iova_off & ~PAGE_MASK);
 		sg_dma_address(s) = DMA_MAPPING_ERROR;
 		sg_dma_len(s) = 0;

@@ -940,10 +940,9 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
 	int i;

 	for_each_sg(sg, s, nents, i) {
-		if (sg_dma_address(s) != DMA_MAPPING_ERROR)
-			s->offset += sg_dma_address(s);
 		if (sg_dma_len(s))
-			s->length = sg_dma_len(s);
+			sg_set_page(s, phys_to_page(sg_phys(s)), sg_dma_len(s),
+				    sg_dma_address(s) & ~PAGE_MASK);
 		sg_dma_address(s) = DMA_MAPPING_ERROR;
 		sg_dma_len(s) = 0;
 	}
@@ -1019,15 +1018,16 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	 * stashing the unaligned parts in the as-yet-unused DMA fields.
 	 */
 	for_each_sg(sg, s, nents, i) {
-		size_t s_iova_off = iova_offset(iovad, s->offset);
+		phys_addr_t s_phys = sg_phys(s);
+		size_t s_iova_off = iova_offset(iovad, s_phys);
 		size_t s_length = s->length;
 		size_t pad_len = (mask - iova_len + 1) & mask;

 		sg_dma_address(s) = s_iova_off;
 		sg_dma_len(s) = s_length;
-		s->offset -= s_iova_off;
 		s_length = iova_align(iovad, s_length + s_iova_off);
-		s->length = s_length;
+		sg_set_page(s, phys_to_page(s_phys - s_iova_off),
+			    s_length, s->offset ^ s_iova_off);

 		/*
 		 * Due to the alignment of our single IOVA allocation, we can

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote: > On Mon, Aug 9, 2021 at 10:05 AM Will Deacon wrote: > > > > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote: > > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon wrote: > > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote: > > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon wrote: > > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote: > > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon > > > > > > > wrote: > > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan > > > > > > > > wrote: > > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote: > > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash > > > > > > > > > > Ranjan wrote: > > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused > > > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag") > > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along > > > > > > > > > > > with it went > > > > > > > > > > > the memory type setting required for the non-coherent > > > > > > > > > > > masters to use > > > > > > > > > > > system cache. Now that system cache support for GPU is > > > > > > > > > > > added, we will > > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be > > > > > > > > > > > sys cached. > > > > > > > > > > > Without this, the system cache lines are not allocated > > > > > > > > > > > for GPU. > > > > > > > > > > > > > > > > > > > > > > So the patches in this series introduces a new prot flag > > > > > > > > > > > IOMMU_LLC, > > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to > > > > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC > > > > > > > > > > > and makes GPU the user of this protection flag. > > > > > > > > > > > > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, > > > > > > > > > > as it does > > > > > > > > > > not apply anymore? 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > I was waiting on Will's reply [1]. If there are no changes > > > > > > > > > needed, then > > > > > > > > > I can repost the patch. > > > > > > > > > > > > > > > > I still think you need to handle the mismatched alias, no? > > > > > > > > You're adding > > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU > > > > > > > > side. That > > > > > > > > can't be right. > > > > > > > > > > > > > > > > > > > > > > Just curious, and maybe this is a dumb question, but what is your > > > > > > > concern about mismatched aliases? I mean the cache hierarchy on > > > > > > > the > > > > > > > GPU device side (anything beyond the LLC) is pretty different and > > > > > > > doesn't really care about the smmu pgtable attributes.. > > > > > > > > > > > > If the CPU accesses a shared buffer with different attributes to > > > > > > those which > > > > > > the device is using then you fall into the "mismatched memory > > > > > > attributes" > > > > > > part of the Arm architecture. It's reasonably unforgiving (you > > > > > > should go and > > > > > > read it) and in some cases can apply to speculative accesses as > > > > > > well, but > > > > > > the end result is typically loss of coherency. > > > > > > > > > > Ok, I might have a few other sections to read first to decipher the > > > > > terminology.. > > > > > > > > > > But my understanding of LLC is that it looks just like system memory > > > > > to the CPU and GPU (I think that would make it "the point of > > > > > coherence" between the GPU and CPU?) If that is true, shouldn't it be > > > > > invisible from the point of view of different CPU mapping options? 
> > > > > > > > You could certainly build a system where mismatched attributes don't > > > > cause > > > > loss of coherence, but as it's not guaranteed by the architecture and > > > > the > > > > changes proposed here affect APIs which are exposed across SoCs, then I > > > > don't think it helps much. > > > > > > > > > > Hmm, the description of the new mapping flag is that it applies only > > > to transparent outer level cache: > > > > > > +/* > > > + * Non-coherent masters can use this page protection flag to set > > > cacheable > > > + * memory attributes for only a transparent outer level of cache, also > > > known as > > > + * the last-level or system cache. > > > + */ > > > +#define IOMMU_LLC (1 << 6) > > > > > > But I suppose we could call it instead IOMMU_QCOM_LLC or something > > > like that to make it more clear that it is not necessarily something > > > that would work with a different outer level cache implementation? > > > > ... or we could just deal with the problem so that other people can reuse > > the code. I haven't really understood the reluctance to solve this properly. > > > > Am I missing some reason this isn't solvable? > > Oh, was there another way to solve it (other than foregoing setting > INC_OCACHE in the pgtables)? Maybe I misunderstood, is there a > corresponding
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On Mon, Aug 9, 2021 at 10:05 AM Will Deacon wrote: > > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote: > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon wrote: > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote: > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon wrote: > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote: > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon wrote: > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan > > > > > > > wrote: > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote: > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan > > > > > > > > > wrote: > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused > > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag") > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along > > > > > > > > > > with it went > > > > > > > > > > the memory type setting required for the non-coherent > > > > > > > > > > masters to use > > > > > > > > > > system cache. Now that system cache support for GPU is > > > > > > > > > > added, we will > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be > > > > > > > > > > sys cached. > > > > > > > > > > Without this, the system cache lines are not allocated for > > > > > > > > > > GPU. > > > > > > > > > > > > > > > > > > > > So the patches in this series introduces a new prot flag > > > > > > > > > > IOMMU_LLC, > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to > > > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC > > > > > > > > > > and makes GPU the user of this protection flag. > > > > > > > > > > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, > > > > > > > > > as it does > > > > > > > > > not apply anymore? > > > > > > > > > > > > > > > > > > > > > > > > > I was waiting on Will's reply [1]. If there are no changes > > > > > > > > needed, then > > > > > > > > I can repost the patch. 
> > > > > > > > > > > > > > I still think you need to handle the mismatched alias, no? You're > > > > > > > adding > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU > > > > > > > side. That > > > > > > > can't be right. > > > > > > > > > > > > > > > > > > > Just curious, and maybe this is a dumb question, but what is your > > > > > > concern about mismatched aliases? I mean the cache hierarchy on the > > > > > > GPU device side (anything beyond the LLC) is pretty different and > > > > > > doesn't really care about the smmu pgtable attributes.. > > > > > > > > > > If the CPU accesses a shared buffer with different attributes to > > > > > those which > > > > > the device is using then you fall into the "mismatched memory > > > > > attributes" > > > > > part of the Arm architecture. It's reasonably unforgiving (you should > > > > > go and > > > > > read it) and in some cases can apply to speculative accesses as well, > > > > > but > > > > > the end result is typically loss of coherency. > > > > > > > > Ok, I might have a few other sections to read first to decipher the > > > > terminology.. > > > > > > > > But my understanding of LLC is that it looks just like system memory > > > > to the CPU and GPU (I think that would make it "the point of > > > > coherence" between the GPU and CPU?) If that is true, shouldn't it be > > > > invisible from the point of view of different CPU mapping options? > > > > > > You could certainly build a system where mismatched attributes don't cause > > > loss of coherence, but as it's not guaranteed by the architecture and the > > > changes proposed here affect APIs which are exposed across SoCs, then I > > > don't think it helps much. 
> > > > > > > Hmm, the description of the new mapping flag is that it applies only > > to transparent outer level cache: > > > > +/* > > + * Non-coherent masters can use this page protection flag to set cacheable > > + * memory attributes for only a transparent outer level of cache, also > > known as > > + * the last-level or system cache. > > + */ > > +#define IOMMU_LLC (1 << 6) > > > > But I suppose we could call it instead IOMMU_QCOM_LLC or something > > like that to make it more clear that it is not necessarily something > > that would work with a different outer level cache implementation? > > ... or we could just deal with the problem so that other people can reuse > the code. I haven't really understood the reluctance to solve this properly. > > Am I missing some reason this isn't solvable? > Oh, was there another way to solve it (other than foregoing setting INC_OCACHE in the pgtables)? Maybe I misunderstood, is there a corresponding setting on the MMU pgtables side of things? BR, -R ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote: > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon wrote: > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote: > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon wrote: > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote: > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon wrote: > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote: > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote: > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan > > > > > > > > wrote: > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag") > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with > > > > > > > > > it went > > > > > > > > > the memory type setting required for the non-coherent masters > > > > > > > > > to use > > > > > > > > > system cache. Now that system cache support for GPU is added, > > > > > > > > > we will > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys > > > > > > > > > cached. > > > > > > > > > Without this, the system cache lines are not allocated for > > > > > > > > > GPU. > > > > > > > > > > > > > > > > > > So the patches in this series introduces a new prot flag > > > > > > > > > IOMMU_LLC, > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to > > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC > > > > > > > > > and makes GPU the user of this protection flag. > > > > > > > > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as > > > > > > > > it does > > > > > > > > not apply anymore? > > > > > > > > > > > > > > > > > > > > > > I was waiting on Will's reply [1]. If there are no changes > > > > > > > needed, then > > > > > > > I can repost the patch. > > > > > > > > > > > > I still think you need to handle the mismatched alias, no? 
You're > > > > > > adding > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. > > > > > > That > > > > > > can't be right. > > > > > > > > > > > > > > > > Just curious, and maybe this is a dumb question, but what is your > > > > > concern about mismatched aliases? I mean the cache hierarchy on the > > > > > GPU device side (anything beyond the LLC) is pretty different and > > > > > doesn't really care about the smmu pgtable attributes.. > > > > > > > > If the CPU accesses a shared buffer with different attributes to those > > > > which > > > > the device is using then you fall into the "mismatched memory > > > > attributes" > > > > part of the Arm architecture. It's reasonably unforgiving (you should > > > > go and > > > > read it) and in some cases can apply to speculative accesses as well, > > > > but > > > > the end result is typically loss of coherency. > > > > > > Ok, I might have a few other sections to read first to decipher the > > > terminology.. > > > > > > But my understanding of LLC is that it looks just like system memory > > > to the CPU and GPU (I think that would make it "the point of > > > coherence" between the GPU and CPU?) If that is true, shouldn't it be > > > invisible from the point of view of different CPU mapping options? > > > > You could certainly build a system where mismatched attributes don't cause > > loss of coherence, but as it's not guaranteed by the architecture and the > > changes proposed here affect APIs which are exposed across SoCs, then I > > don't think it helps much. > > > > Hmm, the description of the new mapping flag is that it applies only > to transparent outer level cache: > > +/* > + * Non-coherent masters can use this page protection flag to set cacheable > + * memory attributes for only a transparent outer level of cache, also known > as > + * the last-level or system cache. 
> + */ > +#define IOMMU_LLC (1 << 6) > > But I suppose we could call it instead IOMMU_QCOM_LLC or something > like that to make it more clear that it is not necessarily something > that would work with a different outer level cache implementation? ... or we could just deal with the problem so that other people can reuse the code. I haven't really understood the reluctance to solve this properly. Am I missing some reason this isn't solvable? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On Mon, Aug 9, 2021 at 7:56 AM Will Deacon wrote: > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote: > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon wrote: > > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote: > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon wrote: > > > > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote: > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote: > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan > > > > > > > wrote: > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY > > > > > > > > flag") > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it > > > > > > > > went > > > > > > > > the memory type setting required for the non-coherent masters > > > > > > > > to use > > > > > > > > system cache. Now that system cache support for GPU is added, > > > > > > > > we will > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys > > > > > > > > cached. > > > > > > > > Without this, the system cache lines are not allocated for GPU. > > > > > > > > > > > > > > > > So the patches in this series introduces a new prot flag > > > > > > > > IOMMU_LLC, > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC > > > > > > > > and makes GPU the user of this protection flag. > > > > > > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it > > > > > > > does > > > > > > > not apply anymore? > > > > > > > > > > > > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, > > > > > > then > > > > > > I can repost the patch. > > > > > > > > > > I still think you need to handle the mismatched alias, no? You're > > > > > adding > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. > > > > > That > > > > > can't be right. 
> > > > > > > > > > > > > Just curious, and maybe this is a dumb question, but what is your > > > > concern about mismatched aliases? I mean the cache hierarchy on the > > > > GPU device side (anything beyond the LLC) is pretty different and > > > > doesn't really care about the smmu pgtable attributes.. > > > > > > If the CPU accesses a shared buffer with different attributes to those > > > which > > > the device is using then you fall into the "mismatched memory attributes" > > > part of the Arm architecture. It's reasonably unforgiving (you should go > > > and > > > read it) and in some cases can apply to speculative accesses as well, but > > > the end result is typically loss of coherency. > > > > Ok, I might have a few other sections to read first to decipher the > > terminology.. > > > > But my understanding of LLC is that it looks just like system memory > > to the CPU and GPU (I think that would make it "the point of > > coherence" between the GPU and CPU?) If that is true, shouldn't it be > > invisible from the point of view of different CPU mapping options? > > You could certainly build a system where mismatched attributes don't cause > loss of coherence, but as it's not guaranteed by the architecture and the > changes proposed here affect APIs which are exposed across SoCs, then I > don't think it helps much. > Hmm, the description of the new mapping flag is that it applies only to transparent outer level cache: +/* + * Non-coherent masters can use this page protection flag to set cacheable + * memory attributes for only a transparent outer level of cache, also known as + * the last-level or system cache. + */ +#define IOMMU_LLC (1 << 6) But I suppose we could call it instead IOMMU_QCOM_LLC or something like that to make it more clear that it is not necessarily something that would work with a different outer level cache implementation? BR, -R ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 06/14] x86: Secure Launch main header file
Introduce the main Secure Launch header file used in the early SL stub
and the early setup code.

Signed-off-by: Ross Philipson
---
 include/linux/slaunch.h | 532
 1 file changed, 532 insertions(+)
 create mode 100644 include/linux/slaunch.h

diff --git a/include/linux/slaunch.h b/include/linux/slaunch.h
new file mode 100644
index 000..c125b67
--- /dev/null
+++ b/include/linux/slaunch.h
@@ -0,0 +1,532 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Main Secure Launch header file.
+ *
+ * Copyright (c) 2021, Oracle and/or its affiliates.
+ */
+
+#ifndef _LINUX_SLAUNCH_H
+#define _LINUX_SLAUNCH_H
+
+/*
+ * Secure Launch Defined State Flags
+ */
+#define SL_FLAG_ACTIVE		0x0001
+#define SL_FLAG_ARCH_SKINIT	0x0002
+#define SL_FLAG_ARCH_TXT	0x0004
+
+/*
+ * Secure Launch CPU Type
+ */
+#define SL_CPU_AMD	1
+#define SL_CPU_INTEL	2
+
+#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
+
+#define __SL32_CS	0x0008
+#define __SL32_DS	0x0010
+
+/*
+ * Intel Safer Mode Extensions (SMX)
+ *
+ * Intel SMX provides a programming interface to establish a Measured Launched
+ * Environment (MLE). The measurement and protection mechanisms are supported
+ * by the capabilities of an Intel Trusted Execution Technology (TXT) platform.
+ * SMX is the processor's programming interface in an Intel TXT platform.
+ *
+ * See Intel SDM Volume 2 - 6.1 "Safer Mode Extensions Reference"
+ */
+
+/*
+ * SMX GETSEC Leaf Functions
+ */
+#define SMX_X86_GETSEC_SEXIT	5
+#define SMX_X86_GETSEC_SMCTRL	7
+#define SMX_X86_GETSEC_WAKEUP	8
+
+/*
+ * Intel Trusted Execution Technology MMIO Registers Banks
+ */
+#define TXT_PUB_CONFIG_REGS_BASE	0xfed3
+#define TXT_PRIV_CONFIG_REGS_BASE	0xfed2
+#define TXT_NR_CONFIG_PAGES		((TXT_PUB_CONFIG_REGS_BASE - \
+					  TXT_PRIV_CONFIG_REGS_BASE) >> PAGE_SHIFT)
+
+/*
+ * Intel Trusted Execution Technology (TXT) Registers
+ */
+#define TXT_CR_STS			0x
+#define TXT_CR_ESTS			0x0008
+#define TXT_CR_ERRORCODE		0x0030
+#define TXT_CR_CMD_RESET		0x0038
+#define TXT_CR_CMD_CLOSE_PRIVATE	0x0048
+#define TXT_CR_DIDVID			0x0110
+#define TXT_CR_VER_EMIF			0x0200
+#define TXT_CR_CMD_UNLOCK_MEM_CONFIG	0x0218
+#define TXT_CR_SINIT_BASE		0x0270
+#define TXT_CR_SINIT_SIZE		0x0278
+#define TXT_CR_MLE_JOIN			0x0290
+#define TXT_CR_HEAP_BASE		0x0300
+#define TXT_CR_HEAP_SIZE		0x0308
+#define TXT_CR_SCRATCHPAD		0x0378
+#define TXT_CR_CMD_OPEN_LOCALITY1	0x0380
+#define TXT_CR_CMD_CLOSE_LOCALITY1	0x0388
+#define TXT_CR_CMD_OPEN_LOCALITY2	0x0390
+#define TXT_CR_CMD_CLOSE_LOCALITY2	0x0398
+#define TXT_CR_CMD_SECRETS		0x08e0
+#define TXT_CR_CMD_NO_SECRETS		0x08e8
+#define TXT_CR_E2STS			0x08f0
+
+/* TXT default register value */
+#define TXT_REGVALUE_ONE		0x1ULL
+
+/* TXTCR_STS status bits */
+#define TXT_SENTER_DONE_STS		(1<<0)
+#define TXT_SEXIT_DONE_STS		(1<<1)
+
+/*
+ * SINIT/MLE Capabilities Field Bit Definitions
+ */
+#define TXT_SINIT_MLE_CAP_WAKE_GETSEC	0
+#define TXT_SINIT_MLE_CAP_WAKE_MONITOR	1
+
+/*
+ * OS/MLE Secure Launch Specific Definitions
+ */
+#define TXT_OS_MLE_STRUCT_VERSION	1
+#define TXT_OS_MLE_MAX_VARIABLE_MTRRS	32
+
+/*
+ * TXT Heap Table Enumeration
+ */
+#define TXT_BIOS_DATA_TABLE		1
+#define TXT_OS_MLE_DATA_TABLE		2
+#define TXT_OS_SINIT_DATA_TABLE		3
+#define TXT_SINIT_MLE_DATA_TABLE	4
+#define TXT_SINIT_TABLE_MAX		TXT_SINIT_MLE_DATA_TABLE
+
+/*
+ * Secure Launch Defined Error Codes used in MLE-initiated TXT resets.
+ *
+ * TXT Specification
+ * Appendix I ACM Error Codes
+ */
+#define SL_ERROR_GENERIC		0xc0008001
+#define SL_ERROR_TPM_INIT		0xc0008002
+#define SL_ERROR_TPM_INVALID_LOG20	0xc0008003
+#define SL_ERROR_TPM_LOGGING_FAILED	0xc0008004
+#define SL_ERROR_REGION_STRADDLE_4GB	0xc0008005
+#define SL_ERROR_TPM_EXTEND		0xc0008006
+#define SL_ERROR_MTRR_INV_VCNT		0xc0008007
+#define SL_ERROR_MTRR_INV_DEF_TYPE	0xc0008008
+#define SL_ERROR_MTRR_INV_BASE		0xc0008009
+#define SL_ERROR_MTRR_INV_MASK		0xc000800a
+#define SL_ERROR_MSR_INV_MISC_EN	0xc000800b
+#define SL_ERROR_INV_AP_INTERRUPT	0xc000800c
+#define SL_ERROR_INTEGER_OVERFLOW	0xc000800d
+#define SL_ERROR_HEAP_WALK		0xc000800e
+#define SL_ERROR_HEAP_MAP		0xc000800f
+#define SL_ERROR_REGION_ABOVE_4GB	0xc0008010
+#define SL_ERROR_HEAP_INVALID_DMAR	0xc0008011
+#define SL_ERROR_HEAP_DMAR_SIZE		0xc0008012
+#define SL_ERROR_HEAP_DMAR_MAP		0xc0008013
+#define SL_ERROR_HI_PMR_BASE		0xc0008014
+#define SL_ERROR_HI_PMR_SIZE		0xc0008015
+#define
[PATCH v3 12/14] reboot: Secure Launch SEXIT support on reboot paths
If the MLE kernel is being powered off, rebooted or halted, then SEXIT must be called. Note that the SEXIT GETSEC leaf can only be called after a machine_shutdown() has been done on these paths. The machine_shutdown() is not called on a few paths like when poweroff action does not have a poweroff callback (into ACPI code) or when an emergency reset is done. In these cases, just the TXT registers are finalized but SEXIT is skipped. Signed-off-by: Ross Philipson --- arch/x86/kernel/reboot.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index ebfb911..fe9d8cc 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -731,6 +732,7 @@ static void native_machine_restart(char *__unused) if (!reboot_force) machine_shutdown(); + slaunch_finalize(!reboot_force); __machine_emergency_restart(0); } @@ -741,6 +743,9 @@ static void native_machine_halt(void) tboot_shutdown(TB_SHUTDOWN_HALT); + /* SEXIT done after machine_shutdown() to meet TXT requirements */ + slaunch_finalize(1); + stop_this_cpu(NULL); } @@ -749,8 +754,12 @@ static void native_machine_power_off(void) if (pm_power_off) { if (!reboot_force) machine_shutdown(); + slaunch_finalize(!reboot_force); pm_power_off(); + } else { + slaunch_finalize(0); } + /* A fallback in case there is no PM info available */ tboot_shutdown(TB_SHUTDOWN_HALT); } @@ -778,6 +787,7 @@ void machine_shutdown(void) void machine_emergency_restart(void) { + slaunch_finalize(0); __machine_emergency_restart(1); } -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 08/14] x86: Secure Launch kernel early boot stub
The Secure Launch (SL) stub provides the entry point for Intel TXT (and later AMD SKINIT) to vector to during the late launch. The symbol sl_stub_entry is that entry point and its offset into the kernel is conveyed to the launching code using the MLE (Measured Launch Environment) header in the structure named mle_header. The offset of the MLE header is set in the kernel_info. The routine sl_stub contains the very early late launch setup code responsible for setting up the basic environment to allow the normal kernel startup_32 code to proceed. It is also responsible for properly waking and handling the APs on Intel platforms. The routine sl_main which runs after entering 64b mode is responsible for measuring configuration and module information before it is used like the boot params, the kernel command line, the TXT heap, an external initramfs, etc. Signed-off-by: Ross Philipson --- Documentation/x86/boot.rst | 13 + arch/x86/boot/compressed/Makefile | 3 +- arch/x86/boot/compressed/head_64.S | 37 ++ arch/x86/boot/compressed/kernel_info.S | 34 ++ arch/x86/boot/compressed/sl_main.c | 549 ++ arch/x86/boot/compressed/sl_stub.S | 685 + arch/x86/kernel/asm-offsets.c | 19 + 7 files changed, 1339 insertions(+), 1 deletion(-) create mode 100644 arch/x86/boot/compressed/sl_main.c create mode 100644 arch/x86/boot/compressed/sl_stub.S diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst index 894a198..2fbcb77 100644 --- a/Documentation/x86/boot.rst +++ b/Documentation/x86/boot.rst @@ -1026,6 +1026,19 @@ Offset/size: 0x000c/4 This field contains maximal allowed type for setup_data and setup_indirect structs. + = +Field name:mle_header_offset +Offset/size: 0x0010/4 + = + + This field contains the offset to the Secure Launch Measured Launch Environment + (MLE) header. This offset is used to locate information needed during a secure + late launch using Intel TXT. If the offset is zero, the kernel does not have + Secure Launch capabilities. 
The MLE entry point is called from TXT on the BSP + following a successful measured launch. The specific state of the processors is + outlined in the TXT Software Development Guide; the latest can be found here: + https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf + The Image Checksum == diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 059d49a..1fe55a5 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -102,7 +102,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o \ + $(obj)/sl_main.o $(obj)/sl_stub.o $(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE $(call if_changed,ld) diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index a2347de..b35e072 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -498,6 +498,17 @@ trampoline_return: pushq $0 popfq +#ifdef CONFIG_SECURE_LAUNCH + pushq %rsi + + /* Ensure the relocation region is covered by a PMR */ + movq %rbx, %rdi + movl $(_bss - startup_32), %esi + callq sl_check_region + + popq %rsi +#endif + /* * Copy the compressed kernel to the end of our buffer * where decompression in place becomes safe. @@ -556,6 +567,32 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated) shrq $3, %rcx rep stosq +#ifdef CONFIG_SECURE_LAUNCH + /* +* Have to do the final early sl stub work in the 64b area. +* +* *** NOTE *** +* +* Several boot params get used before we get a chance to measure +* them in this call. This is a known issue and we currently don't +* have a solution. The scratch field doesn't matter.
There is no +* obvious way to do anything about the use of kernel_alignment or +* init_size though these seem low risk with all the PMR and overlap +* checks in place. +*/ + pushq %rsi + + movq %rsi, %rdi + callq sl_main + + /* Ensure the decompression location is covered by a PMR */ + movq %rbp, %rdi + movq output_len(%rip), %rsi + callq sl_check_region + + popq %rsi +#endif + /* * If running as an SEV guest, the encryption mask is required in the * page-table setup code below. When the guest also has SEV-ES enabled diff --git
[PATCH v3 07/14] x86: Add early SHA support for Secure Launch early measurements
From: "Daniel P. Smith" The SHA algorithms are necessary to measure configuration information into the TPM as early as possible before using the values. This implementation uses the established approach of #including the SHA libraries directly in the code since the compressed kernel is not uncompressed at this point. The SHA code here has its origins in the code from the main kernel, commit c4d5b9ffa31f (crypto: sha1 - implement base layer for SHA-1). That code could not be pulled directly into the setup portion of the compressed kernel because of other dependencies it pulls in. The result is this is a modified copy of that code that still leverages the core SHA algorithms. Signed-off-by: Daniel P. Smith Signed-off-by: Ross Philipson --- arch/x86/boot/compressed/Makefile | 2 + arch/x86/boot/compressed/early_sha1.c | 103 arch/x86/boot/compressed/early_sha1.h | 17 ++ arch/x86/boot/compressed/early_sha256.c | 7 +++ lib/crypto/sha256.c | 8 +++ lib/sha1.c | 4 ++ 6 files changed, 141 insertions(+) create mode 100644 arch/x86/boot/compressed/early_sha1.c create mode 100644 arch/x86/boot/compressed/early_sha1.h create mode 100644 arch/x86/boot/compressed/early_sha256.c diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 431bf7f..059d49a 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -102,6 +102,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o + $(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE $(call if_changed,ld) diff --git a/arch/x86/boot/compressed/early_sha1.c b/arch/x86/boot/compressed/early_sha1.c new file mode 100644 index 000..74f4654 --- /dev/null +++ b/arch/x86/boot/compressed/early_sha1.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 
(c) 2021 Apertus Solutions, LLC. + */ + +#include +#include +#include +#include +#include + +#include "early_sha1.h" + +#define SHA1_DISABLE_EXPORT +#include "../../../../lib/sha1.c" + +/* The SHA1 implementation in lib/sha1.c was written to get the workspace + * buffer as a parameter. This wrapper function provides a container + * around a temporary workspace that is cleared after the transform completes. + */ +static void __sha_transform(u32 *digest, const char *data) +{ + u32 ws[SHA1_WORKSPACE_WORDS]; + + sha1_transform(digest, data, ws); + + memset(ws, 0, sizeof(ws)); + /* +* As this is cryptographic code, prevent the memset 0 from being +* optimized out potentially leaving secrets in memory. +*/ + wmb(); + +} + +void early_sha1_init(struct sha1_state *sctx) +{ + sha1_init(sctx->state); + sctx->count = 0; +} + +void early_sha1_update(struct sha1_state *sctx, + const u8 *data, + unsigned int len) +{ + unsigned int partial = sctx->count % SHA1_BLOCK_SIZE; + + sctx->count += len; + + if (likely((partial + len) >= SHA1_BLOCK_SIZE)) { + int blocks; + + if (partial) { + int p = SHA1_BLOCK_SIZE - partial; + + memcpy(sctx->buffer + partial, data, p); + data += p; + len -= p; + + __sha_transform(sctx->state, sctx->buffer); + } + + blocks = len / SHA1_BLOCK_SIZE; + len %= SHA1_BLOCK_SIZE; + + if (blocks) { + while (blocks--) { + __sha_transform(sctx->state, data); + data += SHA1_BLOCK_SIZE; + } + } + partial = 0; + } + + if (len) + memcpy(sctx->buffer + partial, data, len); +} + +void early_sha1_final(struct sha1_state *sctx, u8 *out) +{ + const int bit_offset = SHA1_BLOCK_SIZE - sizeof(__be64); + __be64 *bits = (__be64 *)(sctx->buffer + bit_offset); + __be32 *digest = (__be32 *)out; + unsigned int partial = sctx->count % SHA1_BLOCK_SIZE; + int i; + + sctx->buffer[partial++] = 0x80; + if (partial > bit_offset) { + memset(sctx->buffer + partial, 0x0, SHA1_BLOCK_SIZE - partial); + partial = 0; + + __sha_transform(sctx->state, sctx->buffer); + } + + memset(sctx->buffer + 
partial, 0x0, bit_offset - partial); + *bits = cpu_to_be64(sctx->count << 3); + __sha_transform(sctx->state, sctx->buffer); + + for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++) + put_unaligned_be32(sctx->state[i], digest++); + + *sctx
[PATCH v3 09/14] x86: Secure Launch kernel late boot stub
The routine slaunch_setup is called out of the x86 specific setup_arch routine during early kernel boot. After determining what platform is present, various operations specific to that platform occur. This includes finalizing settings for the platform late launch and verifying that memory protections are in place. For TXT, this code also reserves the original compressed kernel setup area where the APs were left looping so that this memory cannot be used. Signed-off-by: Ross Philipson --- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/setup.c | 3 + arch/x86/kernel/slaunch.c | 460 + drivers/iommu/intel/dmar.c | 4 + 4 files changed, 468 insertions(+) create mode 100644 arch/x86/kernel/slaunch.c diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 3e625c6..d6ee904 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -80,6 +80,7 @@ obj-$(CONFIG_X86_32) += tls.o obj-$(CONFIG_IA32_EMULATION) += tls.o obj-y += step.o obj-$(CONFIG_INTEL_TXT) += tboot.o +obj-$(CONFIG_SECURE_LAUNCH) += slaunch.o obj-$(CONFIG_ISA_DMA_API) += i8237.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-y += cpu/ diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 055a834..482bd76 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -976,6 +977,8 @@ void __init setup_arch(char **cmdline_p) early_gart_iommu_check(); #endif + slaunch_setup_txt(); + /* * partially used pages are not usable - thus * we are rounding upwards: diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c new file mode 100644 index 000..f91f0b5 --- /dev/null +++ b/arch/x86/kernel/slaunch.c @@ -0,0 +1,460 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Secure Launch late validation/setup and finalization support. + * + * Copyright (c) 2021, Oracle and/or its affiliates.
+ */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static u32 sl_flags; +static struct sl_ap_wake_info ap_wake_info; +static u64 evtlog_addr; +static u32 evtlog_size; +static u64 vtd_pmr_lo_size; + +/* This should be plenty of room */ +static u8 txt_dmar[PAGE_SIZE] __aligned(16); + +u32 slaunch_get_flags(void) +{ + return sl_flags; +} +EXPORT_SYMBOL(slaunch_get_flags); + +struct sl_ap_wake_info *slaunch_get_ap_wake_info(void) +{ + return &ap_wake_info; +} + +struct acpi_table_header *slaunch_get_dmar_table(struct acpi_table_header *dmar) +{ + /* The DMAR is only stashed and provided via TXT on Intel systems */ + if (memcmp(txt_dmar, "DMAR", 4)) + return dmar; + + return (struct acpi_table_header *)(&txt_dmar[0]); +} + +void __noreturn slaunch_txt_reset(void __iomem *txt, + const char *msg, u64 error) +{ + u64 one = 1, val; + + pr_err("%s", msg); + + /* +* This performs a TXT reset with a sticky error code. The reads of +* TXT_CR_E2STS act as barriers. +*/ + memcpy_toio(txt + TXT_CR_ERRORCODE, &error, sizeof(error)); + memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val)); + memcpy_toio(txt + TXT_CR_CMD_NO_SECRETS, &one, sizeof(one)); + memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val)); + memcpy_toio(txt + TXT_CR_CMD_UNLOCK_MEM_CONFIG, &one, sizeof(one)); + memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val)); + memcpy_toio(txt + TXT_CR_CMD_RESET, &one, sizeof(one)); + + for ( ; ; ) + asm volatile ("hlt"); + + unreachable(); +} + +/* + * The TXT heap is too big to map all at once with early_ioremap + * so it is done a table at a time.
+ */ +static void __init *txt_early_get_heap_table(void __iomem *txt, u32 type, + u32 bytes) +{ + void *heap; + u64 base, size, offset = 0; + int i; + + if (type > TXT_SINIT_TABLE_MAX) + slaunch_txt_reset(txt, + "Error invalid table type for early heap walk\n", + SL_ERROR_HEAP_WALK); + + memcpy_fromio(&base, txt + TXT_CR_HEAP_BASE, sizeof(base)); + memcpy_fromio(&size, txt + TXT_CR_HEAP_SIZE, sizeof(size)); + + /* Iterate over heap tables looking for table of "type" */ + for (i = 0; i < type; i++) { + base += offset; + heap = early_memremap(base, sizeof(u64)); + if (!heap) + slaunch_txt_reset(txt, + "Error early_memremap of heap for heap walk\n", + SL_ERROR_HEAP_MAP); + + offset = *((u64 *)heap); + + /* +* After the
[PATCH v3 04/14] Documentation/x86: Secure Launch kernel documentation
Introduce background, overview and configuration/ABI information for the Secure Launch kernel feature. Signed-off-by: Daniel P. Smith Signed-off-by: Ross Philipson --- Documentation/x86/index.rst | 1 + Documentation/x86/secure-launch.rst | 714 2 files changed, 715 insertions(+) create mode 100644 Documentation/x86/secure-launch.rst diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index 3830483..e5a058f 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -31,6 +31,7 @@ x86-specific Documentation tsx_async_abort buslock usb-legacy-support + secure-launch i386/index x86_64/index sva diff --git a/Documentation/x86/secure-launch.rst b/Documentation/x86/secure-launch.rst new file mode 100644 index 000..cc5995c6 --- /dev/null +++ b/Documentation/x86/secure-launch.rst @@ -0,0 +1,714 @@ +.. SPDX-License-Identifier: GPL-2.0 + += +Secure Launch += + +Background +== + +The Trusted Computing Group (TCG) architecture defines two methods by +which the target operating system is started, aka launched, on a system, +for which Intel and AMD provide implementations. These two launch types +are static launch and dynamic launch. Static launch is referred to as +such because it happens at one fixed point, at system startup, during +the defined life-cycle of a system. Dynamic launch is referred to as +such because it is not limited to being done once nor bound to system +startup. It can in fact happen at any time without incurring/requiring an +associated power event for the system. Since dynamic launch can happen +at any time, this results in dynamic launch being split into two types of +its own. The first is referred to as an early launch, where the dynamic +launch is done in conjunction with the static launch of the system. The +second is referred to as a late launch, where a dynamic launch is +initiated after the static launch was fully completed and the system was +under the control of some target operating system or run-time kernel.
+These two top-level launch methods, static launch and dynamic launch, +provide different models for establishing the launch integrity, i.e. the +load-time integrity, of the target operating system. When cryptographic +hashing is used to create an integrity assessment for the launch +integrity, then for a static launch this is referred to as the Static +Root of Trust for Measurement (SRTM) and for dynamic launch it is +referred to as the Dynamic Root of Trust for Measurement (DRTM). + +The reasoning for needing the two integrity models is driven by the fact +that these models leverage what is referred to as a "transitive trust". +A transitive trust is commonly referred to as a "trust chain", which is +created through the process of an entity making an integrity assessment +of another entity and upon success transferring control to the new entity. +This process is then repeated by each entity until the Trusted Computing +Base (TCB) of the system has been established. A challenge for transitive +trust is that the process is susceptible to cumulative error, +which in this case takes the form of inaccurate or improper integrity +assessments. The way to address cumulative error is to reduce the +number of steps in the process that can introduce error. In +this case that means reducing the number of entities involved in the +transitive trust. It is not possible to reduce the number of firmware +components or the boot loader(s) involved during static launch. This is +where dynamic launch comes in, as it introduces the concept of a CPU +providing an instruction that results in a transitive trust starting with +the CPU doing an integrity assessment of a special loader that can then +start a target operating system. This reduces the trust chain down to +the CPU, a special loader, and the target operating system. It is also +why it is said that the DRTM is rooted in hardware, since the CPU is what +does the first integrity assessment, i.e.
the first measurement, in the + trust chain. + + Overview + + +Prior to the start of the TrenchBoot project, the only active Open +Source project supporting dynamic launch was Intel's Tboot project to +support their implementation of dynamic launch known as Intel Trusted +eXecution Technology (TXT). The approach taken by Tboot was to provide +an exokernel that could handle the launch protocol implemented by +Intel's special loader, the SINIT Authenticated Code Module (ACM [3]_), +and remained in memory to manage the SMX CPU mode that a dynamic launch +would put a system in. While it is not precluded from being used for doing +a late launch, Tboot's primary use case was to be used as an early +launch solution. As a result the TrenchBoot project started the +development of the Secure Launch kernel feature to provide a more +generalized approach. The focus of the effort is twofold: the first is to +make the Linux
[PATCH v3 14/14] tpm: Allow locality 2 to be set when initializing the TPM for Secure Launch
The Secure Launch MLE environment uses PCRs that are only accessible from the DRTM locality 2. By default the TPM drivers always initialize the locality to 0. When a Secure Launch is in progress, initialize the locality to 2. Signed-off-by: Ross Philipson --- drivers/char/tpm/tpm-chip.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index ddaeceb..48b9351 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "tpm.h" DEFINE_IDR(dev_nums_idr); @@ -34,12 +35,20 @@ static int tpm_request_locality(struct tpm_chip *chip) { - int rc; + int rc, locality; if (!chip->ops->request_locality) return 0; - rc = chip->ops->request_locality(chip, 0); + if (slaunch_get_flags() & SL_FLAG_ACTIVE) { + dev_dbg(&chip->dev, "setting TPM locality to 2 for MLE\n"); + locality = 2; + } else { + dev_dbg(&chip->dev, "setting TPM locality to 0\n"); + locality = 0; + } + + rc = chip->ops->request_locality(chip, locality); if (rc < 0) return rc; -- 1.8.3.1
[PATCH v3 11/14] kexec: Secure Launch kexec SEXIT support
Prior to running the next kernel via kexec, the Secure Launch code closes down private SMX resources and does an SEXIT. This allows the next kernel to start normally without any issues starting the APs etc. Signed-off-by: Ross Philipson --- arch/x86/kernel/slaunch.c | 71 +++ kernel/kexec_core.c | 4 +++ 2 files changed, 75 insertions(+) diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c index f91f0b5..60a193a 100644 --- a/arch/x86/kernel/slaunch.c +++ b/arch/x86/kernel/slaunch.c @@ -458,3 +458,74 @@ void __init slaunch_setup_txt(void) pr_info("Intel TXT setup complete\n"); } + +static inline void smx_getsec_sexit(void) +{ + asm volatile (".byte 0x0f,0x37\n" + : : "a" (SMX_X86_GETSEC_SEXIT)); +} + +void slaunch_finalize(int do_sexit) +{ + void __iomem *config; + u64 one = TXT_REGVALUE_ONE, val; + + if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) != + (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) + return; + + config = ioremap(TXT_PRIV_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES * + PAGE_SIZE); + if (!config) { + pr_emerg("Error SEXIT failed to ioremap TXT private regs\n"); + return; + } + + /* Clear secrets bit for SEXIT */ + memcpy_toio(config + TXT_CR_CMD_NO_SECRETS, &one, sizeof(one)); + memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val)); + + /* Unlock memory configurations */ + memcpy_toio(config + TXT_CR_CMD_UNLOCK_MEM_CONFIG, &one, sizeof(one)); + memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val)); + + /* Close the TXT private register space */ + memcpy_toio(config + TXT_CR_CMD_CLOSE_PRIVATE, &one, sizeof(one)); + memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val)); + + /* +* Calls to iounmap are not being done because of the state of the +* system this late in the kexec process. Local IRQs are disabled and +* iounmap causes a TLB flush which in turn causes a warning. Leaving +* these mappings is not an issue since the next kernel is going to +* completely re-setup memory management.
+*/ + + /* Map public registers and do a final read fence */ + config = ioremap(TXT_PUB_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES * + PAGE_SIZE); + if (!config) { + pr_emerg("Error SEXIT failed to ioremap TXT public regs\n"); + return; + } + + memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val)); + + pr_emerg("TXT clear secrets bit and unlock memory complete."); + + if (!do_sexit) + return; + + if (smp_processor_id() != 0) { + pr_emerg("Error TXT SEXIT must be called on CPU 0\n"); + return; + } + + /* Ensure SMX mode is enabled so GETSEC[SEXIT] can be executed */ + cr4_set_bits(X86_CR4_SMXE); + + /* Do the SEXIT SMX operation */ + smx_getsec_sexit(); + + pr_emerg("TXT SEXIT complete."); +} diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 4b34a9a..fdf0a27 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -1179,6 +1180,9 @@ int kernel_kexec(void) cpu_hotplug_enable(); pr_notice("Starting new kernel\n"); machine_shutdown(); + + /* Finalize TXT registers and do SEXIT */ + slaunch_finalize(1); } kmsg_dump(KMSG_DUMP_SHUTDOWN); -- 1.8.3.1
[PATCH v3 05/14] x86: Secure Launch Kconfig
Initial bits to bring in Secure Launch functionality. Add Kconfig options for compiling in/out the Secure Launch code. Signed-off-by: Ross Philipson --- arch/x86/Kconfig | 32 1 file changed, 32 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 88fb922..b5e25c5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1949,6 +1949,38 @@ config EFI_MIXED If unsure, say N. +config SECURE_LAUNCH + bool "Secure Launch support" + default n + depends on X86_64 && X86_X2APIC + help + The Secure Launch feature allows a kernel to be loaded + directly through an Intel TXT measured launch. Intel TXT + establishes a Dynamic Root of Trust for Measurement (DRTM) + where the CPU measures the kernel image. This feature then + continues the measurement chain over kernel configuration + information and init images. + +config SECURE_LAUNCH_ALT_PCR19 + bool "Secure Launch Alternate PCR 19 usage" + default n + depends on SECURE_LAUNCH + help + In the post ACM environment, Secure Launch by default measures + configuration information into PCR18. This feature allows finer + control over measurements by moving configuration measurements + into PCR19. + +config SECURE_LAUNCH_ALT_PCR20 + bool "Secure Launch Alternate PCR 20 usage" + default n + depends on SECURE_LAUNCH + help + In the post ACM environment, Secure Launch by default measures + image data like any external initrd into PCR17. This feature + allows finer control over measurements by moving image measurements + into PCR20. + source "kernel/Kconfig.hz" config KEXEC -- 1.8.3.1
[PATCH v3 13/14] x86: Secure Launch late initcall platform module
From: "Daniel P. Smith" The Secure Launch platform module is a late init module. During the init call, the TPM event log is read and measurements taken in the early boot stub code are located. These measurements are extended into the TPM PCRs using the mainline TPM kernel driver. The platform module also registers the securityfs nodes to allow access to TXT register fields on Intel along with the fetching of and writing events to the late launch TPM log. Signed-off-by: Daniel P. Smith Signed-off-by: garnetgrimm Signed-off-by: Ross Philipson --- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/slmodule.c | 495 + 2 files changed, 496 insertions(+) create mode 100644 arch/x86/kernel/slmodule.c diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index d6ee904..09b730a 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -81,6 +81,7 @@ obj-$(CONFIG_IA32_EMULATION) += tls.o obj-y += step.o obj-$(CONFIG_INTEL_TXT)+= tboot.o obj-$(CONFIG_SECURE_LAUNCH)+= slaunch.o +obj-$(CONFIG_SECURE_LAUNCH)+= slmodule.o obj-$(CONFIG_ISA_DMA_API) += i8237.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-y += cpu/ diff --git a/arch/x86/kernel/slmodule.c b/arch/x86/kernel/slmodule.c new file mode 100644 index 000..807f9ca --- /dev/null +++ b/arch/x86/kernel/slmodule.c @@ -0,0 +1,495 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Secure Launch late validation/setup, securityfs exposure and + * finalization support. + * + * Copyright (c) 2021 Apertus Solutions, LLC + * Copyright (c) 2021 Assured Information Security, Inc. + * Copyright (c) 2021, Oracle and/or its affiliates. + * + * Author(s): + * Daniel P. Smith + * Garnet T. 
Grimm + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define SL_FS_ENTRIES 10 +/* root directory node must be last */ +#define SL_ROOT_DIR_ENTRY (SL_FS_ENTRIES - 1) +#define SL_TXT_DIR_ENTRY (SL_FS_ENTRIES - 2) +#define SL_TXT_FILE_FIRST (SL_TXT_DIR_ENTRY - 1) +#define SL_TXT_ENTRY_COUNT 7 + +#define DECLARE_TXT_PUB_READ_U(size, fmt, msg_size) \ +static ssize_t txt_pub_read_u##size(unsigned int offset, \ + loff_t *read_offset, \ + size_t read_len, \ + char __user *buf) \ +{ \ + void __iomem *txt; \ + char msg_buffer[msg_size]; \ + u##size reg_value = 0; \ + txt = ioremap(TXT_PUB_CONFIG_REGS_BASE, \ + TXT_NR_CONFIG_PAGES * PAGE_SIZE); \ + if (IS_ERR(txt)) \ + return PTR_ERR(txt); \ + memcpy_fromio(&reg_value, txt + offset, sizeof(u##size)); \ + iounmap(txt); \ + snprintf(msg_buffer, msg_size, fmt, reg_value); \ + return simple_read_from_buffer(buf, read_len, read_offset, \ + &msg_buffer, msg_size); \ +} + +DECLARE_TXT_PUB_READ_U(8, "%#04x\n", 6); +DECLARE_TXT_PUB_READ_U(32, "%#010x\n", 12); +DECLARE_TXT_PUB_READ_U(64, "%#018llx\n", 20); + +#define DECLARE_TXT_FOPS(reg_name, reg_offset, reg_size) \ +static ssize_t txt_##reg_name##_read(struct file *flip, \ + char __user *buf, size_t read_len, loff_t *read_offset) \ +{ \ + return txt_pub_read_u##reg_size(reg_offset, read_offset, \ + read_len, buf); \ +} \ +static const struct file_operations reg_name##_ops = { \ + .read = txt_##reg_name##_read, \ +} + +DECLARE_TXT_FOPS(sts, TXT_CR_STS, 64); +DECLARE_TXT_FOPS(ests, TXT_CR_ESTS, 8); +DECLARE_TXT_FOPS(errorcode, TXT_CR_ERRORCODE, 32); +DECLARE_TXT_FOPS(didvid, TXT_CR_DIDVID, 64); +DECLARE_TXT_FOPS(e2sts, TXT_CR_E2STS, 64); +DECLARE_TXT_FOPS(ver_emif, TXT_CR_VER_EMIF, 32); +DECLARE_TXT_FOPS(scratchpad, TXT_CR_SCRATCHPAD, 64); + +/* + * Securityfs exposure + */ +struct memfile { + char *name; + void *addr; + size_t size; +}; + +static struct memfile sl_evtlog
= {"eventlog", 0, 0}; +static void *txt_heap; +static struct
[PATCH v3 02/14] x86/boot: Add missing handling of setup_indirect structures
One of the two functions in ioremap.c that handles setup_data was missing the correct handling of setup_indirect structures. Functionality missing from original commit: commit b3c72fc9a78e (x86/boot: Introduce setup_indirect) Signed-off-by: Ross Philipson --- arch/x86/mm/ioremap.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index ab74e4f..f2b34c5 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -669,17 +669,34 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr, paddr = boot_params.hdr.setup_data; while (paddr) { - unsigned int len; + unsigned int len, size; if (phys_addr == paddr) return true; data = early_memremap_decrypted(paddr, sizeof(*data)); + size = sizeof(*data); paddr_next = data->next; len = data->len; - early_memunmap(data, sizeof(*data)); + if ((phys_addr > paddr) && (phys_addr < (paddr + len))) { + early_memunmap(data, sizeof(*data)); + return true; + } + + if (data->type == SETUP_INDIRECT) { + size += len; + early_memunmap(data, sizeof(*data)); + data = early_memremap_decrypted(paddr, size); + + if (((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) { + paddr = ((struct setup_indirect *)data->data)->addr; + len = ((struct setup_indirect *)data->data)->len; + } + } + + early_memunmap(data, size); if ((phys_addr > paddr) && (phys_addr < (paddr + len))) return true; -- 1.8.3.1
[PATCH v3 00/14] x86: Trenchboot secure dynamic launch Linux kernel support
The focus of the TrenchBoot project (https://github.com/TrenchBoot) is to enhance boot security and integrity. This requires the Linux kernel to be directly invoked by the x86 dynamic launch in order to establish a Dynamic Root of Trust for Measurement (DRTM). The dynamic launch will be initiated by a boot loader with associated support added to it; for example, the first targeted boot loader will be GRUB2. An integral part of establishing the DRTM involves measuring everything that is intended to be run (kernel image, initrd, etc) and everything that will configure that kernel to run (command line, boot params, etc) into specific PCRs, the DRTM PCRs (17-22), in the TPM. Another key aspect is that the dynamic launch is rooted in hardware, that is to say the hardware (CPU) is what takes the first measurement for the chain of integrity measurements. On Intel this is done using the GETSEC instruction provided by Intel's TXT and the SKINIT instruction provided by AMD's AMD-V. Information on these technologies can be readily found online. This patchset introduces Intel TXT support. To enable the kernel to be launched by GETSEC, a stub must be built into the setup section of the compressed kernel to handle the specific state that the dynamic launch process leaves the BSP in. Also this stub must measure everything that is going to be used as early as possible. This stub code and subsequent code must also deal with the specific state that the dynamic launch leaves the APs in. A quick note on terminology. The larger open source project itself is called TrenchBoot, which is hosted on Github (links below). The kernel feature enabling the use of the x86 technology is referred to as "Secure Launch" within the kernel code. As such the prefixes sl_/SL_ or slaunch/SLAUNCH will be seen in the code. The stub code discussed above is referred to as the SL stub. The new feature starts with patch #4. There are several preceding patches before that.
Patches 1 and 2 are fixes to an earlier patch set that introduced the x86 setup_data type setup_indirect. Patch 3 was authored by Arvind Sankar. There is no further status on this patch at this point, but Secure Launch depends on it so it is included with the set.

The basic flow is:

- Entry from the dynamic launch jumps to the SL stub.
- The SL stub fixes up the world on the BSP.
- For TXT, the SL stub wakes the APs and fixes up their worlds.
- For TXT, the APs are left halted waiting for an NMI to wake them.
- The SL stub jumps to startup_32.
- SL main locates the TPM event log and writes the measurements of configuration and module information into it.
- Kernel boot proceeds normally from this point.
- During early setup, slaunch_setup() runs to finish some validation and setup tasks.
- The SMP bringup code is modified to wake the waiting APs. APs vector to the rmpiggy and start up normally from that point.
- The SL platform module is registered as a late initcall module. It reads the TPM event log and extends the measurements taken into the TPM PCRs.
- The SL platform module initializes the securityfs interface to allow access to the TPM event log and TXT public registers.
- Kernel boot finishes normally.
- SEXIT support to leave SMX mode is present on the kexec path and the various reboot paths (poweroff, reset, halt).

Links:

The Trenchboot project, including documentation:
https://github.com/trenchboot

Intel TXT is documented in its own specification and in the SDM Instruction Set volume:
https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
https://software.intel.com/en-us/articles/intel-sdm

AMD SKINIT is documented in the System Programming manual:
https://www.amd.com/system/files/TechDocs/24593.pdf

GRUB2 pre-launch support patchset (WIP):
https://lists.gnu.org/archive/html/grub-devel/2020-05/msg00011.html

Thanks
Ross Philipson and Daniel P.
Smith

Changes in v2:
- Modified 32b entry code to prevent causing relocations in the compressed kernel.
- Dropped patches for the compressed kernel TPM PCR extender.
- Modified event log code to insert log delimiter events and not rely on TPM access.
- Stop extending PCRs in the early Secure Launch stub code.
- Removed Kconfig options for hash algorithms and use the algorithms the ACM used.
- Match Secure Launch measurement algorithm use to those reported in the TPM 2.0 event log.
- Read the TPM events out of the TPM and extend them into the PCRs using the mainline TPM driver. This is done in the late initcall module.
- Allow use of alternate PCRs 19 and 20 for post-ACM measurements.
- Add Kconfig constraints needed by Secure Launch (disable KASLR and add x2apic dependency).
- Fix testing of SL_FLAGS when determining if Secure Launch is active and the architecture is TXT.
- Use SYM_DATA_START_LOCAL macros in early entry point code.
- Security audit changes:
  - Validate buffers passed to MLE do not overlap the MLE and are properly laid out.
[PATCH v3 10/14] x86: Secure Launch SMP bringup support
On Intel, the APs are left in a well-documented state after TXT performs the late launch. Specifically, they cannot have #INIT asserted on them, so a standard startup via INIT/SIPI/SIPI cannot be performed. Instead the early SL stub code parked the APs in a pause/jmp loop waiting for an NMI. The modified SMP boot code is called for the Secure Launch case. The jump address for the RM piggy entry point is fixed up in the jump where the APs are waiting, and an NMI IPI is sent to the AP. The AP vectors to the Secure Launch entry point in the RM piggy, which mimics what the real mode code would do, then jumps to the standard RM piggy protected mode entry point.

Signed-off-by: Ross Philipson --- arch/x86/include/asm/realmode.h | 3 ++ arch/x86/kernel/smpboot.c| 86 arch/x86/realmode/rm/header.S| 3 ++ arch/x86/realmode/rm/trampoline_64.S | 37 4 files changed, 129 insertions(+) diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h index 5db5d08..ef37bf1 100644 --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -37,6 +37,9 @@ struct real_mode_header { #ifdef CONFIG_X86_64 u32 machine_real_restart_seg; #endif +#ifdef CONFIG_SECURE_LAUNCH + u32 sl_trampoline_start32; +#endif }; /* This must match data at realmode/rm/trampoline_{32,64}.S */ diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 9320285..aafe627 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -57,6 +57,7 @@ #include #include #include +#include #include #include @@ -1021,6 +1022,83 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle) return 0; } +#ifdef CONFIG_SECURE_LAUNCH + +static atomic_t first_ap_only = {1}; + +/* + * Called to fix the long jump address for the waiting APs to vector to + * the correct startup location in the Secure Launch stub in the rmpiggy.
+ */ +static int +slaunch_fixup_jump_vector(void) +{ + struct sl_ap_wake_info *ap_wake_info; + u32 *ap_jmp_ptr = NULL; + + if (!atomic_dec_and_test(&first_ap_only)) + return 0; + + ap_wake_info = slaunch_get_ap_wake_info(); + + ap_jmp_ptr = (u32 *)__va(ap_wake_info->ap_wake_block + +ap_wake_info->ap_jmp_offset); + + *ap_jmp_ptr = real_mode_header->sl_trampoline_start32; + + pr_info("TXT AP long jump address updated\n"); + + return 0; +} + +/* + * TXT AP startup is quite different than normal. The APs cannot have #INIT + * asserted on them or receive SIPIs. The early Secure Launch code has parked + * the APs in a pause loop waiting to receive an NMI. This will wake the APs + * and have them jump to the protected mode code in the rmpiggy where the rest + * of the SMP boot of the AP will proceed normally. + */ +static int +slaunch_wakeup_cpu_from_txt(int cpu, int apicid) +{ + unsigned long send_status = 0, accept_status = 0; + + /* Only done once */ + if (slaunch_fixup_jump_vector()) + return -1; + + /* Send NMI IPI to idling AP and wake it up */ + apic_icr_write(APIC_DM_NMI, apicid); + + if (init_udelay == 0) + udelay(10); + else + udelay(300); + + send_status = safe_apic_wait_icr_idle(); + + if (init_udelay == 0) + udelay(10); + else + udelay(300); + + accept_status = (apic_read(APIC_ESR) & 0xEF); + + if (send_status) + pr_err("Secure Launch IPI never delivered???\n"); + if (accept_status) + pr_err("Secure Launch IPI delivery error (%lx)\n", + accept_status); + + return (send_status | accept_status); +} + +#else + +#define slaunch_wakeup_cpu_from_txt(cpu, apicid) 0 + +#endif /* !CONFIG_SECURE_LAUNCH */ + /* * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -1075,6 +1153,13 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle, cpumask_clear_cpu(cpu, cpu_initialized_mask); smp_mb(); + /* With Intel TXT, the AP startup is totally different */ + if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) == + (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) { + boot_error = slaunch_wakeup_cpu_from_txt(cpu, apicid); + goto txt_wake; + } + /* * Wake up a CPU in difference cases: * - Use the method in the APIC driver if it's defined @@ -1087,6 +1172,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle, boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid, cpu0_nmi_registered); +txt_wake: if (!boot_error) { /* * Wait 10s total for
[PATCH v3 03/14] x86/boot: Place kernel_info at a fixed offset
From: Arvind Sankar

There are use cases for storing the offset of a symbol in kernel_info. For example, the trenchboot series [0] needs to store the offset of the Measured Launch Environment header in kernel_info.

Since commit (note: commit ID from tip/master) 527afc212231 ("x86/boot: Check that there are no run-time relocations"), run-time relocations are not allowed in the compressed kernel, so simply using the symbol in kernel_info, as ".long symbol", will cause a linker error because it is not position-independent.

With kernel_info being a separate object file and in a different section from startup_32, there is no way to calculate the offset of a symbol from the start of the image in a position-independent way. To enable such use cases, put kernel_info into its own section, which is placed at a predetermined offset (KERNEL_INFO_OFFSET) via the linker script. This allows calculating the symbol offset in a position-independent way, by adding the offset from the start of kernel_info to KERNEL_INFO_OFFSET.

Ensure that kernel_info is aligned, and use the SYM_DATA.* macros instead of bare labels. This stores the size of the kernel_info structure in the ELF symbol table.
Signed-off-by: Arvind Sankar Cc: Ross Philipson Signed-off-by: Ross Philipson --- arch/x86/boot/compressed/kernel_info.S | 19 +++ arch/x86/boot/compressed/kernel_info.h | 12 arch/x86/boot/compressed/vmlinux.lds.S | 6 ++ 3 files changed, 33 insertions(+), 4 deletions(-) create mode 100644 arch/x86/boot/compressed/kernel_info.h diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S index f818ee8..c18f071 100644 --- a/arch/x86/boot/compressed/kernel_info.S +++ b/arch/x86/boot/compressed/kernel_info.S @@ -1,12 +1,23 @@ /* SPDX-License-Identifier: GPL-2.0 */ +#include #include +#include "kernel_info.h" - .section ".rodata.kernel_info", "a" +/* + * If a field needs to hold the offset of a symbol from the start + * of the image, use the macro below, eg + * .long rva(symbol) + * This will avoid creating run-time relocations, which are not + * allowed in the compressed kernel. + */ + +#define rva(X) (((X) - kernel_info) + KERNEL_INFO_OFFSET) - .global kernel_info + .section ".rodata.kernel_info", "a" -kernel_info: + .balign 16 +SYM_DATA_START(kernel_info) /* Header, Linux top (structure). */ .ascii "LToP" /* Size. */ @@ -19,4 +30,4 @@ kernel_info: kernel_info_var_len_data: /* Empty for time being... 
*/ -kernel_info_end: +SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end) diff --git a/arch/x86/boot/compressed/kernel_info.h b/arch/x86/boot/compressed/kernel_info.h new file mode 100644 index 000..c127f84 --- /dev/null +++ b/arch/x86/boot/compressed/kernel_info.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef BOOT_COMPRESSED_KERNEL_INFO_H +#define BOOT_COMPRESSED_KERNEL_INFO_H + +#ifdef CONFIG_X86_64 +#define KERNEL_INFO_OFFSET 0x500 +#else /* 32-bit */ +#define KERNEL_INFO_OFFSET 0x100 +#endif + +#endif /* BOOT_COMPRESSED_KERNEL_INFO_H */ diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S index 112b237..84c7b4d 100644 --- a/arch/x86/boot/compressed/vmlinux.lds.S +++ b/arch/x86/boot/compressed/vmlinux.lds.S @@ -7,6 +7,7 @@ OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT) #include #include +#include "kernel_info.h" #ifdef CONFIG_X86_64 OUTPUT_ARCH(i386:x86-64) @@ -27,6 +28,11 @@ SECTIONS HEAD_TEXT _ehead = . ; } + .rodata.kernel_info KERNEL_INFO_OFFSET : { + *(.rodata.kernel_info) + } + ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, "kernel_info at bad address!") + .rodata..compressed : { *(.rodata..compressed) } -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 01/14] x86/boot: Fix memremap of setup_indirect structures
As documented, the setup_indirect structure is nested inside the setup_data structures in the setup_data list. The code was accessing the fields inside the setup_indirect structure, but only sizeof(struct setup_data) was being memremapped. No crash occurred, but only because of how the area happened to be remapped under the covers. The setup_indirect structure was introduced in commit b3c72fc9a78e ("x86/boot: Introduce setup_indirect").

Signed-off-by: Ross Philipson --- arch/x86/kernel/e820.c | 31 - arch/x86/kernel/kdebugfs.c | 28 --- arch/x86/kernel/ksysfs.c | 56 -- arch/x86/kernel/setup.c| 23 +-- arch/x86/mm/ioremap.c | 13 +++ 5 files changed, 109 insertions(+), 42 deletions(-) diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index bc0657f..e023950 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -996,7 +996,8 @@ static int __init parse_memmap_opt(char *str) void __init e820__reserve_setup_data(void) { struct setup_data *data; - u64 pa_data; + u64 pa_data, pa_next; + u32 len; pa_data = boot_params.hdr.setup_data; if (!pa_data) @@ -1004,6 +1005,9 @@ void __init e820__reserve_setup_data(void) while (pa_data) { data = early_memremap(pa_data, sizeof(*data)); + len = sizeof(*data); + pa_next = data->next; + e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN); /* @@ -1015,18 +1019,23 @@ void __init e820__reserve_setup_data(void) sizeof(*data) + data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN); - if (data->type == SETUP_INDIRECT && - ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) { - e820__range_update(((struct setup_indirect *)data->data)->addr, - ((struct setup_indirect *)data->data)->len, - E820_TYPE_RAM, E820_TYPE_RESERVED_KERN); - e820__range_update_kexec(((struct setup_indirect *)data->data)->addr, -((struct setup_indirect *)data->data)->len, -E820_TYPE_RAM, E820_TYPE_RESERVED_KERN); + if (data->type == SETUP_INDIRECT) { + len += data->len; + early_memunmap(data,
sizeof(*data)); + data = early_memremap(pa_data, len); + + if (((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) { + e820__range_update(((struct setup_indirect *)data->data)->addr, + ((struct setup_indirect *)data->data)->len, + E820_TYPE_RAM, E820_TYPE_RESERVED_KERN); + e820__range_update_kexec(((struct setup_indirect *)data->data)->addr, +((struct setup_indirect *)data->data)->len, +E820_TYPE_RAM, E820_TYPE_RESERVED_KERN); + } } - pa_data = data->next; - early_memunmap(data, sizeof(*data)); + pa_data = pa_next; + early_memunmap(data, len); } e820__update_table(e820_table); diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c index 64b6da9..2fa1b93 100644 --- a/arch/x86/kernel/kdebugfs.c +++ b/arch/x86/kernel/kdebugfs.c @@ -92,7 +92,8 @@ static int __init create_setup_data_nodes(struct dentry *parent) struct setup_data *data; int error; struct dentry *d; - u64 pa_data; + u64 pa_data, pa_next; + u32 len; int no = 0; d = debugfs_create_dir("setup_data", parent); @@ -112,12 +113,23 @@ static int __init create_setup_data_nodes(struct dentry *parent) error = -ENOMEM; goto err_dir; } - - if (data->type == SETUP_INDIRECT && - ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) { - node->paddr = ((struct setup_indirect *)data->data)->addr; - node->type = ((struct setup_indirect *)data->data)->type; - node->len = ((struct setup_indirect *)data->data)->len; + pa_next = data->next; + + if (data->type == SETUP_INDIRECT) { + len = sizeof(*data) + data->len; + memunmap(data); + data = memremap(pa_data, len,
Re: [PATCH resend] dma-debug: Use memory_intersects() directly
Thanks, applied.
Re: [PATCH] dma-debug: fix debugfs initialization order
Thanks, applied.
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote: > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon wrote: > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote: > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon wrote: > > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote: > > > > > On 2021-07-28 19:30, Georgi Djakov wrote: > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote: > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY > > > > > > > flag") > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it > > > > > > > went > > > > > > > the memory type setting required for the non-coherent masters to > > > > > > > use > > > > > > > system cache. Now that system cache support for GPU is added, we > > > > > > > will > > > > > > > need to set the right PTE attribute for GPU buffers to be sys > > > > > > > cached. > > > > > > > Without this, the system cache lines are not allocated for GPU. > > > > > > > > > > > > > > So the patches in this series introduces a new prot flag > > > > > > > IOMMU_LLC, > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC > > > > > > > and makes GPU the user of this protection flag. > > > > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it > > > > > > does > > > > > > not apply anymore? > > > > > > > > > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, > > > > > then > > > > > I can repost the patch. > > > > > > > > I still think you need to handle the mismatched alias, no? You're adding > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That > > > > can't be right. > > > > > > > > > > Just curious, and maybe this is a dumb question, but what is your > > > concern about mismatched aliases? 
I mean the cache hierarchy on the > > > GPU device side (anything beyond the LLC) is pretty different and > > > doesn't really care about the smmu pgtable attributes.. > > > > If the CPU accesses a shared buffer with different attributes to those which > > the device is using then you fall into the "mismatched memory attributes" > > part of the Arm architecture. It's reasonably unforgiving (you should go and > > read it) and in some cases can apply to speculative accesses as well, but > > the end result is typically loss of coherency. > > Ok, I might have a few other sections to read first to decipher the > terminology.. > > But my understanding of LLC is that it looks just like system memory > to the CPU and GPU (I think that would make it "the point of > coherence" between the GPU and CPU?) If that is true, shouldn't it be > invisible from the point of view of different CPU mapping options? You could certainly build a system where mismatched attributes don't cause loss of coherence, but as it's not guaranteed by the architecture and the changes proposed here affect APIs which are exposed across SoCs, then I don't think it helps much. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch V2 0/2] iommu/arm-smmu: Fix races in iommu domain/group creation
On Mon, Aug 02, 2021 at 04:46:37PM +0100, Robin Murphy wrote: > On 2021-08-02 16:16, Will Deacon wrote: > > On Fri, Jun 18, 2021 at 02:00:35AM +0530, Ashish Mhetre wrote: > > > Multiple iommu domains and iommu groups are getting created for the > > > devices > > > sharing same SID. It is expected for devices sharing same SID to be in > > > same > > > iommu group and same iommu domain. > > > This is leading to context faults when one device is accessing IOVA from > > > other device which shouldn't be the case for devices sharing same SID. > > > Fix this by protecting iommu domain and iommu group creation with mutexes. > > > > Robin -- any chance you could take a look at these, please? You had some > > comments on the first version which convinced me that they are needed, > > but I couldn't tell whether you wanted to solve this a different way or not. > > Sorry, I was lamenting that this came to light due to the > of_iommu_configure() flow being yucky, but that wasn't meant to imply that > there aren't - or couldn't be in future - better reasons for > iommu_probe_device() to be robust against concurrency anyway. I do think > these are legitimate fixes to make in their own right, even if the current > need might get swept back under the rug in future. > > I would say, however, that the commit messages seem to focus too much on the > wrong details and aren't overly useful, and patch #2 is missing Ashish's > sign-off. Ashish -- please can you send a v3 fixing these issues? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 24/25] iommu/dma: Factor out flush queue init
On 2021-08-09 13:52, Will Deacon wrote: On Wed, Aug 04, 2021 at 06:15:52PM +0100, Robin Murphy wrote: Factor out flush queue setup from the initial domain init so that we can potentially trigger it from sysfs later on in a domain's lifetime. Reviewed-by: Lu Baolu Reviewed-by: John Garry Signed-off-by: Robin Murphy --- drivers/iommu/dma-iommu.c | 30 -- include/linux/dma-iommu.h | 9 ++--- 2 files changed, 26 insertions(+), 13 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 2e19505dddf9..f51b8dc99ac6 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -310,6 +310,25 @@ static bool dev_is_untrusted(struct device *dev) return dev_is_pci(dev) && to_pci_dev(dev)->untrusted; } +int iommu_dma_init_fq(struct iommu_domain *domain) +{ + struct iommu_dma_cookie *cookie = domain->iova_cookie; + + if (domain->type != IOMMU_DOMAIN_DMA_FQ) + return -EINVAL; + if (cookie->fq_domain) + return 0; + + if (init_iova_flush_queue(&cookie->iovad, iommu_dma_flush_iotlb_all, + iommu_dma_entry_dtor)) { + pr_warn("iova flush queue initialization failed\n"); + domain->type = IOMMU_DOMAIN_DMA; + return -ENODEV; I do find this a bit odd: we assert that the caller has set domain->type to IOMMU_DOMAIN_DMA_FQ but then on failure we reset it to IOMMU_DOMAIN_DMA here. I think it would be less error-prone if the setting of domain->type was handled in the same function. On reflection I think I agree. For some reason I settled on the idea of doing this to make the callers simpler, but it turns out that unpicking it to flow logically is in fact a +4/-5 diff essentially just moving all the same statements to different places, and that's before I update comments since that theoretical race between the sysfs and DMA ops paths only exists because of sysfs having to dance around the type check here... I'll send v4 later today or possibly tomorrow, but not in such a hurry that I skimp on the build-testing this time! Cheers, Robin.
Re: [PATCH v3 25/25] iommu: Allow enabling non-strict mode dynamically
On 2021-08-09 13:49, Will Deacon wrote: On Wed, Aug 04, 2021 at 06:15:53PM +0100, Robin Murphy wrote: Allocating and enabling a flush queue is in fact something we can reasonably do while a DMA domain is active, without having to rebuild it from scratch. Thus we can allow a strict -> non-strict transition from sysfs without requiring to unbind the device's driver, which is of particular interest to users who want to make selective relaxations to critical devices like the one serving their root filesystem. Disabling and draining a queue also seems technically possible to achieve without rebuilding the whole domain, but would certainly be more involved. Furthermore there's not such a clear use-case for tightening up security *after* the device may already have done whatever it is that you don't trust it not to do, so we only consider the relaxation case. CC: Sai Praneeth Prakhya Signed-off-by: Robin Murphy --- v3: Actually think about concurrency, rework most of the fq data accesses to be (hopefully) safe and comment it all --- drivers/iommu/dma-iommu.c | 25 ++--- drivers/iommu/iommu.c | 16 drivers/iommu/iova.c | 9 ++--- 3 files changed, 36 insertions(+), 14 deletions(-) I failed to break this, so hopefully you've caught everything now. Only thing I wasn't sure of is why we still need the smp_wmb() in init_iova_flush_queue(). Can we remove it now that we have one before assigning into the cookie? Mostly because I failed to spot it, I think :) Indeed now that we don't have any callers other than iommu_dma_init_fq() to worry about, I don't think that one matters any more. It would if were testing cookie->iovad->fq directly as our indicator instead of cookie->fq_domain, but then we'd still need the new barrier to ensure iommu_dma_flush_iotlb_all() properly observes the latter, so we may as well rely on that everywhere and let it fully replace the old one. Thanks, Robin. 
Re: [PATCH v3 24/25] iommu/dma: Factor out flush queue init
On Wed, Aug 04, 2021 at 06:15:52PM +0100, Robin Murphy wrote: > Factor out flush queue setup from the initial domain init so that we > can potentially trigger it from sysfs later on in a domain's lifetime. > > Reviewed-by: Lu Baolu > Reviewed-by: John Garry > Signed-off-by: Robin Murphy > --- > drivers/iommu/dma-iommu.c | 30 -- > include/linux/dma-iommu.h | 9 ++--- > 2 files changed, 26 insertions(+), 13 deletions(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 2e19505dddf9..f51b8dc99ac6 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -310,6 +310,25 @@ static bool dev_is_untrusted(struct device *dev) > return dev_is_pci(dev) && to_pci_dev(dev)->untrusted; > } > > +int iommu_dma_init_fq(struct iommu_domain *domain) > +{ > + struct iommu_dma_cookie *cookie = domain->iova_cookie; > + > + if (domain->type != IOMMU_DOMAIN_DMA_FQ) > + return -EINVAL; > + if (cookie->fq_domain) > + return 0; > + > + if (init_iova_flush_queue(&cookie->iovad, iommu_dma_flush_iotlb_all, > + iommu_dma_entry_dtor)) { > + pr_warn("iova flush queue initialization failed\n"); > + domain->type = IOMMU_DOMAIN_DMA; > + return -ENODEV; I do find this a bit odd: we assert that the caller has set domain->type to IOMMU_DOMAIN_DMA_FQ but then on failure we reset it to IOMMU_DOMAIN_DMA here. I think it would be less error-prone if the setting of domain->type was handled in the same function. Will
Re: [PATCH v3 25/25] iommu: Allow enabling non-strict mode dynamically
On Wed, Aug 04, 2021 at 06:15:53PM +0100, Robin Murphy wrote: > Allocating and enabling a flush queue is in fact something we can > reasonably do while a DMA domain is active, without having to rebuild it > from scratch. Thus we can allow a strict -> non-strict transition from > sysfs without requiring to unbind the device's driver, which is of > particular interest to users who want to make selective relaxations to > critical devices like the one serving their root filesystem. > > Disabling and draining a queue also seems technically possible to > achieve without rebuilding the whole domain, but would certainly be more > involved. Furthermore there's not such a clear use-case for tightening > up security *after* the device may already have done whatever it is that > you don't trust it not to do, so we only consider the relaxation case. > > CC: Sai Praneeth Prakhya > Signed-off-by: Robin Murphy > > --- > > v3: Actually think about concurrency, rework most of the fq data > accesses to be (hopefully) safe and comment it all > --- > drivers/iommu/dma-iommu.c | 25 ++--- > drivers/iommu/iommu.c | 16 > drivers/iommu/iova.c | 9 ++--- > 3 files changed, 36 insertions(+), 14 deletions(-) I failed to break this, so hopefully you've caught everything now. Only thing I wasn't sure of is why we still need the smp_wmb() in init_iova_flush_queue(). Can we remove it now that we have one before assigning into the cookie? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 0/3] Apple M1 DART IOMMU driver
On Tue, Aug 03, 2021 at 02:16:48PM +0200, Sven Peter wrote: > Sven Peter (3): > iommu/io-pgtable: Add DART pagetable format > dt-bindings: iommu: add DART iommu bindings > iommu/dart: Add DART iommu driver Applied, thanks. This driver now lives in the apple/dart branch of the IOMMU tree.
Re: [PATCH v7 00/12] Clean up "mediatek,larb"
On Mon, Aug 09, 2021 at 08:30:03AM +, Yong Wu (吴勇) wrote: > Thanks very much for your confirm. I will your Ack for iommu part in > the next version. Note that my ack is conditional on the premise that Matthias has reviewed the IOMMU parts. Thanks, Joerg
RE: [RFC v2] /dev/iommu uAPI proposal
> From: David Gibson > Sent: Friday, August 6, 2021 12:45 PM > > > > In concept I feel the purpose of DMA endpoint is equivalent to the > routing > > > > info in this proposal. > > > > > > Maybe? I'm afraid I never quite managed to understand the role of the > > > routing info in your proposal. > > > > > > > the IOMMU routes incoming DMA packets to a specific I/O page table, > > according to RID or RID+PASID carried in the packet. RID or RID+PASID > > is the routing information (represented by device cookie +PASID in > > proposed uAPI) and what the iommu driver really cares when activating > > the I/O page table in the iommu. > > Ok, so yes, endpoint is roughly equivalent to that. But my point is > that the IOMMU layer really only cares about that (device+routing) > combination, not other aspects of what the device is. So that's the > concept we should give a name and put front and center in the API. > This is how this proposal works, centered around the routing info. the uAPI doesn't care what the device is. It just requires the user to specify the user view of routing info (device fd + optional pasid) to tag an IOAS. the user view is then converted to the kernel view of routing (rid or rid+pasid) by vfio driver and then passed to iommu fd in the attaching operation. and GET_INFO interface is provided for the user to check whether a device supports multiple IOASes and whether pasid space is delegated to the user. We just need a better name if pasid is considered too pci specific... But creating an endpoint per ioasid and making it centered in uAPI is not what the IOMMU layer cares about. Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu