Re: mmu.c:undefined reference to `patch__hash_page_A0'
Hi--

I no longer see this build error. However:

On 2/27/21 2:24 AM, kernel test robot wrote:
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> head: 3fb6d0e00efc958d01c2f109c8453033a2d96796
> commit: 259149cf7c3c6195e6199e045ca988c31d081cab powerpc/32s: Only build hash code when CONFIG_PPC_BOOK3S_604 is selected
> date: 4 weeks ago
> config: powerpc64-randconfig-r013-20210227 (attached as .config)

ktr/lkp, this is a PPC32 .config file that is attached, not PPC64.

Also:

> compiler: powerpc-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=259149cf7c3c6195e6199e045ca988c31d081cab
>         git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>         git fetch --no-tags linus master
>         git checkout 259149cf7c3c6195e6199e045ca988c31d081cab
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc64
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot
>
> All errors (new ones prefixed by >>):
>
> powerpc-linux-ld: arch/powerpc/mm/book3s32/mmu.o: in function `MMU_init_hw_patch':
>>> mmu.c:(.init.text+0x75e): undefined reference to `patch__hash_page_A0'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x76a): undefined reference to `patch__hash_page_A0'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x776): undefined reference to `patch__hash_page_A1'
> powerpc-linux-ld: mmu.c:(.init.text+0x782): undefined reference to `patch__hash_page_A1'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x78e): undefined reference to `patch__hash_page_A2'
> powerpc-linux-ld: mmu.c:(.init.text+0x79a): undefined reference to `patch__hash_page_A2'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x7aa): undefined reference to `patch__hash_page_B'
> powerpc-linux-ld: mmu.c:(.init.text+0x7b6): undefined reference to `patch__hash_page_B'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x7c2): undefined reference to `patch__hash_page_C'
> powerpc-linux-ld: mmu.c:(.init.text+0x7ce): undefined reference to `patch__hash_page_C'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x7da): undefined reference to `patch__flush_hash_A0'
> powerpc-linux-ld: mmu.c:(.init.text+0x7e6): undefined reference to `patch__flush_hash_A0'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x7f2): undefined reference to `patch__flush_hash_A1'
> powerpc-linux-ld: mmu.c:(.init.text+0x7fe): undefined reference to `patch__flush_hash_A1'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x80a): undefined reference to `patch__flush_hash_A2'
> powerpc-linux-ld: mmu.c:(.init.text+0x816): undefined reference to `patch__flush_hash_A2'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x83e): undefined reference to `patch__flush_hash_B'
> powerpc-linux-ld: mmu.c:(.init.text+0x84e): undefined reference to `patch__flush_hash_B'
> powerpc-linux-ld: arch/powerpc/mm/book3s32/mmu.o: in function `update_mmu_cache':
>>> mmu.c:(.text.update_mmu_cache+0xa0): undefined reference to `add_hash_page'

I do see this build error:

powerpc-linux-ld: arch/powerpc/boot/wrapper.a(decompress.o): in function `partial_decompress':
decompress.c:(.text+0x1f0): undefined reference to `__decompress'

when either CONFIG_KERNEL_LZO=y or CONFIG_KERNEL_LZMA=y, but the build
succeeds when either CONFIG_KERNEL_GZIP=y or CONFIG_KERNEL_XZ=y.

I guess that is due to arch/powerpc/boot/decompress.c doing this:

#ifdef CONFIG_KERNEL_GZIP
#	include "decompress_inflate.c"
#endif

#ifdef CONFIG_KERNEL_XZ
#	include "xz_config.h"
#	include "../../../lib/decompress_unxz.c"
#endif

It would be nice to require one of KERNEL_GZIP or KERNEL_XZ to be
set/enabled (maybe unless a uImage is being built?).

ta.
--
~Randy
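[Editor's note: one way to enforce that requirement is sketched below; this is an untested illustration, not an actual patch. It simply makes the wrapper fail loudly at compile time when neither of the two decompressors that provide __decompress() has been pulled in by the #ifdef chain Randy quotes from arch/powerpc/boot/decompress.c.]

/* Hypothetical guard at the end of the conditional includes in
 * arch/powerpc/boot/decompress.c: if neither decompress_inflate.c nor
 * decompress_unxz.c was included above, no __decompress() exists for
 * partial_decompress() to call and the link fails with a far less
 * obvious error. */
#if !defined(CONFIG_KERNEL_GZIP) && !defined(CONFIG_KERNEL_XZ)
#  error "the powerpc boot wrapper only supports gzip- or xz-compressed kernels"
#endif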
Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
On Sat, Apr 17, 2021 at 09:18:57PM +, David Laight wrote:
> Ugly as well.

Thank you for expressing your opinion. Again.
Re: [PATCH 2/2] mm: Indicate pfmemalloc pages in compound_head
On Sat, Apr 17, 2021 at 09:13:45PM +, David Laight wrote:
> >		struct {	/* page_pool used by netstack */
> > -			/**
> > -			 * @dma_addr: might require a 64-bit value on
> > -			 * 32-bit architectures.
> > -			 */
> > +			unsigned long pp_magic;
> > +			unsigned long xmi;
> > +			unsigned long _pp_mapping_pad;
> > 			unsigned long dma_addr[2];
> >		};
>
> You've deleted the comment.

Yes. It no longer added any value. You can see dma_addr now occupies
two words.

> I also think there should be a comment that dma_addr[0]
> must be aliased to ->index.

That's not a requirement. Moving the pfmemalloc indicator is a
requirement so that we _can_ use index, but there's no requirement
about how index is used.
Re: swiotlb cleanups v3
On 4/17/21 11:39 AM, Tom Lendacky wrote:
>> Hi Konrad,
>>
>> this series contains a bunch of swiotlb cleanups, mostly to reduce the
>> amount of internals exposed to code outside of swiotlb.c, which should
>> helper to prepare for supporting multiple different bounce buffer pools.
>
> Somewhere between the 1st and 2nd patch, specifying a specific swiotlb
> for an SEV guest is no longer honored. For example, if I start an SEV
> guest with 16GB of memory and specify swiotlb=131072 I used to get a
> 256MB SWIOTLB. However, after the 2nd patch, the swiotlb=131072 is no
> longer honored and I get a 982MB SWIOTLB (as set via sev_setup_arch() in
> arch/x86/mm/mem_encrypt.c).
>
> I can't be sure which patch caused the issue since an SEV guest fails to
> boot with the 1st patch but can boot with the 2nd patch, at which point
> the SWIOTLB comes in at 982MB (I haven't had a chance to debug it and so
> I'm hoping you might be able to quickly spot what's going on).

Ok, I figured out the 1st patch boot issue (which is gone when the second
patch is applied). Here's the issue if anyone is interested:

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d9c097f0f78c..dbe369674afe 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -226,7 +226,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 	alloc_size = PAGE_ALIGN(mem->nslabs * sizeof(size_t));
 	mem->alloc_size = memblock_alloc(alloc_size, PAGE_SIZE);
-	if (mem->alloc_size)
+	if (!mem->alloc_size)
 		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
 		      __func__, alloc_size, PAGE_SIZE);

The 1st patch still allowed the command line specified size of 256MB
SWIOTLB. So that means the 2nd patch causes the command line specified
256MB SWIOTLB size to be ignored and results in a 982MB SWIOTLB size for
the 16GB guest.

Thanks,
Tom

>
> Thanks,
> Tom
>
>>
>> Changes since v2:
>>  - fix a bisetion hazard that did not allocate the alloc_size array
>>  - dropped all patches already merged
>>
>> Changes since v1:
>>  - rebased to v5.12-rc1
>>  - a few more cleanups
>>  - merge and forward port the patch from Claire to move all the global
>>    variables into a struct to prepare for multiple instances
>
RE: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
From: Matthew Wilcox
> Sent: 17 April 2021 03:45
>
> Replacement patch to fix compiler warning.
...
>  static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>  {
> -	return page->dma_addr;
> +	dma_addr_t ret = page->dma_addr[0];
> +	if (sizeof(dma_addr_t) > sizeof(unsigned long))
> +		ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;

Ugly as well.

Why not just replace the (dma_addr_t) cast with a (u64) one?
Looks better than the double shift.

Same could be done for the '>> 32'.
Is there an upper_32_bits() that could be used??

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
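[Editor's note: a minimal sketch of David's suggestion as I read it, assuming the two-word dma_addr[] layout from the quoted patch; the setter shown here is a guess at the matching store side, not quoted from the thread, and this is not the hunk that was eventually merged.]

#include <linux/kernel.h>	/* upper_32_bits() */
#include <linux/mm_types.h>

static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
{
	dma_addr_t ret = page->dma_addr[0];

	/* The u64 cast keeps the 32-bit shift well defined even when
	 * dma_addr_t is only 32 bits wide, avoiding the "<< 16 << 16"
	 * trick from the quoted patch. */
	if (sizeof(dma_addr_t) > sizeof(unsigned long))
		ret |= (u64)page->dma_addr[1] << 32;
	return ret;
}

static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
{
	page->dma_addr[0] = addr;
	if (sizeof(dma_addr_t) > sizeof(unsigned long))
		page->dma_addr[1] = upper_32_bits(addr);
}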
RE: [PATCH 2/2] mm: Indicate pfmemalloc pages in compound_head
From: Matthew Wilcox (Oracle)
> Sent: 17 April 2021 00:07
>
> The net page_pool wants to use a magic value to identify page pool pages.
> The best place to put it is in the first word where it can be clearly a
> non-pointer value. That means shifting dma_addr up to alias with ->index,
> which means we need to find another way to indicate page_is_pfmemalloc().
> Since page_pool doesn't want to set its magic value on pages which are
> pfmemalloc, we can use bit 1 of compound_head to indicate that the page
> came from the memory reserves.
>
...
> 		struct {	/* page_pool used by netstack */
> -			/**
> -			 * @dma_addr: might require a 64-bit value on
> -			 * 32-bit architectures.
> -			 */
> +			unsigned long pp_magic;
> +			unsigned long xmi;
> +			unsigned long _pp_mapping_pad;
> 			unsigned long dma_addr[2];
> 		};

You've deleted the comment.

I also think there should be a comment that dma_addr[0]
must be aliased to ->index.
(Or whatever all the exact requirements are.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
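[Editor's note: to make the quoted commit message concrete, here is a minimal sketch of the "bit 1 of compound_head" idea; it is a sketch consistent with the description above, not the exact hunks from the patch. compound_head shares storage with page->lru.next, and pointers stored there are word-aligned, so the low bits are free: bit 0 already flags tail pages, leaving bit 1 for the pfmemalloc indicator.]

#include <linux/bits.h>
#include <linux/mm_types.h>

/* Mark a page as having come from the memory reserves without using
 * ->index, which the page_pool layout above now aliases with dma_addr. */
static inline void set_page_pfmemalloc(struct page *page)
{
	page->lru.next = (void *)BIT(1);
}

static inline void clear_page_pfmemalloc(struct page *page)
{
	page->lru.next = NULL;
}

static inline bool page_is_pfmemalloc(const struct page *page)
{
	/* Bit 0 would mean "tail page"; bit 1 is only ever set here. */
	return (unsigned long)page->lru.next & BIT(1);
}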
[V3 PATCH 16/16] crypto/nx: Add sysfs interface to export NX capabilities
Changes to export the following NXGZIP capabilities through sysfs: /sys/devices/vio/ibm,compression-v1/NxGzCaps: min_compress_len /*Recommended minimum compress length in bytes*/ min_decompress_len /*Recommended minimum decompress length in bytes*/ req_max_processed_len /* Maximum number of bytes processed in one request */ Signed-off-by: Haren Myneni --- drivers/crypto/nx/nx-common-pseries.c | 43 +++ 1 file changed, 43 insertions(+) diff --git a/drivers/crypto/nx/nx-common-pseries.c b/drivers/crypto/nx/nx-common-pseries.c index 49224870d05e..cc258d2c6475 100644 --- a/drivers/crypto/nx/nx-common-pseries.c +++ b/drivers/crypto/nx/nx-common-pseries.c @@ -962,6 +962,36 @@ static struct attribute_group nx842_attribute_group = { .attrs = nx842_sysfs_entries, }; +#definenxct_capab_read(_name) \ +static ssize_t nxct_##_name##_show(struct device *dev, \ + struct device_attribute *attr, char *buf) \ +{ \ + return sprintf(buf, "%lld\n", nx_ct_capab._name); \ +} + +#define NXCT_ATTR_RO(_name)\ + nxct_capab_read(_name); \ + static struct device_attribute dev_attr_##_name = __ATTR(_name, \ + 0444, \ + nxct_##_name##_show,\ + NULL); + +NXCT_ATTR_RO(req_max_processed_len); +NXCT_ATTR_RO(min_compress_len); +NXCT_ATTR_RO(min_decompress_len); + +static struct attribute *nxct_capab_sysfs_entries[] = { + &dev_attr_req_max_processed_len.attr, + &dev_attr_min_compress_len.attr, + &dev_attr_min_decompress_len.attr, + NULL, +}; + +static struct attribute_group nxct_capab_attr_group = { + .name = nx_ct_capab.name, + .attrs = nxct_capab_sysfs_entries, +}; + static struct nx842_driver nx842_pseries_driver = { .name = KBUILD_MODNAME, .owner =THIS_MODULE, @@ -1051,6 +1081,16 @@ static int nx842_probe(struct vio_dev *viodev, goto error; } + if (capab_feat) { + if (sysfs_create_group(&viodev->dev.kobj, + &nxct_capab_attr_group)) { + dev_err(&viodev->dev, + "Could not create sysfs NX capability entries\n"); + ret = -1; + goto error; + } + } + return 0; error_unlock: @@ -1070,6 +1110,9 @@ static void nx842_remove(struct vio_dev *viodev) pr_info("Removing IBM Power 842 compression device\n"); sysfs_remove_group(&viodev->dev.kobj, &nx842_attribute_group); + if (capab_feat) + sysfs_remove_group(&viodev->dev.kobj, &nxct_capab_attr_group); + crypto_unregister_alg(&nx842_pseries_alg); spin_lock_irqsave(&devdata_mutex, flags); -- 2.18.2
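[Editor's note: since these limits are meant to guide applications (the cover letter notes that libnxz users should keep requests within req_max_processed_len), a hedged user-space sketch of consuming them might look like the following. The NxGzCaps path is taken from the commit message above; everything else is illustrative.]

#include <stdio.h>

/* Read one NX-GZIP capability value exported by this patch; returns -1
 * if the attribute cannot be read (e.g. NX-GZIP is not available). */
static long read_nxgz_capab(const char *name)
{
	char path[256];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/vio/ibm,compression-v1/NxGzCaps/%s", name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/* e.g. fall back to software deflate when a job exceeds
 * read_nxgz_capab("req_max_processed_len"), or skip the accelerator for
 * buffers below read_nxgz_capab("min_compress_len"). */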
[V3 PATCH 15/16] crypto/nx: Get NX capabilities for GZIP coprocessor type
phyp provides NX capabilities which gives recommended minimum compression / decompression length and maximum request buffer size in bytes. Changes to get NX overall capabilities which points to the specific features phyp supports. Then retrieve NXGZIP specific capabilities. Signed-off-by: Haren Myneni --- drivers/crypto/nx/nx-common-pseries.c | 83 +++ 1 file changed, 83 insertions(+) diff --git a/drivers/crypto/nx/nx-common-pseries.c b/drivers/crypto/nx/nx-common-pseries.c index 9a40fca8a9e6..49224870d05e 100644 --- a/drivers/crypto/nx/nx-common-pseries.c +++ b/drivers/crypto/nx/nx-common-pseries.c @@ -9,6 +9,7 @@ */ #include +#include #include #include "nx-842.h" @@ -20,6 +21,24 @@ MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors"); MODULE_ALIAS_CRYPTO("842"); MODULE_ALIAS_CRYPTO("842-nx"); +struct nx_ct_capabs_be { + __be64 descriptor; + __be64 req_max_processed_len; /* Max bytes in one GZIP request */ + __be64 min_compress_len; /* Min compression size in bytes */ + __be64 min_decompress_len; /* Min decompression size in bytes */ +} __packed __aligned(0x1000); + +struct nx_ct_capabs { + charname[VAS_DESCR_LEN + 1]; + u64 descriptor; + u64 req_max_processed_len; /* Max bytes in one GZIP request */ + u64 min_compress_len; /* Min compression in bytes */ + u64 min_decompress_len; /* Min decompression in bytes */ +}; + +u64 capab_feat = 0; +struct nx_ct_capabs nx_ct_capab; + static struct nx842_constraints nx842_pseries_constraints = { .alignment =DDE_BUFFER_ALIGN, .multiple = DDE_BUFFER_LAST_MULT, @@ -1066,6 +1085,66 @@ static void nx842_remove(struct vio_dev *viodev) kfree(old_devdata); } +/* + * Get NX capabilities from pHyp. + * Only NXGZIP capabilities are available right now and these values + * are available through sysfs. + */ +static void __init nxct_get_capabilities(void) +{ + struct vas_all_capabs_be *capabs_be; + struct nx_ct_capabs_be *nxc_be; + int rc; + + capabs_be = kmalloc(sizeof(*capabs_be), GFP_KERNEL); + if (!capabs_be) + return; + /* +* Get NX overall capabilities with feature type=0 +*/ + rc = plpar_vas_query_capabilities(H_QUERY_NX_CAPABILITIES, 0, + (u64)virt_to_phys(capabs_be)); + if (rc) + goto out; + + capab_feat = be64_to_cpu(capabs_be->feat_type); + /* +* NX-GZIP feature available +*/ + if (capab_feat & VAS_NX_GZIP_FEAT_BIT) { + nxc_be = kmalloc(sizeof(*nxc_be), GFP_KERNEL); + if (!nxc_be) + goto out; + /* +* Get capabilities for NX-GZIP feature +*/ + rc = plpar_vas_query_capabilities(H_QUERY_NX_CAPABILITIES, + VAS_NX_GZIP_FEAT, + (u64)virt_to_phys(nxc_be)); + } else { + pr_err("NX-GZIP feature is not available\n"); + rc = -EINVAL; + } + + if (!rc) { + snprintf(nx_ct_capab.name, VAS_DESCR_LEN + 1, "%.8s", +(char *)&nxc_be->descriptor); + nx_ct_capab.descriptor = be64_to_cpu(nxc_be->descriptor); + nx_ct_capab.req_max_processed_len = + be64_to_cpu(nxc_be->req_max_processed_len); + nx_ct_capab.min_compress_len = + be64_to_cpu(nxc_be->min_compress_len); + nx_ct_capab.min_decompress_len = + be64_to_cpu(nxc_be->min_decompress_len); + } else { + capab_feat = 0; + } + + kfree(nxc_be); +out: + kfree(capabs_be); +} + static const struct vio_device_id nx842_vio_driver_ids[] = { {"ibm,compression-v1", "ibm,compression"}, {"", ""}, @@ -1093,6 +1172,10 @@ static int __init nx842_pseries_init(void) return -ENOMEM; RCU_INIT_POINTER(devdata, new_devdata); + /* +* Get NX capabilities from pHyp which is used for NX-GZIP. +*/ + nxct_get_capabilities(); ret = vio_register_driver(&nx842_vio_driver); if (ret) { -- 2.18.2
[V3 PATCH 14/16] crypto/nx: Register and unregister VAS interface
Changes to create /dev/crypto/nx-gzip interface with VAS register and to remove this interface with VAS unregister. Signed-off-by: Haren Myneni --- drivers/crypto/nx/Kconfig | 1 + drivers/crypto/nx/nx-common-pseries.c | 9 + 2 files changed, 10 insertions(+) diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig index 23e3d0160e67..2a35e0e785bd 100644 --- a/drivers/crypto/nx/Kconfig +++ b/drivers/crypto/nx/Kconfig @@ -29,6 +29,7 @@ if CRYPTO_DEV_NX_COMPRESS config CRYPTO_DEV_NX_COMPRESS_PSERIES tristate "Compression acceleration support on pSeries platform" depends on PPC_PSERIES && IBMVIO + depends on PPC_VAS default y help Support for PowerPC Nest (NX) compression acceleration. This diff --git a/drivers/crypto/nx/nx-common-pseries.c b/drivers/crypto/nx/nx-common-pseries.c index cc8dd3072b8b..9a40fca8a9e6 100644 --- a/drivers/crypto/nx/nx-common-pseries.c +++ b/drivers/crypto/nx/nx-common-pseries.c @@ -9,6 +9,7 @@ */ #include +#include #include "nx-842.h" #include "nx_csbcpb.h" /* struct nx_csbcpb */ @@ -1101,6 +1102,12 @@ static int __init nx842_pseries_init(void) return ret; } + ret = vas_register_api_pseries(THIS_MODULE, VAS_COP_TYPE_GZIP, + "nx-gzip"); + + if (ret) + pr_err("NX-GZIP is not supported. Returned=%d\n", ret); + return 0; } @@ -,6 +1118,8 @@ static void __exit nx842_pseries_exit(void) struct nx842_devdata *old_devdata; unsigned long flags; + vas_unregister_api_pseries(); + crypto_unregister_alg(&nx842_pseries_alg); spin_lock_irqsave(&devdata_mutex, flags); -- 2.18.2
[V3 PATCH 13/16] crypto/nx: Rename nx-842-pseries file name to nx-common-pseries
Rename nx-842-pseries.c to nx-common-pseries.c to add code for new GZIP compression type. The actual functionality is not changed in this patch. Signed-off-by: Haren Myneni --- drivers/crypto/nx/Makefile | 2 +- drivers/crypto/nx/{nx-842-pseries.c => nx-common-pseries.c} | 0 2 files changed, 1 insertion(+), 1 deletion(-) rename drivers/crypto/nx/{nx-842-pseries.c => nx-common-pseries.c} (100%) diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile index bc89a20e5d9d..d00181a26dd6 100644 --- a/drivers/crypto/nx/Makefile +++ b/drivers/crypto/nx/Makefile @@ -14,5 +14,5 @@ nx-crypto-objs := nx.o \ obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_PSERIES) += nx-compress-pseries.o nx-compress.o obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV) += nx-compress-powernv.o nx-compress.o nx-compress-objs := nx-842.o -nx-compress-pseries-objs := nx-842-pseries.o +nx-compress-pseries-objs := nx-common-pseries.o nx-compress-powernv-objs := nx-common-powernv.o diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-common-pseries.c similarity index 100% rename from drivers/crypto/nx/nx-842-pseries.c rename to drivers/crypto/nx/nx-common-pseries.c -- 2.18.2
[V3 PATCH 12/16] powerpc/pseries/vas: sysfs interface to export capabilities
pHyp provides GZIP default and GZIP QoS capabilities which gives the total number of credits are available in LPAR. This patch creates sysfs entries and exports LPAR credits, the currently used and the available credits for each feature. /sys/kernel/vas/VasCaps/VDefGzip: (default GZIP capabilities) avail_lpar_creds /* Available credits to use */ target_lpar_creds /* Total credits available which can be /* changed with DLPAR operation */ used_lpar_creds /* Used credits */ /sys/kernel/vas/VasCaps/VQosGzip (QoS GZIP capabilities) avail_lpar_creds target_lpar_creds used_lpar_creds Signed-off-by: Haren Myneni --- arch/powerpc/platforms/pseries/Makefile| 2 +- arch/powerpc/platforms/pseries/vas-sysfs.c | 173 + arch/powerpc/platforms/pseries/vas.c | 6 + arch/powerpc/platforms/pseries/vas.h | 2 + 4 files changed, 182 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/platforms/pseries/vas-sysfs.c diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile index 4cda0ef87be0..e24093bebc0b 100644 --- a/arch/powerpc/platforms/pseries/Makefile +++ b/arch/powerpc/platforms/pseries/Makefile @@ -30,4 +30,4 @@ obj-$(CONFIG_PPC_SVM) += svm.o obj-$(CONFIG_FA_DUMP) += rtas-fadump.o obj-$(CONFIG_SUSPEND) += suspend.o -obj-$(CONFIG_PPC_VAS) += vas.o +obj-$(CONFIG_PPC_VAS) += vas.o vas-sysfs.o diff --git a/arch/powerpc/platforms/pseries/vas-sysfs.c b/arch/powerpc/platforms/pseries/vas-sysfs.c new file mode 100644 index ..5f01f8ba6806 --- /dev/null +++ b/arch/powerpc/platforms/pseries/vas-sysfs.c @@ -0,0 +1,173 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright 2016-17 IBM Corp. + */ + +#define pr_fmt(fmt) "vas: " fmt + +#include +#include +#include +#include +#include + +#include "vas.h" + +#ifdef CONFIG_SYSFS +static struct kobject *pseries_vas_kobj; +static struct kobject *vas_capabs_kobj; + +struct vas_capabs_entry { + struct kobject kobj; + struct vas_ct_capabs *capabs; +}; + +#define to_capabs_entry(entry) container_of(entry, struct vas_capabs_entry, kobj) + +static ssize_t avail_lpar_creds_show(struct vas_ct_capabs *capabs, char *buf) +{ + int avail_creds = atomic_read(&capabs->target_lpar_creds) - + atomic_read(&capabs->used_lpar_creds); + return sprintf(buf, "%d\n", avail_creds); +} + +#define sysfs_capbs_entry_read(_name) \ +static ssize_t _name##_show(struct vas_ct_capabs *capabs, char *buf) \ +{ \ + return sprintf(buf, "%d\n", atomic_read(&capabs->_name)); \ +} + +struct vas_sysfs_entry { + struct attribute attr; + ssize_t (*show)(struct vas_ct_capabs *, char *); + ssize_t (*store)(struct vas_ct_capabs *, const char *, size_t); +}; + +#define VAS_ATTR_RO(_name) \ + sysfs_capbs_entry_read(_name); \ + static struct vas_sysfs_entry _name##_attribute = __ATTR(_name, \ + 0444, _name##_show, NULL); + +VAS_ATTR_RO(target_lpar_creds); +VAS_ATTR_RO(used_lpar_creds); + +static struct vas_sysfs_entry avail_lpar_creds_attribute = + __ATTR(avail_lpar_creds, 0444, avail_lpar_creds_show, NULL); + +static struct attribute *vas_capab_attrs[] = { + &target_lpar_creds_attribute.attr, + &used_lpar_creds_attribute.attr, + &avail_lpar_creds_attribute.attr, + NULL, +}; + +static ssize_t vas_type_show(struct kobject *kobj, struct attribute *attr, +char *buf) +{ + struct vas_capabs_entry *centry; + struct vas_ct_capabs *capabs; + struct vas_sysfs_entry *entry; + + centry = to_capabs_entry(kobj); + capabs = centry->capabs; + entry = container_of(attr, struct vas_sysfs_entry, attr); + + if (!entry->show) + return -EIO; + + return entry->show(capabs, buf); +} + +static 
ssize_t vas_type_store(struct kobject *kobj, struct attribute *attr, + const char *buf, size_t count) +{ + struct vas_capabs_entry *centry; + struct vas_ct_capabs *capabs; + struct vas_sysfs_entry *entry; + + centry = to_capabs_entry(kobj); + capabs = centry->capabs; + entry = container_of(attr, struct vas_sysfs_entry, attr); + if (!entry->store) + return -EIO; + + return entry->store(capabs, buf, count); +} + +static void vas_type_release(struct kobject *kobj) +{ + struct vas_capabs_entry *centry = to_capabs_entry(kobj); + kfree(centry); +} + +static const struct sysfs_ops vas_sysfs_ops = { + .show = vas_type_show, + .store = vas_type_store, +}; + +static struct kobj_type vas_attr_type = { + .release= vas_type_release, +
[V3 PATCH 11/16] powerpc/pseries/vas: Setup IRQ and fault handling
When NX sees a fault on the user space buffer, generates a fault interrupt and pHyp forwards that interrupt to OS. Then the kernel makes H_GET_NX_FAULT HCALL to retrieve the fault CRB information. This patch adds changes to setup IRQ per each window and handles fault by updating CSB. Signed-off-by: Haren Myneni --- arch/powerpc/platforms/pseries/vas.c | 111 ++- 1 file changed, 110 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c index 0ade0d6d728f..2106eca0862a 100644 --- a/arch/powerpc/platforms/pseries/vas.c +++ b/arch/powerpc/platforms/pseries/vas.c @@ -224,6 +224,62 @@ int plpar_vas_query_capabilities(const u64 hcall, u8 query_type, } EXPORT_SYMBOL_GPL(plpar_vas_query_capabilities); +/* + * HCALL to get fault CRB from pHyp. + */ +static int plpar_get_nx_fault(u32 winid, u64 buffer) +{ + int64_t rc; + + rc = plpar_hcall_norets(H_GET_NX_FAULT, winid, buffer); + + switch (rc) { + case H_SUCCESS: + return 0; + case H_PARAMETER: + pr_err("HCALL(%x): Invalid window ID %u\n", H_GET_NX_FAULT, + winid); + return -EINVAL; + case H_STATE: + pr_err("HCALL(%x): No outstanding faults for window ID %u\n", + H_GET_NX_FAULT, winid); + return -EINVAL; + case H_PRIVILEGE: + pr_err("HCALL(%x): Window(%u): Invalid fault buffer 0x%llx\n", + H_GET_NX_FAULT, winid, buffer); + return -EACCES; + default: + pr_err("HCALL(%x): Unexpected error %lld for window(%u)\n", + H_GET_NX_FAULT, rc, winid); + return -EIO; + } +} + +/* + * Handle the fault interrupt. + * When the fault interrupt is received for each window, query pHyp to get + * the fault CRB on the specific fault. Then process the CRB by updating + * CSB or send signal if the user space CSB is invalid. + * Note: pHyp forwards an interrupt for each fault request. So one fault + * CRB to process for each H_GET_NX_FAULT HCALL. + */ +irqreturn_t pseries_vas_fault_thread_fn(int irq, void *data) +{ + struct vas_window *txwin = data; + struct coprocessor_request_block crb; + struct vas_win_task *tsk; + int rc; + + rc = plpar_get_nx_fault(txwin->winid, (u64)virt_to_phys(&crb)); + if (!rc) { + tsk = &txwin->task; + vas_dump_crb(&crb); + vas_update_csb(&crb, tsk); + } + + return IRQ_HANDLED; +} + /* * Allocate window and setup IRQ mapping. */ @@ -235,10 +291,51 @@ static int allocate_setup_window(struct vas_window *txwin, rc = plpar_vas_allocate_window(txwin, domain, wintype, DEF_WIN_CREDS); if (rc) return rc; + /* +* On powerVM, pHyp setup and forwards the fault interrupt per +* window. So the IRQ setup and fault handling will be done for +* each open window separately. 
+*/ + txwin->lpar.fault_virq = irq_create_mapping(NULL, + txwin->lpar.fault_irq); + if (!txwin->lpar.fault_virq) { + pr_err("Failed irq mapping %d\n", txwin->lpar.fault_irq); + rc = -EINVAL; + goto out_win; + } + + txwin->lpar.name = kasprintf(GFP_KERNEL, "vas-win-%d", txwin->winid); + if (!txwin->lpar.name) { + rc = -ENOMEM; + goto out_irq; + } + + rc = request_threaded_irq(txwin->lpar.fault_virq, NULL, + pseries_vas_fault_thread_fn, IRQF_ONESHOT, + txwin->lpar.name, txwin); + if (rc) { + pr_err("VAS-Window[%d]: Request IRQ(%u) failed with %d\n", + txwin->winid, txwin->lpar.fault_virq, rc); + goto out_free; + } txwin->wcreds_max = DEF_WIN_CREDS; return 0; +out_free: + kfree(txwin->lpar.name); +out_irq: + irq_dispose_mapping(txwin->lpar.fault_virq); +out_win: + plpar_vas_deallocate_window(txwin->winid); + return rc; +} + +static inline void free_irq_setup(struct vas_window *txwin) +{ + free_irq(txwin->lpar.fault_virq, txwin); + irq_dispose_mapping(txwin->lpar.fault_virq); + kfree(txwin->lpar.name); } static struct vas_window *vas_allocate_window(struct vas_tx_win_open_attr *uattr, @@ -346,6 +443,11 @@ static struct vas_window *vas_allocate_window(struct vas_tx_win_open_attr *uattr return txwin; out_free: + /* +* Window is not operational. Free IRQ before closing +* window so that do not have to hold mutex. +*/ + free_irq_setup(txwin); plpar_vas_deallocate_window(txwin->winid); out: atomic_dec(&ct_capab->used_lpar_creds); @@ -364,9 +466,16 @@ static int deallocate_free_window(s
[V3 PATCH 10/16] powerpc/pseries/vas: Integrate API with open/close windows
This patch adds VAS window allocatioa/close with the corresponding HCALLs. Also changes to integrate with the existing user space VAS API and provide register/unregister functions to NX pseries driver. The driver register function is used to create the user space interface (/dev/crypto/nx-gzip) and unregister to remove this entry. The user space process opens this device node and makes an ioctl to allocate VAS window. The close interface is used to deallocate window. Signed-off-by: Haren Myneni --- arch/powerpc/include/asm/vas.h | 5 + arch/powerpc/platforms/book3s/Kconfig | 2 +- arch/powerpc/platforms/pseries/Makefile | 1 + arch/powerpc/platforms/pseries/vas.c| 212 4 files changed, 219 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h index d15784506a54..aa1974aba27e 100644 --- a/arch/powerpc/include/asm/vas.h +++ b/arch/powerpc/include/asm/vas.h @@ -270,6 +270,11 @@ struct vas_all_capabs { u64 feat_type; }; +int plpar_vas_query_capabilities(const u64 hcall, u8 query_type, +u64 result); +int vas_register_api_pseries(struct module *mod, +enum vas_cop_type cop_type, const char *name); +void vas_unregister_api_pseries(void); #endif /* diff --git a/arch/powerpc/platforms/book3s/Kconfig b/arch/powerpc/platforms/book3s/Kconfig index 51e14db83a79..bed21449e8e5 100644 --- a/arch/powerpc/platforms/book3s/Kconfig +++ b/arch/powerpc/platforms/book3s/Kconfig @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 config PPC_VAS bool "IBM Virtual Accelerator Switchboard (VAS)" - depends on PPC_POWERNV && PPC_64K_PAGES + depends on (PPC_POWERNV || PPC_PSERIES) && PPC_64K_PAGES default y help This enables support for IBM Virtual Accelerator Switchboard (VAS). diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile index c8a2b0b05ac0..4cda0ef87be0 100644 --- a/arch/powerpc/platforms/pseries/Makefile +++ b/arch/powerpc/platforms/pseries/Makefile @@ -30,3 +30,4 @@ obj-$(CONFIG_PPC_SVM) += svm.o obj-$(CONFIG_FA_DUMP) += rtas-fadump.o obj-$(CONFIG_SUSPEND) += suspend.o +obj-$(CONFIG_PPC_VAS) += vas.o diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c index 35946fb02995..0ade0d6d728f 100644 --- a/arch/powerpc/platforms/pseries/vas.c +++ b/arch/powerpc/platforms/pseries/vas.c @@ -222,6 +222,218 @@ int plpar_vas_query_capabilities(const u64 hcall, u8 query_type, return -EIO; } } +EXPORT_SYMBOL_GPL(plpar_vas_query_capabilities); + +/* + * Allocate window and setup IRQ mapping. + */ +static int allocate_setup_window(struct vas_window *txwin, +u64 *domain, u8 wintype) +{ + int rc; + + rc = plpar_vas_allocate_window(txwin, domain, wintype, DEF_WIN_CREDS); + if (rc) + return rc; + + txwin->wcreds_max = DEF_WIN_CREDS; + + return 0; +} + +static struct vas_window *vas_allocate_window(struct vas_tx_win_open_attr *uattr, + enum vas_cop_type cop_type) +{ + long domain[PLPAR_HCALL9_BUFSIZE] = {VAS_DEFAULT_DOMAIN_ID}; + struct vas_ct_capabs *ct_capab; + struct vas_capabs *capabs; + struct vas_window *txwin; + int rc; + + txwin = kzalloc(sizeof(*txwin), GFP_KERNEL); + if (!txwin) + return ERR_PTR(-ENOMEM); + + /* +* A VAS window can have many credits which means that many +* requests can be issued simultaneously. But phyp restricts +* one credit per window. +* phyp introduces 2 different types of credits: +* Default credit type (Uses normal priority FIFO): +* A limited number of credits are assigned to partitions +* based on processor entitlement. 
But these credits may be +* over-committed on a system depends on whether the CPUs +* are in shared or dedicated modes - that is, more requests +* may be issued across the system than NX can service at +* once which can result in paste command failure (RMA_busy). +* Then the process has to resend requests or fall-back to +* SW compression. +* Quality of Service (QoS) credit type (Uses high priority FIFO): +* To avoid NX HW contention, the system admins can assign +* QoS credits for each LPAR so that this partition is +* guaranteed access to NX resources. These credits are +* assigned to partitions via the HMC. +* Refer PAPR for more information. +* +* Allocate window with QoS credits if user requested. Otherwise +* default credits are used. +*/ + if (uattr->flags & VAS_WIN_QOS_CREDITS) +
[V3 PATCH 09/16] powerpc/pseries/vas: Implement to get all capabilities
pHyp provides various VAS capabilities such as GZIP default and QoS capabilities which are used to determine total number of credits available in LPAR, maximum window credits, maximum LPAR credits, whether usermode copy/paste is supported, and etc. So first retrieve overall vas capabilities using H_QUERY_VAS_CAPABILITIES HCALL which tells the specific features that are available. Then retrieve the specific capabilities by using the feature type in H_QUERY_VAS_CAPABILITIES HCALL. pHyp supports only GZIP default and GZIP QoS capabilities right now. Signed-off-by: Haren Myneni --- arch/powerpc/platforms/pseries/vas.c | 130 +++ 1 file changed, 130 insertions(+) diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c index 06960151477c..35946fb02995 100644 --- a/arch/powerpc/platforms/pseries/vas.c +++ b/arch/powerpc/platforms/pseries/vas.c @@ -30,6 +30,13 @@ /* phyp allows one credit per window right now */ #define DEF_WIN_CREDS 1 +static struct vas_all_capabs capabs_all; +static int copypaste_feat; + +struct vas_capabs vcapabs[VAS_MAX_FEAT_TYPE]; + +DEFINE_MUTEX(vas_pseries_mutex); + static int64_t hcall_return_busy_check(int64_t rc) { /* Check if we are stalled for some time */ @@ -215,3 +222,126 @@ int plpar_vas_query_capabilities(const u64 hcall, u8 query_type, return -EIO; } } + +/* + * Get the specific capabilities based on the feature type. + * Right now supports GZIP default and GZIP QoS capabilities. + */ +static int get_vas_capabilities(u8 feat, enum vas_cop_feat_type type, + struct vas_ct_capabs_be *capab_be) +{ + struct vas_ct_capabs *capab; + struct vas_capabs *vcapab; + int rc = 0; + + vcapab = &vcapabs[type]; + memset(vcapab, 0, sizeof(*vcapab)); + INIT_LIST_HEAD(&vcapab->list); + + capab = &vcapab->capab; + + rc = plpar_vas_query_capabilities(H_QUERY_VAS_CAPABILITIES, feat, + (u64)virt_to_phys(capab_be)); + if (rc) + return rc; + + capab->user_mode = capab_be->user_mode; + if (!(capab->user_mode & VAS_COPY_PASTE_USER_MODE)) { + pr_err("User space COPY/PASTE is not supported\n"); + return -ENOTSUPP; + } + + snprintf(capab->name, VAS_DESCR_LEN + 1, "%.8s", +(char *)&capab_be->descriptor); + capab->descriptor = be64_to_cpu(capab_be->descriptor); + capab->win_type = capab_be->win_type; + if (capab->win_type >= VAS_MAX_FEAT_TYPE) { + pr_err("Unsupported window type %u\n", capab->win_type); + return -EINVAL; + } + capab->max_lpar_creds = be16_to_cpu(capab_be->max_lpar_creds); + capab->max_win_creds = be16_to_cpu(capab_be->max_win_creds); + atomic_set(&capab->target_lpar_creds, + be16_to_cpu(capab_be->target_lpar_creds)); + if (feat == VAS_GZIP_DEF_FEAT) { + capab->def_lpar_creds = be16_to_cpu(capab_be->def_lpar_creds); + + if (capab->max_win_creds < DEF_WIN_CREDS) { + pr_err("Window creds(%u) > max allowed window creds(%u)\n", + DEF_WIN_CREDS, capab->max_win_creds); + return -EINVAL; + } + } + + copypaste_feat = 1; + + return 0; +} + +static int __init pseries_vas_init(void) +{ + struct vas_ct_capabs_be *ct_capabs_be; + struct vas_all_capabs_be *capabs_be; + int rc; + + /* +* Linux supports user space COPY/PASTE only with Radix +*/ + if (!radix_enabled()) { + pr_err("API is supported only with radix page tables\n"); + return -ENOTSUPP; + } + + capabs_be = kmalloc(sizeof(*capabs_be), GFP_KERNEL); + if (!capabs_be) + return -ENOMEM; + /* +* Get VAS overall capabilities by passing 0 to feature type. 
+*/ + rc = plpar_vas_query_capabilities(H_QUERY_VAS_CAPABILITIES, 0, + (u64)virt_to_phys(capabs_be)); + if (rc) + goto out; + + snprintf(capabs_all.name, VAS_DESCR_LEN, "%.7s", +(char *)&capabs_be->descriptor); + capabs_all.descriptor = be64_to_cpu(capabs_be->descriptor); + capabs_all.feat_type = be64_to_cpu(capabs_be->feat_type); + + ct_capabs_be = kmalloc(sizeof(*ct_capabs_be), GFP_KERNEL); + if (!ct_capabs_be) { + rc = -ENOMEM; + goto out; + } + /* +* QOS capabilities available +*/ + if (capabs_all.feat_type & VAS_GZIP_QOS_FEAT_BIT) { + rc = get_vas_capabilities(VAS_GZIP_QOS_FEAT, + VAS_GZIP_QOS_FEAT_TYPE, ct_capabs_be); + + if (rc) + goto out_ct; + } + /* +* Def
[V3 PATCH 08/16] powerpc/pseries/VAS: Implement allocate/modify/deallocate HCALLS
This patch adds the following HCALLs which are used to allocate, modify and deallocate VAS windows. H_ALLOCATE_VAS_WINDOW: Allocate VAS window H_DEALLOCATE_VAS_WINDOW: Close VAS window H_MODIFY_VAS_WINDOW: Setup window before using Also adds phyp call (H_QUERY_VAS_CAPABILITIES) to get all VAS capabilities that phyp provides. Signed-off-by: Haren Myneni --- arch/powerpc/platforms/pseries/vas.c | 217 +++ 1 file changed, 217 insertions(+) create mode 100644 arch/powerpc/platforms/pseries/vas.c diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c new file mode 100644 index ..06960151477c --- /dev/null +++ b/arch/powerpc/platforms/pseries/vas.c @@ -0,0 +1,217 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright 2020-21 IBM Corp. + */ + +#define pr_fmt(fmt) "vas: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "vas.h" + +#defineVAS_INVALID_WIN_ADDRESS 0xul +#defineVAS_DEFAULT_DOMAIN_ID 0xul +/* Authority Mask Register (AMR) value is not supported in */ +/* linux implementation. So pass '0' to modify window HCALL */ +#defineVAS_AMR_VALUE 0 +/* phyp allows one credit per window right now */ +#define DEF_WIN_CREDS 1 + +static int64_t hcall_return_busy_check(int64_t rc) +{ + /* Check if we are stalled for some time */ + if (H_IS_LONG_BUSY(rc)) { + msleep(get_longbusy_msecs(rc)); + rc = H_BUSY; + } else if (rc == H_BUSY) { + cond_resched(); + } + + return rc; +} + +/* + * Allocate VAS window HCALL + */ +static int plpar_vas_allocate_window(struct vas_window *win, u64 *domain, +u8 wintype, u16 credits) +{ + long retbuf[PLPAR_HCALL9_BUFSIZE] = {0}; + int64_t rc; + + do { + rc = plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, wintype, + credits, domain[0], domain[1], domain[2], + domain[3], domain[4], domain[5]); + + rc = hcall_return_busy_check(rc); + } while (rc == H_BUSY); + + switch (rc) { + case H_SUCCESS: + win->winid = retbuf[0]; + win->lpar.win_addr = retbuf[1]; + win->lpar.complete_irq = retbuf[2]; + win->lpar.fault_irq = retbuf[3]; + if (win->lpar.win_addr == VAS_INVALID_WIN_ADDRESS) { + pr_err("HCALL(%x): COPY/PASTE is not supported\n", + H_ALLOCATE_VAS_WINDOW); + return -ENOTSUPP; + } + return 0; + case H_PARAMETER: + pr_err("HCALL(%x): Invalid window type (%u)\n", + H_ALLOCATE_VAS_WINDOW, wintype); + return -EINVAL; + case H_P2: + pr_err("HCALL(%x): Credits(%u) exceed maximum window credits\n", + H_ALLOCATE_VAS_WINDOW, credits); + return -EINVAL; + case H_COP_HW: + pr_err("HCALL(%x): User-mode COPY/PASTE is not supported\n", + H_ALLOCATE_VAS_WINDOW); + return -ENOTSUPP; + case H_RESOURCE: + pr_err("HCALL(%x): LPAR credit limit exceeds window limit\n", + H_ALLOCATE_VAS_WINDOW); + return -EPERM; + case H_CONSTRAINED: + pr_err("HCALL(%x): Credits (%u) are not available\n", + H_ALLOCATE_VAS_WINDOW, credits); + return -EPERM; + default: + pr_err("HCALL(%x): Unexpected error %lld\n", + H_ALLOCATE_VAS_WINDOW, rc); + return -EIO; + } +} + +/* + * Deallocate VAS window HCALL. 
+ */ +static int plpar_vas_deallocate_window(u64 winid) +{ + int64_t rc; + + do { + rc = plpar_hcall_norets(H_DEALLOCATE_VAS_WINDOW, winid); + + rc = hcall_return_busy_check(rc); + } while (rc == H_BUSY); + + switch (rc) { + case H_SUCCESS: + return 0; + case H_PARAMETER: + pr_err("HCALL(%x): Invalid window ID %llu\n", + H_DEALLOCATE_VAS_WINDOW, winid); + return -EINVAL; + case H_STATE: + pr_err("HCALL(%x): Window(%llu): Invalid page table entries\n", + H_DEALLOCATE_VAS_WINDOW, winid); + return -EPERM; + default: + pr_err("HCALL(%x): Unexpected error %lld for window(%llu)\n", + H_DEALLOCATE_VAS_WINDOW, rc, winid); + return -EIO; + } +} + +/* + * Modify VAS window. + * After the window is opened with allocate window HCALL, configure it + * with flags and LPAR PID before using.
[V3 PATCH 07/16] powerpc/vas: Define QoS credit flag to allocate window
pHyp introduces two different type of credits: Default and Quality of service (QoS). The total number of default credits available on each LPAR depends on CPU resources configured. But these credits can be shared or over-committed across LPARs in shared mode which can result in paste command failure (RMA_busy). To avoid NX HW contention, phyp introduces QoS credit type which makes sure guaranteed access to NX resources. The system admins can assign QoS credits for each LPAR via HMC. Default credit type is used to allocate a VAS window by default as on powerVM implementation. But the process can pass VAS_WIN_QOS_CREDITS flag with VAS_TX_WIN_OPEN ioctl to open QoS type window. Signed-off-by: Haren Myneni --- arch/powerpc/include/uapi/asm/vas-api.h | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/uapi/asm/vas-api.h b/arch/powerpc/include/uapi/asm/vas-api.h index ebd4b2424785..eb7c8694174f 100644 --- a/arch/powerpc/include/uapi/asm/vas-api.h +++ b/arch/powerpc/include/uapi/asm/vas-api.h @@ -13,11 +13,15 @@ #define VAS_MAGIC 'v' #define VAS_TX_WIN_OPEN_IOW(VAS_MAGIC, 0x20, struct vas_tx_win_open_attr) +/* Flags to VAS TX open window ioctl */ +/* To allocate a window with QoS credit, otherwise default credit is used */ +#defineVAS_WIN_QOS_CREDITS 0x0001 + struct vas_tx_win_open_attr { __u32 version; __s16 vas_id; /* specific instance of vas or -1 for default */ __u16 reserved1; - __u64 flags; /* Future use */ + __u64 flags; __u64 reserved2[6]; }; -- 2.18.2
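[Editor's note: since the flag is consumed through the existing VAS_TX_WIN_OPEN ioctl, a hedged user-space sketch of requesting a QoS window could look like the following. The version value and the 4K mmap length are illustrative assumptions, and error handling is kept minimal.]

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <asm/vas-api.h>

/* Open a GZIP send window backed by a QoS credit and map its paste address. */
static void *open_qos_gzip_window(int *fdp)
{
	struct vas_tx_win_open_attr attr = {
		.version = 1,			/* assumed API version */
		.vas_id  = -1,			/* let the hypervisor pick the VAS instance */
		.flags   = VAS_WIN_QOS_CREDITS,	/* drop for a default-credit window */
	};
	int fd = open("/dev/crypto/nx-gzip", O_RDWR);

	if (fd < 0)
		return NULL;
	if (ioctl(fd, VAS_TX_WIN_OPEN, &attr) < 0) {
		close(fd);
		return NULL;
	}
	*fdp = fd;
	/* the paste address for COPY/PASTE is exposed by mmap'ing the fd */
	return mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}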
[V3 PATCH 06/16] powerpc/pseries/vas: Define VAS/NXGZIP HCALLs and structs
This patch adds HCALLs and other definitions. Also define structs that are used in VAS implementation on powerVM. Signed-off-by: Haren Myneni --- arch/powerpc/include/asm/hvcall.h| 7 ++ arch/powerpc/include/asm/vas.h | 28 arch/powerpc/platforms/pseries/vas.h | 96 3 files changed, 131 insertions(+) create mode 100644 arch/powerpc/platforms/pseries/vas.h diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index ed6086d57b22..accbb7f6f272 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -294,6 +294,13 @@ #define H_RESIZE_HPT_COMMIT0x370 #define H_REGISTER_PROC_TBL0x37C #define H_SIGNAL_SYS_RESET 0x380 +#defineH_ALLOCATE_VAS_WINDOW 0x388 +#defineH_MODIFY_VAS_WINDOW 0x38C +#defineH_DEALLOCATE_VAS_WINDOW 0x390 +#defineH_QUERY_VAS_WINDOW 0x394 +#defineH_QUERY_VAS_CAPABILITIES0x398 +#defineH_QUERY_NX_CAPABILITIES 0x39C +#defineH_GET_NX_FAULT 0x3A0 #define H_INT_GET_SOURCE_INFO 0x3A8 #define H_INT_SET_SOURCE_CONFIG 0x3AC #define H_INT_GET_SOURCE_CONFIG 0x3B0 diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h index f928bf4c7e98..d15784506a54 100644 --- a/arch/powerpc/include/asm/vas.h +++ b/arch/powerpc/include/asm/vas.h @@ -179,6 +179,7 @@ struct vas_tx_win_attr { bool rx_win_ord_mode; }; +#ifdef CONFIG_PPC_POWERNV /* * Helper to map a chip id to VAS id. * For POWER9, this is a 1:1 mapping. In the future this maybe a 1:N @@ -243,6 +244,33 @@ int vas_paste_crb(struct vas_window *win, int offset, bool re); int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type, const char *name); void vas_unregister_api_powernv(void); +#endif + +#ifdef CONFIG_PPC_PSERIES + +/* VAS Capabilities */ +#define VAS_GZIP_QOS_FEAT 0x1 +#define VAS_GZIP_DEF_FEAT 0x2 +#define VAS_GZIP_QOS_FEAT_BIT (1UL << (63 - VAS_GZIP_QOS_FEAT)) /* Bit 1 */ +#define VAS_GZIP_DEF_FEAT_BIT (1UL << (63 - VAS_GZIP_DEF_FEAT)) /* Bit 2 */ + +/* NX Capabilities */ +#defineVAS_NX_GZIP_FEAT0x1 +#defineVAS_NX_GZIP_FEAT_BIT(1UL << (63 - VAS_NX_GZIP_FEAT)) /* Bit 1 */ +#defineVAS_DESCR_LEN 8 + +struct vas_all_capabs_be { + __be64 descriptor; + __be64 feat_type; +} __packed __aligned(0x1000); + +struct vas_all_capabs { + charname[VAS_DESCR_LEN + 1]; + u64 descriptor; + u64 feat_type; +}; + +#endif /* * Register / unregister coprocessor type to VAS API which will be exported diff --git a/arch/powerpc/platforms/pseries/vas.h b/arch/powerpc/platforms/pseries/vas.h new file mode 100644 index ..208682fffa57 --- /dev/null +++ b/arch/powerpc/platforms/pseries/vas.h @@ -0,0 +1,96 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright 2020-21 IBM Corp. 
+ */ + +#ifndef _VAS_H +#define _VAS_H +#include +#include +#include + +/* + * VAS window modify flags + */ +#defineVAS_MOD_WIN_CLOSE (1UL << 63) +#defineVAS_MOD_WIN_JOBS_KILL (1UL << (63 - 1)) +#defineVAS_MOD_WIN_DR (1UL << (63 - 3)) +#defineVAS_MOD_WIN_PR (1UL << (63 - 4)) +#defineVAS_MOD_WIN_SF (1UL << (63 - 5)) +#defineVAS_MOD_WIN_TA (1UL << (63 - 6)) +#defineVAS_MOD_WIN_FLAGS (VAS_MOD_WIN_JOBS_KILL | VAS_MOD_WIN_DR | \ + VAS_MOD_WIN_PR | VAS_MOD_WIN_SF) + +#defineVAS_WIN_ACTIVE 0x0 +#defineVAS_WIN_CLOSED 0x1 +#defineVAS_WIN_INACTIVE0x2 /* Inactive due to HW failure */ +/* Process of being modified, deallocated, or quiesced */ +#defineVAS_WIN_MOD_IN_PROCESS 0x3 + +#defineVAS_COPY_PASTE_USER_MODE0x0001 +#defineVAS_COP_OP_USER_MODE0x0010 + +/* + * Co-processor feature - GZIP QoS windows or GZIP default windows + */ +enum vas_cop_feat_type { + VAS_GZIP_QOS_FEAT_TYPE, + VAS_GZIP_DEF_FEAT_TYPE, + VAS_MAX_FEAT_TYPE, +}; + +struct vas_ct_capabs_be { + __be64 descriptor; + u8 win_type; /* Default or QoS type */ + u8 user_mode; + __be16 max_lpar_creds; + __be16 max_win_creds; + union { + __be16 reserved; + __be16 def_lpar_creds; /* Used for default capabilities */ + }; + __be16 target_lpar_creds; +} __packed __aligned(0x1000); + +struct vas_ct_capabs { + charname[VAS_DESCR_LEN + 1]; + u64 descriptor; + u8 win_type; /* Default or QoS type */ + u8 user_mode; /* User mode copy/paste or COP HCALL */ + u16 max_lpar_creds; /* Max credits available in LPAR */ + /* Max credits can be assigned per win
[V3 PATCH 05/16] powerpc/vas: Define and use common vas_window struct
[V3 PATCH 03/16] powerpc/vas: Create take/drop task reference functions
Take task reference when each window opens and drops during close. This functionality is needed for powerNV and pseries. So this patch defines the existing code as functions in common book3s platform vas-api.c Signed-off-by: Haren Myneni --- arch/powerpc/include/asm/vas.h | 20 arch/powerpc/platforms/book3s/vas-api.c | 51 ++ arch/powerpc/platforms/powernv/vas-fault.c | 10 ++-- arch/powerpc/platforms/powernv/vas-window.c | 57 ++--- arch/powerpc/platforms/powernv/vas.h| 6 +-- 5 files changed, 83 insertions(+), 61 deletions(-) diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h index 6bbade60d8f4..2daaa1a2a9a9 100644 --- a/arch/powerpc/include/asm/vas.h +++ b/arch/powerpc/include/asm/vas.h @@ -5,6 +5,9 @@ #ifndef _ASM_POWERPC_VAS_H #define _ASM_POWERPC_VAS_H +#include +#include +#include #include @@ -60,6 +63,22 @@ struct vas_user_win_ops { int (*close_win)(void *); }; +struct vas_win_task { + struct pid *pid;/* Thread group ID of owner */ + struct pid *tgid; /* Linux process mm_struct */ + struct mm_struct *mm; /* Linux process mm_struct */ +}; + +static inline void vas_drop_reference_task(struct vas_win_task *task) +{ + /* Drop references to pid and mm */ + put_pid(task->pid); + if (task->mm) { + mm_context_remove_vas_window(task->mm); + mmdrop(task->mm); + } +} + /* * Receive window attributes specified by the (in-kernel) owner of window. */ @@ -190,4 +209,5 @@ int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type, struct vas_user_win_ops *vops); void vas_unregister_coproc_api(void); +int vas_reference_task(struct vas_win_task *vtask); #endif /* __ASM_POWERPC_VAS_H */ diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c index 05d7b99acf41..d98caa734154 100644 --- a/arch/powerpc/platforms/book3s/vas-api.c +++ b/arch/powerpc/platforms/book3s/vas-api.c @@ -60,6 +60,57 @@ static char *coproc_devnode(struct device *dev, umode_t *mode) return kasprintf(GFP_KERNEL, "crypto/%s", dev_name(dev)); } +/* + * Take reference to pid and mm + */ +int vas_reference_task(struct vas_win_task *vtask) +{ + /* +* Window opened by a child thread may not be closed when +* it exits. So take reference to its pid and release it +* when the window is free by parent thread. +* Acquire a reference to the task's pid to make sure +* pid will not be re-used - needed only for multithread +* applications. +*/ + vtask->pid = get_task_pid(current, PIDTYPE_PID); + /* +* Acquire a reference to the task's mm. +*/ + vtask->mm = get_task_mm(current); + if (!vtask->mm) { + put_pid(vtask->pid); + pr_err("VAS: pid(%d): mm_struct is not found\n", + current->pid); + return -EPERM; + } + + mmgrab(vtask->mm); + mmput(vtask->mm); + mm_context_add_vas_window(vtask->mm); + /* +* Process closes window during exit. In the case of +* multithread application, the child thread can open +* window and can exit without closing it. Expects parent +* thread to use and close the window. So do not need +* to take pid reference for parent thread. +*/ + vtask->tgid = find_get_pid(task_tgid_vnr(current)); + /* +* Even a process that has no foreign real address mapping can +* use an unpaired COPY instruction (to no real effect). Issue +* CP_ABORT to clear any pending COPY and prevent a covert +* channel. +* +* __switch_to() will issue CP_ABORT on future context switches +* if process / thread has any open VAS window (Use +* current->mm->context.vas_windows). 
+*/ + asm volatile(PPC_CP_ABORT); + + return 0; +} + static int coproc_open(struct inode *inode, struct file *fp) { struct coproc_instance *cp_inst; diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c index 3d21fce254b7..a4835cb82c09 100644 --- a/arch/powerpc/platforms/powernv/vas-fault.c +++ b/arch/powerpc/platforms/powernv/vas-fault.c @@ -73,7 +73,7 @@ static void update_csb(struct vas_window *window, * NX user space windows can not be opened for task->mm=NULL * and faults will not be generated for kernel requests. */ - if (WARN_ON_ONCE(!window->mm || !window->user_win)) + if (WARN_ON_ONCE(!window->task.mm || !window->user_win)) return; csb_addr = (void __user *)be64_to_cpu(crb->csb_addr); @@ -92,7 +92,7 @@ static void update_csb(struct vas_window *window, csb.address = crb->stamp.nx.fault_storage_addr; csb.flags = 0
[V3 PATCH 04/16] powerpc/vas: Move update_csb/dump_crb to common book3s platform
NX issues an interrupt when sees fault on user space buffer. The kernel processes the fault by updating CSB. This functionality is same for both powerNV and pseries. So this patch moves these functions to common vas-api.c and the actual functionality is not changed. Signed-off-by: Haren Myneni --- arch/powerpc/include/asm/vas.h | 3 + arch/powerpc/platforms/book3s/vas-api.c| 146 ++- arch/powerpc/platforms/powernv/vas-fault.c | 155 ++--- 3 files changed, 157 insertions(+), 147 deletions(-) diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h index 2daaa1a2a9a9..66bf8fb1a1be 100644 --- a/arch/powerpc/include/asm/vas.h +++ b/arch/powerpc/include/asm/vas.h @@ -210,4 +210,7 @@ int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type, void vas_unregister_coproc_api(void); int vas_reference_task(struct vas_win_task *vtask); +void vas_update_csb(struct coprocessor_request_block *crb, + struct vas_win_task *vtask); +void vas_dump_crb(struct coprocessor_request_block *crb); #endif /* __ASM_POWERPC_VAS_H */ diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c index d98caa734154..dc131b2e4acd 100644 --- a/arch/powerpc/platforms/book3s/vas-api.c +++ b/arch/powerpc/platforms/book3s/vas-api.c @@ -111,6 +111,150 @@ int vas_reference_task(struct vas_win_task *vtask) return 0; } +/* + * Update the CSB to indicate a translation error. + * + * User space will be polling on CSB after the request is issued. + * If NX can handle the request without any issues, it updates CSB. + * Whereas if NX encounters page fault, the kernel will handle the + * fault and update CSB with translation error. + * + * If we are unable to update the CSB means copy_to_user failed due to + * invalid csb_addr, send a signal to the process. + */ +void vas_update_csb(struct coprocessor_request_block *crb, + struct vas_win_task *vtask) +{ + struct coprocessor_status_block csb; + struct kernel_siginfo info; + struct task_struct *tsk; + void __user *csb_addr; + struct pid *pid; + int rc; + + /* +* NX user space windows can not be opened for task->mm=NULL +* and faults will not be generated for kernel requests. +*/ + if (WARN_ON_ONCE(!vtask->mm)) + return; + + csb_addr = (void __user *)be64_to_cpu(crb->csb_addr); + + memset(&csb, 0, sizeof(csb)); + csb.cc = CSB_CC_FAULT_ADDRESS; + csb.ce = CSB_CE_TERMINATION; + csb.cs = 0; + csb.count = 0; + + /* +* NX operates and returns in BE format as defined CRB struct. +* So saves fault_storage_addr in BE as NX pastes in FIFO and +* expects user space to convert to CPU format. +*/ + csb.address = crb->stamp.nx.fault_storage_addr; + csb.flags = 0; + + pid = vtask->pid; + tsk = get_pid_task(pid, PIDTYPE_PID); + /* +* Process closes send window after all pending NX requests are +* completed. In multi-thread applications, a child thread can +* open a window and can exit without closing it. May be some +* requests are pending or this window can be used by other +* threads later. We should handle faults if NX encounters +* pages faults on these requests. Update CSB with translation +* error and fault address. If csb_addr passed by user space is +* invalid, send SEGV signal to pid saved in window. If the +* child thread is not running, send the signal to tgid. +* Parent thread (tgid) will close this window upon its exit. +* +* pid and mm references are taken when window is opened by +* process (pid). So tgid is used only when child thread opens +* a window and exits without closing it. 
+*/ + if (!tsk) { + pid = vtask->tgid; + tsk = get_pid_task(pid, PIDTYPE_PID); + /* +* Parent thread (tgid) will be closing window when it +* exits. So should not get here. +*/ + if (WARN_ON_ONCE(!tsk)) + return; + } + + /* Return if the task is exiting. */ + if (tsk->flags & PF_EXITING) { + put_task_struct(tsk); + return; + } + + kthread_use_mm(vtask->mm); + rc = copy_to_user(csb_addr, &csb, sizeof(csb)); + /* +* User space polls on csb.flags (first byte). So add barrier +* then copy first byte with csb flags update. +*/ + if (!rc) { + csb.flags = CSB_V; + /* Make sure update to csb.flags is visible now */ + smp_mb(); + rc = copy_to_user(csb_addr, &csb, sizeof(u8)); + } + kthread_unuse_mm(vtask->mm); + put_task_struct(tsk); + + /* Success */
[PATCH V3 02/16] powerpc/vas: Move VAS API to common book3s platform
Using the same /dev/crypto/nx-gzip interface for both powerNV and pseries. So this patch creates platforms/book3s/ and moves VAS API to that directory. The actual functionality is not changed. Common interface functions such as open, window open ioctl, mmap and close are moved to arch/powerpc/platforms/book3s/vas-api.c Added hooks to call platform specific code, but the underline powerNV code in these functions is not changed. Signed-off-by: Haren Myneni --- arch/powerpc/include/asm/vas.h| 22 ++- arch/powerpc/platforms/Kconfig| 1 + arch/powerpc/platforms/Makefile | 1 + arch/powerpc/platforms/book3s/Kconfig | 15 + arch/powerpc/platforms/book3s/Makefile| 2 + .../platforms/{powernv => book3s}/vas-api.c | 64 ++ arch/powerpc/platforms/powernv/Kconfig| 14 arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/vas-window.c | 66 +++ 9 files changed, 143 insertions(+), 44 deletions(-) create mode 100644 arch/powerpc/platforms/book3s/Kconfig create mode 100644 arch/powerpc/platforms/book3s/Makefile rename arch/powerpc/platforms/{powernv => book3s}/vas-api.c (83%) diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h index 41f73fae7ab8..6bbade60d8f4 100644 --- a/arch/powerpc/include/asm/vas.h +++ b/arch/powerpc/include/asm/vas.h @@ -5,6 +5,8 @@ #ifndef _ASM_POWERPC_VAS_H #define _ASM_POWERPC_VAS_H +#include + struct vas_window; @@ -48,6 +50,16 @@ enum vas_cop_type { VAS_COP_TYPE_MAX, }; +/* + * User space window operations used for powernv and powerVM + */ +struct vas_user_win_ops { + struct vas_window * (*open_win)(struct vas_tx_win_open_attr *, + enum vas_cop_type); + u64 (*paste_addr)(void *); + int (*close_win)(void *); +}; + /* * Receive window attributes specified by the (in-kernel) owner of window. */ @@ -161,6 +173,9 @@ int vas_copy_crb(void *crb, int offset); * assumed to be true for NX windows. */ int vas_paste_crb(struct vas_window *win, int offset, bool re); +int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type, +const char *name); +void vas_unregister_api_powernv(void); /* * Register / unregister coprocessor type to VAS API which will be exported @@ -170,8 +185,9 @@ int vas_paste_crb(struct vas_window *win, int offset, bool re); * Only NX GZIP coprocessor type is supported now, but this API can be * used for others in future. 
*/ -int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type, -const char *name); -void vas_unregister_api_powernv(void); +int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type, + const char *name, + struct vas_user_win_ops *vops); +void vas_unregister_coproc_api(void); #endif /* __ASM_POWERPC_VAS_H */ diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig index 7a5e8f4541e3..594544a65b02 100644 --- a/arch/powerpc/platforms/Kconfig +++ b/arch/powerpc/platforms/Kconfig @@ -20,6 +20,7 @@ source "arch/powerpc/platforms/embedded6xx/Kconfig" source "arch/powerpc/platforms/44x/Kconfig" source "arch/powerpc/platforms/40x/Kconfig" source "arch/powerpc/platforms/amigaone/Kconfig" +source "arch/powerpc/platforms/book3s/Kconfig" config KVM_GUEST bool "KVM Guest support" diff --git a/arch/powerpc/platforms/Makefile b/arch/powerpc/platforms/Makefile index 143d4417f6cc..0e75d7df387b 100644 --- a/arch/powerpc/platforms/Makefile +++ b/arch/powerpc/platforms/Makefile @@ -22,3 +22,4 @@ obj-$(CONFIG_PPC_CELL)+= cell/ obj-$(CONFIG_PPC_PS3) += ps3/ obj-$(CONFIG_EMBEDDED6xx) += embedded6xx/ obj-$(CONFIG_AMIGAONE) += amigaone/ +obj-$(CONFIG_PPC_BOOK3S) += book3s/ diff --git a/arch/powerpc/platforms/book3s/Kconfig b/arch/powerpc/platforms/book3s/Kconfig new file mode 100644 index ..51e14db83a79 --- /dev/null +++ b/arch/powerpc/platforms/book3s/Kconfig @@ -0,0 +1,15 @@ +# SPDX-License-Identifier: GPL-2.0 +config PPC_VAS + bool "IBM Virtual Accelerator Switchboard (VAS)" + depends on PPC_POWERNV && PPC_64K_PAGES + default y + help + This enables support for IBM Virtual Accelerator Switchboard (VAS). + + VAS allows accelerators in co-processors like NX-GZIP and NX-842 + to be accessible to kernel subsystems and user processes. + VAS adapters are found in POWER9 and later based systems. + The user mode NX-GZIP support is added on P9 for powerNV and on + P10 for powerVM. + + If unsure, say "N". diff --git a/arch/powerpc/platforms/book3s/Makefile b/arch/powerpc/platforms/book3s/Makefile new file mode 100644 index ..e790f1910f61 --- /dev/null +++ b/arch/powerpc/platforms/book3s
[V3 PATCH 01/16] powerpc/powernv/vas: Rename register/unregister functions
powerNV and pseries drivers register / unregister to the corresponding VAS code separately. So rename powerNV VAS API register/unregister functions. Signed-off-by: Haren Myneni --- arch/powerpc/include/asm/vas.h | 6 +++--- arch/powerpc/platforms/powernv/vas-api.c | 10 +- drivers/crypto/nx/nx-common-powernv.c| 6 +++--- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h index e33f80b0ea81..41f73fae7ab8 100644 --- a/arch/powerpc/include/asm/vas.h +++ b/arch/powerpc/include/asm/vas.h @@ -170,8 +170,8 @@ int vas_paste_crb(struct vas_window *win, int offset, bool re); * Only NX GZIP coprocessor type is supported now, but this API can be * used for others in future. */ -int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type, - const char *name); -void vas_unregister_coproc_api(void); +int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type, +const char *name); +void vas_unregister_api_powernv(void); #endif /* __ASM_POWERPC_VAS_H */ diff --git a/arch/powerpc/platforms/powernv/vas-api.c b/arch/powerpc/platforms/powernv/vas-api.c index 98ed5d8c5441..72d8ce39e56c 100644 --- a/arch/powerpc/platforms/powernv/vas-api.c +++ b/arch/powerpc/platforms/powernv/vas-api.c @@ -207,8 +207,8 @@ static struct file_operations coproc_fops = { * Supporting only nx-gzip coprocessor type now, but this API code * extended to other coprocessor types later. */ -int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type, - const char *name) +int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type, +const char *name) { int rc = -EINVAL; dev_t devno; @@ -262,9 +262,9 @@ int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type, unregister_chrdev_region(coproc_device.devt, 1); return rc; } -EXPORT_SYMBOL_GPL(vas_register_coproc_api); +EXPORT_SYMBOL_GPL(vas_register_api_powernv); -void vas_unregister_coproc_api(void) +void vas_unregister_api_powernv(void) { dev_t devno; @@ -275,4 +275,4 @@ void vas_unregister_coproc_api(void) class_destroy(coproc_device.class); unregister_chrdev_region(coproc_device.devt, 1); } -EXPORT_SYMBOL_GPL(vas_unregister_coproc_api); +EXPORT_SYMBOL_GPL(vas_unregister_api_powernv); diff --git a/drivers/crypto/nx/nx-common-powernv.c b/drivers/crypto/nx/nx-common-powernv.c index 13c65deda8e9..88d728415bb2 100644 --- a/drivers/crypto/nx/nx-common-powernv.c +++ b/drivers/crypto/nx/nx-common-powernv.c @@ -1090,8 +1090,8 @@ static __init int nx_compress_powernv_init(void) * normal FIFO priority is assigned for userspace. * 842 compression is supported only in kernel. */ - ret = vas_register_coproc_api(THIS_MODULE, VAS_COP_TYPE_GZIP, - "nx-gzip"); + ret = vas_register_api_powernv(THIS_MODULE, VAS_COP_TYPE_GZIP, + "nx-gzip"); /* * GZIP is not supported in kernel right now. @@ -1127,7 +1127,7 @@ static void __exit nx_compress_powernv_exit(void) * use. So delete this API use for GZIP engine. */ if (!nx842_ct) - vas_unregister_coproc_api(); + vas_unregister_api_powernv(); crypto_unregister_alg(&nx842_powernv_alg); -- 2.18.2
[V3 PATCH 00/16] Enable VAS and NX-GZIP support on powerVM
This patch series enables VAS / NX-GZIP on powerVM, which allows user space to do copy/paste with the same interface that already exists on powerNV.

VAS enablement:
- Get all VAS capabilities that the hypervisor provides using H_QUERY_VAS_CAPABILITIES. These capabilities tell the OS which credit types are available (such as Default and Quality of Service (QoS)) and give specific capabilities for each credit type: maximum window credits, maximum LPAR credits, target credits in that partition (varies from max LPAR credits based on DLPAR operations), whether user mode COPY/PASTE is supported, and so on.
- Register LPAR VAS operations such as open window, get paste address and close window with the current VAS user space API.
- Open window operation: use the H_ALLOCATE_VAS_WINDOW HCALL to open a window and the H_MODIFY_VAS_WINDOW HCALL to set up the window with the LPAR PID and so on.
- mmap to the paste address returned by the H_ALLOCATE_VAS_WINDOW HCALL.
- To close a window, the H_DEALLOCATE_VAS_WINDOW HCALL is used to close it in the hypervisor.

NX enablement:
- Get NX capabilities from the hypervisor, which provide the maximum buffer length in a single GZIP request and the recommended minimum compression / decompression lengths.
- Register with VAS to enable the user space VAS API.

Main feature differences from the powerNV implementation:
- Each VAS window is configured with a number of credits, which means that many requests can be issued simultaneously on that window. On powerNV, 1K credits are configured per window, whereas on powerVM the hypervisor allows 1 credit per window at present.
- The hypervisor introduced 2 different types of credits: Default (uses the normal priority FIFO) and Quality of Service (QoS) (uses the high priority FIFO). On powerVM, VAS/NX HW resources are shared across LPARs. The total number of credits available on a system depends on the cores configured. We may see more credits assigned across the system than the NX HW resources can handle, so to avoid NX HW contention, pHyp introduced QoS credits, which can be configured by the system administrator with the HMC API. The total number of available default credits on an LPAR then varies based on the QoS credits configured.
- On powerNV, windows are allocated on a specific VAS instance and user space can select the VAS instance with the open window ioctl. Since VAS instances can be shared across partitions on powerVM, the hypervisor manages window allocations on the different VAS instances. So H_ALLOCATE_VAS_WINDOW allows selection by domain identifiers (H_HOME_NODE_ASSOCIATIVITY values by CPU). By default the hypervisor selects the VAS instance closest to the CPU resources that the partition uses. So vas_id in the ioctl interface is ignored on powerVM, except for vas_id=-1, which is used to allocate the window based on the CPU that the process is executing on. This option is needed for process affinity to a NUMA node.

Existing applications linked with libnxz should work as long as the job request length is restricted to req_max_processed_len.

Tested the following patches on P10 successfully with the test cases given at: https://github.com/libnxz/power-gzip

Note: The hypervisor supports user mode NX from P10 onwards. Linux supports user mode VAS/NX on P10 only with radix page tables.

Patches 1-4: Move the code that is needed for both powerNV and powerVM to the powerpc book3s platform directory
Patch 5: Modify the vas-window struct to support both, and the related changes.
Patch 6: Define HCALLs and the related VAS/NXGZIP-specific structs.
Patch 7: Define QoS credit flag in window open ioctl
Patch 8: Implement Allocate, Modify and Deallocate HCALLs
Patch 9: Retrieve VAS capabilities from the hypervisor
Patch 10: Implement window operations and integrate with the API
Patch 11: Set up IRQ and NX fault handling
Patch 12: Add sysfs interface to expose VAS capabilities
Patches 13-14: Make the code common to add NX-GZIP enablement
Patch 15: Get NX capabilities from the hypervisor
Patch 16: Add sysfs interface to expose NX capabilities

Changes in V2:
- Rebase on 5.12-rc6
- Moved VAS Kconfig changes to arch/powerpc/platform as suggested by Christophe Leroy
- Build fix with allyesconfig (reported by the kernel test build)

Changes in V3:
- Rebase on 5.12-rc7
- Moved vas-api.c and VAS Kconfig changes to arch/powerpc/platform/book3s as Michael Ellerman suggested

Haren Myneni (16):
  powerpc/powernv/vas: Rename register/unregister functions
  powerpc/vas: Make VAS API powerpc platform independent
  powerpc/vas: Create take/drop task reference functions
  powerpc/vas: Move update_csb/dump_crb to common book3s platform
  powerpc/vas: Define and use common vas_window struct
  powerpc/pseries/vas: Define VAS/NXGZIP HCALLs and structs
  powerpc/vas: Define QoS credit flag to allocate window
  powerpc/pseries/VAS: Implement allo
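To make the user-visible flow in this cover letter concrete (open a window, mmap the paste address), here is a minimal user-space sketch. It assumes the existing powerNV uapi: the /dev/crypto/nx-gzip device, the VAS_TX_WIN_OPEN ioctl and struct vas_tx_win_open_attr. Field names should be checked against the uapi header (arch/powerpc/include/uapi/asm/vas-api.h), error handling is trimmed, and the vas_id = -1 behaviour is the powerVM semantic described above.

```c
/* Minimal sketch of the copy/paste window setup from user space,
 * assuming the existing powerNV uapi; verify the struct fields against
 * the real vas-api.h before relying on them. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <asm/vas-api.h>	/* VAS_TX_WIN_OPEN, struct vas_tx_win_open_attr */

int open_gzip_window(void **paste_addr)
{
	struct vas_tx_win_open_attr attr = { 0 };
	int fd;

	fd = open("/dev/crypto/nx-gzip", O_RDWR);
	if (fd < 0)
		return -1;

	attr.version = 1;
	attr.vas_id = -1;	/* on powerVM: pick an instance near this CPU */

	if (ioctl(fd, VAS_TX_WIN_OPEN, &attr) < 0) {
		close(fd);
		return -1;
	}

	/* the paste address is obtained by mmap()ing the window fd */
	*paste_addr = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ | PROT_WRITE,
			   MAP_SHARED, fd, 0);
	if (*paste_addr == MAP_FAILED) {
		close(fd);
		return -1;
	}
	return fd;	/* keep the fd open; closing it closes the window */
}
```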
Re: Bogus struct page layout on 32-bit
Hi Ilias, All, On 10/04/2021 11:52, Ilias Apalodimas wrote: +CC Grygorii for the cpsw part as Ivan's email is not valid anymore Thanks for catching this. Interesting indeed... On Sat, 10 Apr 2021 at 09:22, Jesper Dangaard Brouer wrote: On Sat, 10 Apr 2021 03:43:13 +0100 Matthew Wilcox wrote: On Sat, Apr 10, 2021 at 06:45:35AM +0800, kernel test robot wrote: include/linux/mm_types.h:274:1: error: static_assert failed due to requirement '__builtin_offsetof(struct page, lru) == __builtin_offsetof(struct folio, lru)' "offsetof(struct page, lru) == offsetof(struct folio, lru)" FOLIO_MATCH(lru, lru); include/linux/mm_types.h:272:2: note: expanded from macro 'FOLIO_MATCH' static_assert(offsetof(struct page, pg) == offsetof(struct folio, fl)) Well, this is interesting. pahole reports: struct page { long unsigned int flags;/* 0 4 */ /* XXX 4 bytes hole, try to pack */ union { struct { struct list_head lru;/* 8 8 */ ... struct folio { union { struct { long unsigned int flags; /* 0 4 */ struct list_head lru;/* 4 8 */ so this assert has absolutely done its job. But why has this assert triggered? Why is struct page layout not what we thought it was? Turns out it's the dma_addr added in 2019 by commit c25fff7171be ("mm: add dma_addr_t to struct page"). On this particular config, it's 64-bit, and ppc32 requires alignment to 64-bit. So the whole union gets moved out by 4 bytes. Argh, good that you are catching this! Unfortunately, we can't just fix this by putting an 'unsigned long pad' in front of it. It still aligns the entire union to 8 bytes, and then it skips another 4 bytes after the pad. We can fix it like this ... +++ b/include/linux/mm_types.h @@ -96,11 +96,12 @@ struct page { unsigned long private; }; struct {/* page_pool used by netstack */ + unsigned long _page_pool_pad; I'm fine with this pad. Matteo is currently proposing[1] to add a 32-bit value after @dma_addr, and he could use this area instead. [1] https://lore.kernel.org/netdev/20210409223801.104657-3-mcr...@linux.microsoft.com/ When adding/changing this, we need to make sure that it doesn't overlap member @index, because network stack use/check page_is_pfmemalloc(). As far as my calculations this is safe to add. I always try to keep an eye out for this, but I wonder if we could have a build check like yours. /** * @dma_addr: might require a 64-bit value even on * 32-bit architectures. */ - dma_addr_t dma_addr; + dma_addr_t dma_addr __packed; }; struct {/* slab, slob and slub */ union { but I don't know if GCC is smart enough to realise that dma_addr is now on an 8 byte boundary and it can use a normal instruction to access it, or whether it'll do something daft like use byte loads to access it. We could also do: + dma_addr_t dma_addr __packed __aligned(sizeof(void *)); and I see pahole, at least sees this correctly: struct { long unsigned int _page_pool_pad; /* 4 4 */ dma_addr_t dma_addr __attribute__((__aligned__(4))); /* 8 8 */ } __attribute__((__packed__)) __attribute__((__aligned__(4))); This presumably affects any 32-bit architecture with a 64-bit phys_addr_t / dma_addr_t. Advice, please? I'm not sure that the 32-bit behavior is with 64-bit (dma) addrs. I don't have any 32-bit boards with 64-bit DMA. Cc. Ivan, wasn't your board (572x ?) 32-bit with driver 'cpsw' this case (where Ivan added XDP+page_pool) ? Sry, for delayed reply. The TI platforms am3/4/5 (cpsw) and Keystone 2 (netcp) can do only 32bit DMA even in case of LPAE (dma-ranges are used). 
Originally, as I remember, CONFIG_ARCH_DMA_ADDR_T_64BIT was not selected for the LPAE case on TI platforms, and the fact that it became set is the result of multi-platform/allXXXconfig/DMA optimizations and unification. (Just checked - it is not set in 4.14.) Probably commit 4965a68780c5 ("arch: define the ARCH_DMA_ADDR_T_64BIT config symbol in lib/Kconfig"). The TI drivers were eventually updated to accept ARCH_DMA_ADDR_T_64BIT=y by using things like (__force u32), for example. Honestly, I did my sanity check of CPSW with LPAE=y (ARCH_DMA_ADDR_T_64BIT=y) a very long time ago. -- Best regards, grygorii
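For anyone who wants to see the layout problem from this thread in isolation, here is a small stand-alone model (a sketch, not kernel code) that reproduces the 4-byte hole pahole reports when an ILP32 target gives 64-bit integers 8-byte alignment.

```c
/* Stand-alone model of the struct page problem on ILP32 targets that
 * align 64-bit integers to 8 bytes (e.g. arm with LPAE): the uint64_t
 * member forces the union to offset 8, opening a 4-byte hole after
 * 'flags' and shifting 'lru'.  Build with a 32-bit compiler (-m32)
 * to see the offsets differ. */
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

struct padded {				/* mimics the broken layout */
	unsigned long flags;
	union {
		struct { void *next, *prev; } lru;
		uint64_t dma_addr;	/* 8-byte aligned -> hole */
	};
};

struct fixed {				/* mimics the two-longs fix */
	unsigned long flags;
	union {
		struct { void *next, *prev; } lru;
		unsigned long dma_addr[2];
	};
};

int main(void)
{
	printf("padded: lru at offset %zu, fixed: lru at offset %zu\n",
	       offsetof(struct padded, lru), offsetof(struct fixed, lru));
	return 0;
}
```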
Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
On Sat, Apr 17, 2021 at 09:32:06PM +0300, Ilias Apalodimas wrote:
> > +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> > +{
> > +	page->dma_addr[0] = addr;
> > +	if (sizeof(dma_addr_t) > sizeof(unsigned long))
> > +		page->dma_addr[1] = addr >> 16 >> 16;
>
> The 'error' that was reported will never trigger right?
> I assume this was compiled with dma_addr_t as 32bits (so it triggered the
> compilation error), but the if check will never allow this codepath to run.
> If so can we add a comment explaining this, since none of us will remember why
> in 6 months from now?

That's right. I compiled it all three ways: pure 32-bit, 64-bit dma with 32-bit long, and pure 64-bit. The pure 32-bit and pure 64-bit cases turn into: if (0) page->dma_addr[1] = addr >> 16 >> 16; which gets elided. So the only case that has to work is 64-bit dma with 32-bit long. I can replace this with upper_32_bits().
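For reference, here is roughly how the pair of helpers could read with that change; this is a sketch of the constant-folding point being made above, not necessarily the exact code that was merged.

```c
/* Sketch: when dma_addr_t and unsigned long are the same size, the
 * if () condition is a compile-time constant 0, so the dma_addr[1]
 * store is dead code and gets elided; only the 64-bit-dma-on-32-bit
 * configuration ever executes the high-word handling.  upper_32_bits()
 * and the double 16-bit shift both avoid an out-of-range shift warning
 * when dma_addr_t is only 32 bits wide. */
static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
{
	page->dma_addr[0] = addr;	/* low word, or the whole value on 64-bit */
	if (sizeof(dma_addr_t) > sizeof(unsigned long))
		page->dma_addr[1] = upper_32_bits(addr);
}

static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
{
	dma_addr_t ret = page->dma_addr[0];

	if (sizeof(dma_addr_t) > sizeof(unsigned long))
		ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
	return ret;
}
```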
PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr
Hi, kernel test robot reports: >> drivers/cpufreq/pmac32-cpufreq.c:262:2: error: implicit declaration of >> function 'enable_kernel_fp' [-Werror,-Wimplicit-function-declaration] enable_kernel_fp(); ^ when # CONFIG_PPC_FPU is not set CONFIG_ALTIVEC=y I see at least one other place that does not handle that combination well, here: ../arch/powerpc/lib/sstep.c: In function 'do_vec_load': ../arch/powerpc/lib/sstep.c:637:3: error: implicit declaration of function 'put_vr' [-Werror=implicit-function-declaration] 637 | put_vr(rn, &u.v); | ^~ ../arch/powerpc/lib/sstep.c: In function 'do_vec_store': ../arch/powerpc/lib/sstep.c:660:3: error: implicit declaration of function 'get_vr'; did you mean 'get_oc'? [-Werror=implicit-function-declaration] 660 | get_vr(rn, &u.v); | ^~ Should the code + Kconfigs/Makefiles handle that kind of kernel config or should ALTIVEC always mean PPC_FPU as well? I have patches to fix the build errors with the config as reported but I don't know if that's the right thing to do... thanks. -- ~Randy
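To illustrate the shape of the choice being asked about (without claiming this is Randy's actual patch): one option is a C-level fallback so callers still build when CONFIG_PPC_FPU is unset, the other is to forbid the ALTIVEC-without-PPC_FPU combination in Kconfig. A hedged sketch of the first option:

```c
/* Hypothetical sketch, not the posted fix: give enable_kernel_fp() a
 * no-op stub when the kernel is built without PPC_FPU, so callers such
 * as pmac32-cpufreq.c still compile with CONFIG_ALTIVEC=y and
 * CONFIG_PPC_FPU unset.  Whether a stub like this or a Kconfig
 * dependency (ALTIVEC implying PPC_FPU) is the right answer is exactly
 * the question raised above. */
#ifdef CONFIG_PPC_FPU
extern void enable_kernel_fp(void);
#else
static inline void enable_kernel_fp(void) { }
#endif
```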
Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
Hi Matthew, On Sat, Apr 17, 2021 at 03:45:22AM +0100, Matthew Wilcox wrote: > > Replacement patch to fix compiler warning. > > From: "Matthew Wilcox (Oracle)" > Date: Fri, 16 Apr 2021 16:34:55 -0400 > Subject: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems > To: bro...@redhat.com > Cc: linux-ker...@vger.kernel.org, > linux...@kvack.org, > net...@vger.kernel.org, > linuxppc-dev@lists.ozlabs.org, > linux-arm-ker...@lists.infradead.org, > linux-m...@vger.kernel.org, > ilias.apalodi...@linaro.org, > mcr...@linux.microsoft.com, > grygorii.stras...@ti.com, > a...@kernel.org, > h...@lst.de, > linux-snps-...@lists.infradead.org, > mho...@kernel.org, > mgor...@suse.de > > 32-bit architectures which expect 8-byte alignment for 8-byte integers > and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct > page inadvertently expanded in 2019. When the dma_addr_t was added, > it forced the alignment of the union to 8 bytes, which inserted a 4 byte > gap between 'flags' and the union. > > Fix this by storing the dma_addr_t in one or two adjacent unsigned longs. > This restores the alignment to that of an unsigned long, and also fixes a > potential problem where (on a big endian platform), the bit used to denote > PageTail could inadvertently get set, and a racing get_user_pages_fast() > could dereference a bogus compound_head(). > > Fixes: c25fff7171be ("mm: add dma_addr_t to struct page") > Signed-off-by: Matthew Wilcox (Oracle) > --- > include/linux/mm_types.h | 4 ++-- > include/net/page_pool.h | 12 +++- > net/core/page_pool.c | 12 +++- > 3 files changed, 20 insertions(+), 8 deletions(-) > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 6613b26a8894..5aacc1c10a45 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -97,10 +97,10 @@ struct page { > }; > struct {/* page_pool used by netstack */ > /** > - * @dma_addr: might require a 64-bit value even on > + * @dma_addr: might require a 64-bit value on >* 32-bit architectures. >*/ > - dma_addr_t dma_addr; > + unsigned long dma_addr[2]; > }; > struct {/* slab, slob and slub */ > union { > diff --git a/include/net/page_pool.h b/include/net/page_pool.h > index b5b195305346..ad6154dc206c 100644 > --- a/include/net/page_pool.h > +++ b/include/net/page_pool.h > @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct > page_pool *pool, > > static inline dma_addr_t page_pool_get_dma_addr(struct page *page) > { > - return page->dma_addr; > + dma_addr_t ret = page->dma_addr[0]; > + if (sizeof(dma_addr_t) > sizeof(unsigned long)) > + ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16; > + return ret; > +} > + > +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr) > +{ > + page->dma_addr[0] = addr; > + if (sizeof(dma_addr_t) > sizeof(unsigned long)) > + page->dma_addr[1] = addr >> 16 >> 16; The 'error' that was reported will never trigger right? I assume this was compiled with dma_addr_t as 32bits (so it triggered the compilation error), but the if check will never allow this codepath to run. If so can we add a comment explaining this, since none of us will remember why in 6 months from now? 
> } > > static inline bool is_page_pool_compiled_in(void) > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > index ad8b0707af04..f014fd8c19a6 100644 > --- a/net/core/page_pool.c > +++ b/net/core/page_pool.c > @@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct > page_pool *pool, > struct page *page, > unsigned int dma_sync_size) > { > + dma_addr_t dma_addr = page_pool_get_dma_addr(page); > + > dma_sync_size = min(dma_sync_size, pool->p.max_len); > - dma_sync_single_range_for_device(pool->p.dev, page->dma_addr, > + dma_sync_single_range_for_device(pool->p.dev, dma_addr, >pool->p.offset, dma_sync_size, >pool->p.dma_dir); > } > @@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct > page_pool *pool, > put_page(page); > return NULL; > } > - page->dma_addr = dma; > + page_pool_set_dma_addr(page, dma); > > if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) > page_pool_dma_sync_for_device(pool, page, pool->p.max_len); > @@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, > struct page *page) >*/ > goto skip_dma_unmap; > > - dma = page->dma_addr; > + dma = page_po
Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems
On Sat, Apr 17, 2021 at 3:58 PM Matthew Wilcox wrote: > I wouldn't like to make that assumption. I've come across IOMMUs (maybe > on parisc? powerpc?) that like to encode fun information in the top > few bits. So we could get it down to 52 bits, but I don't think we can > get all the way down to 32 bits. Also, we need to keep the bottom bit > clear for PageTail, so that further constrains us. I'd be surprised to find such an IOMMU on a 32-bit machine, given that the main reason for using an IOMMU on these is to avoid the 32-bit address limit in DMA masters. I see that parisc32 does not enable 64-bit dma_addr_t, while powerpc32 does not support any IOMMU, so it wouldn't be either of those two. I do remember some powerpc systems that encode additional flags (transaction ordering, caching, ...) into the high bits of the physical address in the IOTLB, but not the virtual address used for looking them up. > Anyway, I like the "two unsigned longs" approach I posted yesterday, > but thanks for the suggestion. Ok, fair enough. As long as there are enough bits in this branch of 'struct page', I suppose it is the safe choice. Arnd
Re: swiotlb cleanups v3
> Hi Konrad,
>
> this series contains a bunch of swiotlb cleanups, mostly to reduce the
> amount of internals exposed to code outside of swiotlb.c, which should
> help to prepare for supporting multiple different bounce buffer pools.

Somewhere between the 1st and 2nd patch, specifying a specific SWIOTLB size for an SEV guest is no longer honored. For example, if I start an SEV guest with 16GB of memory and specify swiotlb=131072, I used to get a 256MB SWIOTLB. However, after the 2nd patch, swiotlb=131072 is no longer honored and I get a 982MB SWIOTLB (as set via sev_setup_arch() in arch/x86/mm/mem_encrypt.c).

I can't be sure which patch caused the issue, since an SEV guest fails to boot with the 1st patch but can boot with the 2nd patch, at which point the SWIOTLB comes in at 982MB. I haven't had a chance to debug it, so I'm hoping you might be able to quickly spot what's going on.

Thanks,
Tom

>
> Changes since v2:
> - fix a bisection hazard that did not allocate the alloc_size array
> - dropped all patches already merged
>
> Changes since v1:
> - rebased to v5.12-rc1
> - a few more cleanups
> - merge and forward port the patch from Claire to move all the global
>   variables into a struct to prepare for multiple instances
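As a sanity check on the sizes quoted in this report: swiotlb= takes a slab count and each slab is 1 << IO_TLB_SHIFT (2 KiB) bytes, so 131072 slabs is 256 MiB, matching the size that used to be honored. A quick stand-alone check:

```c
/* Quick arithmetic check of the numbers above: the swiotlb= parameter
 * counts slabs of 1 << IO_TLB_SHIFT (2 KiB) each. */
#include <stdio.h>

#define IO_TLB_SHIFT 11			/* 2 KiB per slab, as in swiotlb.h */

int main(void)
{
	unsigned long long nslabs = 131072;
	unsigned long long bytes = nslabs << IO_TLB_SHIFT;

	printf("swiotlb=%llu -> %llu MiB\n", nslabs, bytes >> 20);	/* 256 MiB */
	return 0;
}
```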
Re: [PATCH v2] tools: do not include scripts/Kbuild.include
On Fri, Apr 16, 2021 at 10:01 PM Masahiro Yamada wrote: > > Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to > scripts/Makefile.compiler"), some kselftests fail to build. > > The tools/ directory opted out Kbuild, and went in a different > direction. They copy any kind of files to the tools/ directory > in order to do whatever they want in their world. > > tools/build/Build.include mimics scripts/Kbuild.include, but some > tool Makefiles included the Kbuild one to import a feature that is > missing in tools/build/Build.include: > > - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector" >only if supported") included scripts/Kbuild.include from >tools/thermal/tmon/Makefile to import the cc-option macro. > > - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do >not support -no-pie") included scripts/Kbuild.include from >tools/testing/selftests/kvm/Makefile to import the try-run macro. > > - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang >failures") included scripts/Kbuild.include from >tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR >target. > > - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for >unrecognized option") included scripts/Kbuild.include from >tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the >try-run macro. > > Copy what they need into tools/build/Build.include, and make them > include it instead of scripts/Kbuild.include. > > Link: > https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/ > Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to > scripts/Makefile.compiler") > Reported-by: Janosch Frank > Reported-by: Christian Borntraeger > Signed-off-by: Masahiro Yamada Applied to linux-kbuild. > --- > > Changes in v2: > - copy macros to tools/build/BUild.include > > tools/build/Build.include | 24 +++ > tools/testing/selftests/bpf/Makefile | 2 +- > tools/testing/selftests/kvm/Makefile | 2 +- > .../selftests/powerpc/pmu/ebb/Makefile| 2 +- > tools/thermal/tmon/Makefile | 2 +- > 5 files changed, 28 insertions(+), 4 deletions(-) > > diff --git a/tools/build/Build.include b/tools/build/Build.include > index 585486e40995..2cf3b1bde86e 100644 > --- a/tools/build/Build.include > +++ b/tools/build/Build.include > @@ -100,3 +100,27 @@ cxx_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CXXFLAGS) > -D"BUILD_STR(s)=\#s" $(CXX > ## HOSTCC C flags > > host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(KBUILD_HOSTCFLAGS) > -D"BUILD_STR(s)=\#s" $(HOSTCFLAGS_$(basetarget).o) $(HOSTCFLAGS_$(obj)) > + > +# output directory for tests below > +TMPOUT = .tmp_ > + > +# try-run > +# Usage: option = $(call try-run, $(CC)...-o "$$TMP",option-ok,otherwise) > +# Exit code chooses option. "$$TMP" serves as a temporary file and is > +# automatically cleaned up. > +try-run = $(shell set -e; \ > + TMP=$(TMPOUT)/tmp; \ > + mkdir -p $(TMPOUT); \ > + trap "rm -rf $(TMPOUT)" EXIT; \ > + if ($(1)) >/dev/null 2>&1; \ > + then echo "$(2)"; \ > + else echo "$(3)"; \ > + fi) > + > +# cc-option > +# Usage: cflags-y += $(call cc-option,-march=winchip-c6,-march=i586) > +cc-option = $(call try-run, \ > + $(CC) -Werror $(1) -c -x c /dev/null -o "$$TMP",$(1),$(2)) > + > +# delete partially updated (i.e. 
corrupted) files on error > +.DELETE_ON_ERROR: > diff --git a/tools/testing/selftests/bpf/Makefile > b/tools/testing/selftests/bpf/Makefile > index 044bfdcf5b74..17a5cdf48d37 100644 > --- a/tools/testing/selftests/bpf/Makefile > +++ b/tools/testing/selftests/bpf/Makefile > @@ -1,5 +1,5 @@ > # SPDX-License-Identifier: GPL-2.0 > -include ../../../../scripts/Kbuild.include > +include ../../../build/Build.include > include ../../../scripts/Makefile.arch > include ../../../scripts/Makefile.include > > diff --git a/tools/testing/selftests/kvm/Makefile > b/tools/testing/selftests/kvm/Makefile > index a6d61f451f88..5ef141f265bd 100644 > --- a/tools/testing/selftests/kvm/Makefile > +++ b/tools/testing/selftests/kvm/Makefile > @@ -1,5 +1,5 @@ > # SPDX-License-Identifier: GPL-2.0-only > -include ../../../../scripts/Kbuild.include > +include ../../../build/Build.include > > all: > > diff --git a/tools/testing/selftests/powerpc/pmu/ebb/Makefile > b/tools/testing/selftests/powerpc/pmu/ebb/Makefile > index af3df79d8163..c5ecb4634094 100644 > --- a/tools/testing/selftests/powerpc/pmu/ebb/Makefile > +++ b/tools/testing/selftests/powerpc/pmu/ebb/Makefile > @@ -1,5 +1,5 @@ > # SPDX-License-Identifier: GPL-2.0 > -include ../../../../../../scripts/Kbuild.include > +include ../../../../../build/Build.include > > noarg: > $(MAKE) -C ../../ > diff --git a/tools/thermal/tmon/Makefile b/tools/thermal/tmon/Makefile > index 59e417ec3e13..9db867df7679 100644 > ---
Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems
On Sat, Apr 17, 2021 at 12:31:37PM +0200, Arnd Bergmann wrote: > On Fri, Apr 16, 2021 at 5:27 PM Matthew Wilcox wrote: > > diff --git a/include/net/page_pool.h b/include/net/page_pool.h > > index b5b195305346..db7c7020746a 100644 > > --- a/include/net/page_pool.h > > +++ b/include/net/page_pool.h > > @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct > > page_pool *pool, > > > > static inline dma_addr_t page_pool_get_dma_addr(struct page *page) > > { > > - return page->dma_addr; > > + dma_addr_t ret = page->dma_addr[0]; > > + if (sizeof(dma_addr_t) > sizeof(unsigned long)) > > + ret |= (dma_addr_t)page->dma_addr[1] << 32; > > + return ret; > > +} > > Have you considered using a PFN type address here? I suspect you > can prove that shifting the DMA address by PAGE_BITS would > make it fit into an 'unsigned long' on all 32-bit architectures with > 64-bit dma_addr_t. This requires that page->dma_addr to be > page aligned, as well as fit into 44 bits. I recently went through the > maximum address space per architecture to define a > MAX_POSSIBLE_PHYSMEM_BITS, and none of them have more than > 40 here, presumably the same is true for dma address space. I wouldn't like to make that assumption. I've come across IOMMUs (maybe on parisc? powerpc?) that like to encode fun information in the top few bits. So we could get it down to 52 bits, but I don't think we can get all the way down to 32 bits. Also, we need to keep the bottom bit clear for PageTail, so that further constrains us. Anyway, I like the "two unsigned longs" approach I posted yesterday, but thanks for the suggestion.
Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode
On 16/04/2021 at 01:49, Alexei Starovoitov wrote:
On Thu, Apr 15, 2021 at 8:41 AM Quentin Monnet wrote:
2021-04-15 16:37 UTC+0200 ~ Daniel Borkmann
On 4/15/21 11:32 AM, Jianlin Lv wrote:

For debugging JITs, dumping the JITed image to the kernel log is discouraged; "bpftool prog dump jited" is a much better way to examine JITed dumps. This patch gets rid of the code related to the bpf_jit_enable=2 mode, updates the proc handler of bpf_jit_enable, and also adds auxiliary information to explain how to use the bpf_jit_disasm tool after this change.
Signed-off-by: Jianlin Lv

Hello, for what it's worth, I have already seen people dump the JIT image in kernel logs in Qemu VMs running with just a busybox, not for kernel development, but in a context where building/using bpftool was not possible.

If building/using bpftool is not possible then the majority of selftests won't be exercised. I don't think such an environment is suitable for any kind of bpf development. Much so for JIT debugging. While bpf_jit_enable=2 is nothing but the debugging tool for JIT developers. I'd rather nuke that code instead of carrying it from kernel to kernel.

When I implemented the JIT for PPC32, it was extremely helpful.

As far as I understand, for the time being bpftool is not usable in my environment because it doesn't support cross compilation when the target's endianness differs from the building host endianness, see the discussion at https://lore.kernel.org/bpf/21e66a09-514f-f426-b9e2-13baab0b9...@csgroup.eu/

That's right that the selftests can't be exercised because they don't build.

The question might be a naive one, as I didn't investigate much about the replacement of the "bpf_jit_enable=2 debugging mode" by bpftool: how do we use bpftool exactly for that? Especially when using the BPF test module?
RE: Bogus struct page layout on 32-bit
From: Grygorii Strashko
> Sent: 16 April 2021 10:27
...
> Sry, for delayed reply.
>
> The TI platforms am3/4/5 (cpsw) and Keystone 2 (netcp) can do only 32bit DMA
> even in case of LPAE
> (dma-ranges are used).
> Originally, as I remember, CONFIG_ARCH_DMA_ADDR_T_64BIT has not been selected
> for the LPAE case
> on TI platforms and the fact that it became set is the result of
> multi-paltform/allXXXconfig/DMA
> optimizations and unification.
> (just checked - not set in 4.14)
>
> Probable commit 4965a68780c5 ("arch: define the ARCH_DMA_ADDR_T_64BIT config
> symbol in lib/Kconfig").
>
> The TI drivers have been updated, finally to accept ARCH_DMA_ADDR_T_64BIT=y
> by using things like
> (__force u32)
> for example.

Hmmm using (__force u32) is probably wrong.
If an address + length >= 2**32 can get passed then the IO request needs to be errored (or a bounce buffer used).
Otherwise you can get particularly horrid corruptions.

	David
Re: [PATCH] powerpc/pseries/mce: Fix a typo in error type assignment
Ganesh Goudar writes: > The error type is ICACHE and DCACHE, for case MCE_ERROR_TYPE_ICACHE. Do you mean "is ICACHE not DCACHE" ? cheers > Signed-off-by: Ganesh Goudar > --- > arch/powerpc/platforms/pseries/ras.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/ras.c > b/arch/powerpc/platforms/pseries/ras.c > index f8b390a9d9fb..9d4ef65da7f3 100644 > --- a/arch/powerpc/platforms/pseries/ras.c > +++ b/arch/powerpc/platforms/pseries/ras.c > @@ -699,7 +699,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs, > mce_err.error_type = MCE_ERROR_TYPE_DCACHE; > break; > case MC_ERROR_TYPE_I_CACHE: > - mce_err.error_type = MCE_ERROR_TYPE_DCACHE; > + mce_err.error_type = MCE_ERROR_TYPE_ICACHE; > break; > case MC_ERROR_TYPE_UNKNOWN: > default: > -- > 2.26.2
Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus
Tyrel Datwyler writes:
> On 4/1/21 5:13 PM, Tyrel Datwyler wrote:
>> Currently, neither the vio_bus nor the vio_driver structure provides support
>> for a shutdown() routine.
>>
>> Add support for shutdown() by allowing drivers to provide an
>> implementation via a function pointer in their vio_driver struct, and
>> provide a proper implementation in the driver template for the vio_bus
>> that calls a vio driver's shutdown() if defined.
>>
>> In the case that no shutdown() is defined by a vio driver and a kexec is
>> in progress, we implement a big hammer that calls remove() to ensure no
>> further DMA for the devices is possible.
>>
>> Signed-off-by: Tyrel Datwyler
>> ---
>
> Ping... any comments, problems with this approach?

The kexec part seems like a bit of a hack. It also doesn't help for kdump, when none of the shutdown code is run.

How many drivers do we have? Can we just implement a proper shutdown for them?

cheers
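For context on the interface being discussed, here is a sketch of what the driver-side hookup could look like under the proposed change. Only the idea of a .shutdown member comes from the patch description; the callback name, its exact signature and the surrounding struct contents are assumptions.

```c
/* Illustrative only: how a vio driver might populate the proposed
 * shutdown() hook.  The signature is an assumption based on the patch
 * description above, not copied from the actual patch. */
static void example_vio_shutdown(struct vio_dev *vdev)
{
	/* quiesce the device so no further DMA can occur,
	 * e.g. before kexec hands control to the new kernel */
}

static struct vio_driver example_vio_driver = {
	.name     = "example",
	/* .probe / .remove / .id_table as usual ... */
	.shutdown = example_vio_shutdown,	/* the new hook proposed here */
};
```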
RE: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems
From: Matthew Wilcox
> Sent: 16 April 2021 16:28
>
> On Thu, Apr 15, 2021 at 08:08:32PM +0200, Jesper Dangaard Brouer wrote:
> > See below patch. Where I swap32 the dma address to satisfy
> > page->compound having bit zero cleared. (It is the simplest fix I could
> > come up with).
>
> I think this is slightly simpler, and as a bonus code that assumes the
> old layout won't compile. Always a good plan.
...
> static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
> {
> -	return page->dma_addr;
> +	dma_addr_t ret = page->dma_addr[0];
> +	if (sizeof(dma_addr_t) > sizeof(unsigned long))
> +		ret |= (dma_addr_t)page->dma_addr[1] << 32;
> +	return ret;
> +}

Won't some compiler/option combinations generate an error for the '<< 32' when dma_addr_t is 32bit?
You might need to use a (u64) cast.

	David
Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems
On Fri, Apr 16, 2021 at 5:27 PM Matthew Wilcox wrote: > diff --git a/include/net/page_pool.h b/include/net/page_pool.h > index b5b195305346..db7c7020746a 100644 > --- a/include/net/page_pool.h > +++ b/include/net/page_pool.h > @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct > page_pool *pool, > > static inline dma_addr_t page_pool_get_dma_addr(struct page *page) > { > - return page->dma_addr; > + dma_addr_t ret = page->dma_addr[0]; > + if (sizeof(dma_addr_t) > sizeof(unsigned long)) > + ret |= (dma_addr_t)page->dma_addr[1] << 32; > + return ret; > +} Have you considered using a PFN type address here? I suspect you can prove that shifting the DMA address by PAGE_BITS would make it fit into an 'unsigned long' on all 32-bit architectures with 64-bit dma_addr_t. This requires that page->dma_addr to be page aligned, as well as fit into 44 bits. I recently went through the maximum address space per architecture to define a MAX_POSSIBLE_PHYSMEM_BITS, and none of them have more than 40 here, presumably the same is true for dma address space. Arnd
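To spell out the alternative Arnd is proposing here: store the DMA address shifted down by PAGE_SHIFT so it fits a single unsigned long, which only works if the mapping is page aligned and the address fits in BITS_PER_LONG + PAGE_SHIFT bits. A sketch follows; it is not what was merged, and it reuses the dma_addr[] field from Matthew's patch purely for illustration.

```c
/* Sketch of the shifted "PFN-style" storage idea discussed above;
 * not the merged fix.  Assumes the DMA mapping is page aligned and
 * the address fits in 32 + PAGE_SHIFT bits on 32-bit builds. */
static inline void page_pool_set_dma_pfn(struct page *page, dma_addr_t addr)
{
	WARN_ON_ONCE(addr & (PAGE_SIZE - 1));	/* must be page aligned */
	page->dma_addr[0] = addr >> PAGE_SHIFT;
}

static inline dma_addr_t page_pool_get_dma_pfn(struct page *page)
{
	return (dma_addr_t)page->dma_addr[0] << PAGE_SHIFT;
}
```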
[PATCH] perf vendor events: Initial json/events list for power10 platform
Patch adds initial json/events for POWER10. Signed-off-by: Kajol Jain --- .../perf/pmu-events/arch/powerpc/mapfile.csv | 1 + .../arch/powerpc/power10/cache.json | 47 +++ .../arch/powerpc/power10/floating_point.json | 7 + .../arch/powerpc/power10/frontend.json| 217 + .../arch/powerpc/power10/locks.json | 12 + .../arch/powerpc/power10/marked.json | 147 + .../arch/powerpc/power10/memory.json | 192 +++ .../arch/powerpc/power10/others.json | 297 ++ .../arch/powerpc/power10/pipeline.json| 297 ++ .../pmu-events/arch/powerpc/power10/pmc.json | 22 ++ .../arch/powerpc/power10/translation.json | 57 11 files changed, 1296 insertions(+) create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/cache.json create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/floating_point.json create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/frontend.json create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/locks.json create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/marked.json create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/memory.json create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/others.json create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pipeline.json create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pmc.json create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/translation.json diff --git a/tools/perf/pmu-events/arch/powerpc/mapfile.csv b/tools/perf/pmu-events/arch/powerpc/mapfile.csv index 229150e7ab7d..4abdfc3f9692 100644 --- a/tools/perf/pmu-events/arch/powerpc/mapfile.csv +++ b/tools/perf/pmu-events/arch/powerpc/mapfile.csv @@ -15,3 +15,4 @@ # Power8 entries 004[bcd][[:xdigit:]]{4},1,power8,core 004e[[:xdigit:]]{4},1,power9,core +0080[[:xdigit:]]{4},1,power10,core diff --git a/tools/perf/pmu-events/arch/powerpc/power10/cache.json b/tools/perf/pmu-events/arch/powerpc/power10/cache.json new file mode 100644 index ..95e33531fbc6 --- /dev/null +++ b/tools/perf/pmu-events/arch/powerpc/power10/cache.json @@ -0,0 +1,47 @@ +[ + { +"EventCode": "1003C", +"EventName": "PM_EXEC_STALL_DMISS_L2L3", +"BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from either the local L2 or local L3." + }, + { +"EventCode": "34056", +"EventName": "PM_EXEC_STALL_LOAD_FINISH", +"BriefDescription": "Cycles in which the oldest instruction in the pipeline was finishing a load after its data was reloaded from a data source beyond the local L1; cycles in which the LSU was processing an L1-hit; cycles in which the NTF instruction merged with another load in the LMQ." + }, + { +"EventCode": "3006C", +"EventName": "PM_RUN_CYC_SMT2_MODE", +"BriefDescription": "Cycles when this thread's run latch is set and the core is in SMT2 mode" + }, + { +"EventCode": "300F4", +"EventName": "PM_RUN_INST_CMPL_CONC", +"BriefDescription": "PowerPC instructions completed by this thread when all threads in the core had the run-latch set" + }, + { +"EventCode": "4C016", +"EventName": "PM_EXEC_STALL_DMISS_L2L3_CONFLICT", +"BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local L2 or local L3, with a dispatch conflict." + }, + { +"EventCode": "4D014", +"EventName": "PM_EXEC_STALL_LOAD", +"BriefDescription": "Cycles in which the oldest instruction in the pipeline was a load instruction executing in the Load Store Unit." 
+ }, + { +"EventCode": "4D016", +"EventName": "PM_EXEC_STALL_PTESYNC", +"BriefDescription": "Cycles in which the oldest instruction in the pipeline was a PTESYNC instruction executing in the Load Store Unit." + }, + { +"EventCode": "401EA", +"EventName": "PM_THRESH_EXC_128", +"BriefDescription": "Threshold counter exceeded a value of 128" + }, + { +"EventCode": "400F6", +"EventName": "PM_BR_MPRED_CMPL", +"BriefDescription": "A mispredicted branch completed. Includes direction and target." + } +] diff --git a/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json b/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json new file mode 100644 index ..e9b92f282d3c --- /dev/null +++ b/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json @@ -0,0 +1,7 @@ +[ + { +"EventCode": "4016E", +"EventName": "PM_THRESH_NOT_MET", +"BriefDescription": "Threshold counter did not meet threshold" + } +] diff --git a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json new file mode 100644 index ..aebaf94bfdfe --- /dev/null +++ b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json @@ -0,0 +1,217 @@ +[ +
Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
On Sat, 17 Apr 2021 00:07:23 +0100 "Matthew Wilcox (Oracle)" wrote: > 32-bit architectures which expect 8-byte alignment for 8-byte integers > and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct > page inadvertently expanded in 2019. When the dma_addr_t was added, > it forced the alignment of the union to 8 bytes, which inserted a 4 byte > gap between 'flags' and the union. > > Fix this by storing the dma_addr_t in one or two adjacent unsigned longs. > This restores the alignment to that of an unsigned long, and also fixes a > potential problem where (on a big endian platform), the bit used to denote > PageTail could inadvertently get set, and a racing get_user_pages_fast() > could dereference a bogus compound_head(). > > Fixes: c25fff7171be ("mm: add dma_addr_t to struct page") > Signed-off-by: Matthew Wilcox (Oracle) > --- Acked-by: Jesper Dangaard Brouer Thanks you Matthew for working on a fix for this. It's been a pleasure working with you and exchanging crazy ideas with you for solving this. Most of them didn't work out, especially those that came to me during restless nights ;-). Having worked through the other solutions, some very intrusive and some could even be consider ugly. I think we have a good and non-intrusive solution/workaround in this patch. Thanks! > include/linux/mm_types.h | 4 ++-- > include/net/page_pool.h | 12 +++- > net/core/page_pool.c | 12 +++- > 3 files changed, 20 insertions(+), 8 deletions(-) > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 6613b26a8894..5aacc1c10a45 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -97,10 +97,10 @@ struct page { > }; > struct {/* page_pool used by netstack */ > /** > - * @dma_addr: might require a 64-bit value even on > + * @dma_addr: might require a 64-bit value on >* 32-bit architectures. 
>*/ > - dma_addr_t dma_addr; > + unsigned long dma_addr[2]; > }; > struct {/* slab, slob and slub */ > union { > diff --git a/include/net/page_pool.h b/include/net/page_pool.h > index b5b195305346..db7c7020746a 100644 > --- a/include/net/page_pool.h > +++ b/include/net/page_pool.h > @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct > page_pool *pool, > > static inline dma_addr_t page_pool_get_dma_addr(struct page *page) > { > - return page->dma_addr; > + dma_addr_t ret = page->dma_addr[0]; > + if (sizeof(dma_addr_t) > sizeof(unsigned long)) > + ret |= (dma_addr_t)page->dma_addr[1] << 32; > + return ret; > +} > + > +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr) > +{ > + page->dma_addr[0] = addr; > + if (sizeof(dma_addr_t) > sizeof(unsigned long)) > + page->dma_addr[1] = addr >> 32; > } > > static inline bool is_page_pool_compiled_in(void) > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > index ad8b0707af04..f014fd8c19a6 100644 > --- a/net/core/page_pool.c > +++ b/net/core/page_pool.c > @@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct > page_pool *pool, > struct page *page, > unsigned int dma_sync_size) > { > + dma_addr_t dma_addr = page_pool_get_dma_addr(page); > + > dma_sync_size = min(dma_sync_size, pool->p.max_len); > - dma_sync_single_range_for_device(pool->p.dev, page->dma_addr, > + dma_sync_single_range_for_device(pool->p.dev, dma_addr, >pool->p.offset, dma_sync_size, >pool->p.dma_dir); > } > @@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct > page_pool *pool, > put_page(page); > return NULL; > } > - page->dma_addr = dma; > + page_pool_set_dma_addr(page, dma); > > if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) > page_pool_dma_sync_for_device(pool, page, pool->p.max_len); > @@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, > struct page *page) >*/ > goto skip_dma_unmap; > > - dma = page->dma_addr; > + dma = page_pool_get_dma_addr(page); > > - /* When page is unmapped, it cannot be returned our pool */ > + /* When page is unmapped, it cannot be returned to our pool */ > dma_unmap_page_attrs(pool->p.dev, dma, >PAGE_SIZE << pool->p.order, pool->p.dma_dir, >DMA_ATTR_SKIP_CPU_SYNC); > - page->dma_addr = 0; > + page_pool_set_dma_addr(page, 0); > skip_dma_unmap: > /* This may be the last page returned, releasing the pool, so >* it is not safe to re