Re: mmu.c:undefined reference to `patch__hash_page_A0'

2021-04-17 Thread Randy Dunlap
Hi,

I no longer see this build error.
However:

On 2/27/21 2:24 AM, kernel test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> head:   3fb6d0e00efc958d01c2f109c8453033a2d96796
> commit: 259149cf7c3c6195e6199e045ca988c31d081cab powerpc/32s: Only build hash code when CONFIG_PPC_BOOK3S_604 is selected
> date:   4 weeks ago
> config: powerpc64-randconfig-r013-20210227 (attached as .config)

ktr/lkp: the attached .config file is for PPC32, not PPC64.

Also:

> compiler: powerpc-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=259149cf7c3c6195e6199e045ca988c31d081cab
> git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> git fetch --no-tags linus master
> git checkout 259149cf7c3c6195e6199e045ca988c31d081cab
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc64
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
> 
> All errors (new ones prefixed by >>):
> 
> powerpc-linux-ld: arch/powerpc/mm/book3s32/mmu.o: in function `MMU_init_hw_patch':
>>> mmu.c:(.init.text+0x75e): undefined reference to `patch__hash_page_A0'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x76a): undefined reference to `patch__hash_page_A0'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x776): undefined reference to `patch__hash_page_A1'
> powerpc-linux-ld: mmu.c:(.init.text+0x782): undefined reference to `patch__hash_page_A1'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x78e): undefined reference to `patch__hash_page_A2'
> powerpc-linux-ld: mmu.c:(.init.text+0x79a): undefined reference to `patch__hash_page_A2'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x7aa): undefined reference to `patch__hash_page_B'
> powerpc-linux-ld: mmu.c:(.init.text+0x7b6): undefined reference to `patch__hash_page_B'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x7c2): undefined reference to `patch__hash_page_C'
> powerpc-linux-ld: mmu.c:(.init.text+0x7ce): undefined reference to `patch__hash_page_C'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x7da): undefined reference to `patch__flush_hash_A0'
> powerpc-linux-ld: mmu.c:(.init.text+0x7e6): undefined reference to `patch__flush_hash_A0'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x7f2): undefined reference to `patch__flush_hash_A1'
> powerpc-linux-ld: mmu.c:(.init.text+0x7fe): undefined reference to `patch__flush_hash_A1'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x80a): undefined reference to `patch__flush_hash_A2'
> powerpc-linux-ld: mmu.c:(.init.text+0x816): undefined reference to `patch__flush_hash_A2'
>>> powerpc-linux-ld: mmu.c:(.init.text+0x83e): undefined reference to `patch__flush_hash_B'
> powerpc-linux-ld: mmu.c:(.init.text+0x84e): undefined reference to `patch__flush_hash_B'
> powerpc-linux-ld: arch/powerpc/mm/book3s32/mmu.o: in function `update_mmu_cache':
>>> mmu.c:(.text.update_mmu_cache+0xa0): undefined reference to `add_hash_page'

I do see this build error:

powerpc-linux-ld: arch/powerpc/boot/wrapper.a(decompress.o): in function `partial_decompress':
decompress.c:(.text+0x1f0): undefined reference to `__decompress'

when either
CONFIG_KERNEL_LZO=y
or
CONFIG_KERNEL_LZMA=y

but the build succeeds when either
CONFIG_KERNEL_GZIP=y
or
CONFIG_KERNEL_XZ=y

I guess that is due to arch/powerpc/boot/decompress.c doing this:

#ifdef CONFIG_KERNEL_GZIP
#   include "decompress_inflate.c"
#endif

#ifdef CONFIG_KERNEL_XZ
#   include "xz_config.h"
#   include "../../../lib/decompress_unxz.c"
#endif


It would be nice to require one of KERNEL_GZIP or KERNEL_XZ
to be set/enabled (maybe unless a uImage is being built?).

ta.
-- 
~Randy
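
For illustration only, the failure mode described above could be caught at compile time instead of at link time with a preprocessor guard. This is a stand-alone sketch, not a patch to arch/powerpc/boot/decompress.c: the locally defined CONFIG_KERNEL_GZIP and the boot_decompressor_name() helper are assumptions made so the file compiles by itself, and a real fix would more likely be expressed as a Kconfig dependency.

```c
#include <assert.h>
#include <string.h>

/* Assumption: defined here only so the sketch compiles on its own.
 * In a kernel build this symbol comes from the generated config. */
#define CONFIG_KERNEL_GZIP 1

#if defined(CONFIG_KERNEL_GZIP)
# define BOOT_DECOMPRESSOR "gzip"
#elif defined(CONFIG_KERNEL_XZ)
# define BOOT_DECOMPRESSOR "xz"
#else
/* Neither supported option is set: stop at compile time, not link time. */
# error "boot wrapper needs CONFIG_KERNEL_GZIP or CONFIG_KERNEL_XZ"
#endif

/* Hypothetical helper: reports which decompressor was compiled in. */
static const char *boot_decompressor_name(void)
{
	return BOOT_DECOMPRESSOR;
}
```

With CONFIG_KERNEL_LZO or CONFIG_KERNEL_LZMA as the only compression option, a guard of this shape would hit the #error branch rather than leaving `__decompress' undefined.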



Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-17 Thread Matthew Wilcox
On Sat, Apr 17, 2021 at 09:18:57PM +, David Laight wrote:
> Ugly as well.

Thank you for expressing your opinion.  Again.


Re: [PATCH 2/2] mm: Indicate pfmemalloc pages in compound_head

2021-04-17 Thread Matthew Wilcox
On Sat, Apr 17, 2021 at 09:13:45PM +, David Laight wrote:
> > struct {/* page_pool used by netstack */
> > -   /**
> > -* @dma_addr: might require a 64-bit value on
> > -* 32-bit architectures.
> > -*/
> > +   unsigned long pp_magic;
> > +   unsigned long xmi;
> > +   unsigned long _pp_mapping_pad;
> > unsigned long dma_addr[2];
> > };
> 
> You've deleted the comment.

Yes.  It no longer added any value.  You can see dma_addr now occupies
two words.

> I also think there should be a comment that dma_addr[0]
> must be aliased to ->index.

That's not a requirement.  Moving the pfmemalloc indicator is a
requirement so that we _can_ use index, but there's no requirement about
how index is used.


Re: swiotlb cleanups v3

2021-04-17 Thread Tom Lendacky
On 4/17/21 11:39 AM, Tom Lendacky wrote:
>> Hi Konrad,
>>
>> this series contains a bunch of swiotlb cleanups, mostly to reduce the
>> amount of internals exposed to code outside of swiotlb.c, which should
>> help to prepare for supporting multiple different bounce buffer pools.
> 
> Somewhere between the 1st and 2nd patch, specifying a specific swiotlb
> size for an SEV guest is no longer honored. For example, if I start an SEV
> guest with 16GB of memory and specify swiotlb=131072 I used to get a
> 256MB SWIOTLB. However, after the 2nd patch, the swiotlb=131072 is no
> longer honored and I get a 982MB SWIOTLB (as set via sev_setup_arch() in
> arch/x86/mm/mem_encrypt.c).
> 
> I can't be sure which patch caused the issue since an SEV guest fails to
> boot with the 1st patch but can boot with the 2nd patch, at which point
> the SWIOTLB comes in at 982MB (I haven't had a chance to debug it and so
> I'm hoping you might be able to quickly spot what's going on).

Ok, I figured out the 1st patch boot issue (which is gone when the
second patch is applied). Here's the issue if anyone is interested:

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d9c097f0f78c..dbe369674afe 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -226,7 +226,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 
alloc_size = PAGE_ALIGN(mem->nslabs * sizeof(size_t));
mem->alloc_size = memblock_alloc(alloc_size, PAGE_SIZE);
-   if (mem->alloc_size)
+   if (!mem->alloc_size)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
 

The 1st patch still allowed the command line specified size of 256MB
SWIOTLB. So that means the 2nd patch causes the command line specified
256MB SWIOTLB size to be ignored and results in a 982MB SWIOTLB size
for the 16GB guest.

Thanks,
Tom
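
As a quick check of the numbers above: the swiotlb= parameter counts slabs, and each slab is 2 KiB (IO_TLB_SHIFT is 11 in the kernel), so 131072 slabs should come out at 256MB. A minimal sketch of that arithmetic follows; the helper names are made up for this example and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Each swiotlb slab is 1 << 11 = 2048 bytes, matching the kernel's IO_TLB_SHIFT. */
#define IO_TLB_SHIFT 11

/* Hypothetical helper: bytes covered by a given number of slabs. */
static uint64_t swiotlb_slabs_to_bytes(uint64_t nslabs)
{
	return nslabs << IO_TLB_SHIFT;
}

/* Hypothetical helper: convert bytes to MiB, rounding down. */
static uint64_t bytes_to_mib(uint64_t bytes)
{
	return bytes >> 20;
}
```

Calling bytes_to_mib(swiotlb_slabs_to_bytes(131072)) gives 256, which matches the size reported before the 2nd patch.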

> 
> Thanks,
> Tom
> 
>>
>> Changes since v2:
>>  - fix a bisection hazard that did not allocate the alloc_size array
>>  - dropped all patches already merged
>>
>> Changes since v1:
>>  - rebased to v5.12-rc1
>>  - a few more cleanups
>>  - merge and forward port the patch from Claire to move all the global
>>variables into a struct to prepare for multiple instances
> 


RE: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-17 Thread David Laight
From: Matthew Wilcox 
> Sent: 17 April 2021 03:45
> 
> Replacement patch to fix compiler warning.
...
>  static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>  {
> - return page->dma_addr;
> + dma_addr_t ret = page->dma_addr[0];
> + if (sizeof(dma_addr_t) > sizeof(unsigned long))
> + ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;

Ugly as well.

Why not just replace the (dma_addr_t) cast with a (u64) one?
Looks better than the double shift.

Same could be done for the '>> 32'.
Is there an upper_32_bits() that could be used??

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)
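
To make the suggestion above concrete, here is a user-space sketch of storing a 64-bit DMA address in two 32-bit words and reading it back with a 64-bit cast and a single shift. It is not the kernel code: word_t stands in for a 32-bit unsigned long, struct pp_page stands in for the relevant part of struct page, and demo_upper_32()/demo_lower_32() are local stand-ins for the kind of helper asked about.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* assumption: 64-bit DMA addresses */
typedef uint32_t word_t;	/* assumption: models a 32-bit unsigned long */

/* Stand-in for the page_pool part of struct page. */
struct pp_page {
	word_t dma_addr[2];
};

static inline uint32_t demo_upper_32(uint64_t v) { return (uint32_t)(v >> 32); }
static inline uint32_t demo_lower_32(uint64_t v) { return (uint32_t)v; }

static void pp_set_dma_addr(struct pp_page *p, dma_addr_t addr)
{
	p->dma_addr[0] = demo_lower_32(addr);
	if (sizeof(dma_addr_t) > sizeof(word_t))
		p->dma_addr[1] = demo_upper_32(addr);
}

static dma_addr_t pp_get_dma_addr(const struct pp_page *p)
{
	dma_addr_t ret = p->dma_addr[0];

	/* Widening to 64 bits first keeps the shift count below the type width. */
	if (sizeof(dma_addr_t) > sizeof(word_t))
		ret |= (uint64_t)p->dma_addr[1] << 32;
	return ret;
}
```

The double shift in the quoted patch exists because a plain "<< 32" on a 32-bit dma_addr_t would shift by the full width of the type; casting to a 64-bit type first avoids that without the "<< 16 << 16" idiom.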



RE: [PATCH 2/2] mm: Indicate pfmemalloc pages in compound_head

2021-04-17 Thread David Laight
From: Matthew Wilcox (Oracle) 
> Sent: 17 April 2021 00:07
> 
> The net page_pool wants to use a magic value to identify page pool pages.
> The best place to put it is in the first word where it can be clearly a
> non-pointer value.  That means shifting dma_addr up to alias with ->index,
> which means we need to find another way to indicate page_is_pfmemalloc().
> Since page_pool doesn't want to set its magic value on pages which are
> pfmemalloc, we can use bit 1 of compound_head to indicate that the page
> came from the memory reserves.
> 
...
>   struct {/* page_pool used by netstack */
> - /**
> -  * @dma_addr: might require a 64-bit value on
> -  * 32-bit architectures.
> -  */
> + unsigned long pp_magic;
> + unsigned long xmi;
> + unsigned long _pp_mapping_pad;
>   unsigned long dma_addr[2];
>   };

You've deleted the comment.

I also think there should be a comment that dma_addr[0]
must be aliased to ->index.
(Or whatever all the exact requirements are.)

David




[V3 PATCH 16/16] crypto/nx: Add sysfs interface to export NX capabilities

2021-04-17 Thread Haren Myneni


Changes to export the following NXGZIP capabilities through sysfs:

/sys/devices/vio/ibm,compression-v1/NxGzCaps:
min_compress_len       /* Recommended minimum compress length in bytes */
min_decompress_len     /* Recommended minimum decompress length in bytes */
req_max_processed_len  /* Maximum number of bytes processed in one request */

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/nx-common-pseries.c | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/drivers/crypto/nx/nx-common-pseries.c b/drivers/crypto/nx/nx-common-pseries.c
index 49224870d05e..cc258d2c6475 100644
--- a/drivers/crypto/nx/nx-common-pseries.c
+++ b/drivers/crypto/nx/nx-common-pseries.c
@@ -962,6 +962,36 @@ static struct attribute_group nx842_attribute_group = {
.attrs = nx842_sysfs_entries,
 };
 
+#define nxct_capab_read(_name)                                         \
+static ssize_t nxct_##_name##_show(struct device *dev, \
+   struct device_attribute *attr, char *buf)   \
+{  \
+   return sprintf(buf, "%lld\n", nx_ct_capab._name);   \
+}
+
+#define NXCT_ATTR_RO(_name)\
+   nxct_capab_read(_name); \
+   static struct device_attribute dev_attr_##_name = __ATTR(_name, \
+   0444,   \
+   nxct_##_name##_show,\
+   NULL);
+
+NXCT_ATTR_RO(req_max_processed_len);
+NXCT_ATTR_RO(min_compress_len);
+NXCT_ATTR_RO(min_decompress_len);
+
+static struct attribute *nxct_capab_sysfs_entries[] = {
+   &dev_attr_req_max_processed_len.attr,
+   &dev_attr_min_compress_len.attr,
+   &dev_attr_min_decompress_len.attr,
+   NULL,
+};
+
+static struct attribute_group nxct_capab_attr_group = {
+   .name   =   nx_ct_capab.name,
+   .attrs  =   nxct_capab_sysfs_entries,
+};
+
 static struct nx842_driver nx842_pseries_driver = {
.name = KBUILD_MODNAME,
.owner =THIS_MODULE,
@@ -1051,6 +1081,16 @@ static int nx842_probe(struct vio_dev *viodev,
goto error;
}
 
+   if (capab_feat) {
+   if (sysfs_create_group(&viodev->dev.kobj,
+   &nxct_capab_attr_group)) {
+   dev_err(&viodev->dev,
+   "Could not create sysfs NX capability entries\n");
+   ret = -1;
+   goto error;
+   }
+   }
+
return 0;
 
 error_unlock:
@@ -1070,6 +1110,9 @@ static void nx842_remove(struct vio_dev *viodev)
pr_info("Removing IBM Power 842 compression device\n");
sysfs_remove_group(&viodev->dev.kobj, &nx842_attribute_group);
 
+   if (capab_feat)
+   sysfs_remove_group(&viodev->dev.kobj, &nxct_capab_attr_group);
+
crypto_unregister_alg(&nx842_pseries_alg);
 
spin_lock_irqsave(&devdata_mutex, flags);
-- 
2.18.2
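
The macro pair in this patch relies on token pasting to stamp out one show function per capability field. The following user-space sketch shows the same pattern in isolation. Everything in it is hypothetical: struct demo_caps, its values and DEMO_CAP_SHOW() are invented for the example, and snprintf() into a caller-supplied buffer stands in for the sysfs show callback.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capability values; the real ones come from the hypervisor. */
struct demo_caps {
	unsigned long long req_max_processed_len;
	unsigned long long min_compress_len;
	unsigned long long min_decompress_len;
};

static struct demo_caps demo_caps = {
	.req_max_processed_len = 1048576,
	.min_compress_len = 4096,
	.min_decompress_len = 1024,
};

/* Token pasting builds <field>_show(), which formats that one field. */
#define DEMO_CAP_SHOW(_name)						\
static int _name##_show(char *buf, size_t len)				\
{									\
	return snprintf(buf, len, "%llu\n", demo_caps._name);		\
}

DEMO_CAP_SHOW(req_max_processed_len)
DEMO_CAP_SHOW(min_compress_len)
DEMO_CAP_SHOW(min_decompress_len)
```

The sketch prints with %llu because the fields are unsigned; the patch uses %lld for u64 fields, which only differs for values above the signed 64-bit range.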




[V3 PATCH 15/16] crypto/nx: Get NX capabilities for GZIP coprocessor type

2021-04-17 Thread Haren Myneni


pHyp provides NX capabilities that give the recommended minimum
compression / decompression lengths and the maximum request buffer size
in bytes.

Get the overall NX capabilities, which indicate the specific features
pHyp supports, then retrieve the NXGZIP-specific capabilities.

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/nx-common-pseries.c | 83 +++
 1 file changed, 83 insertions(+)

diff --git a/drivers/crypto/nx/nx-common-pseries.c b/drivers/crypto/nx/nx-common-pseries.c
index 9a40fca8a9e6..49224870d05e 100644
--- a/drivers/crypto/nx/nx-common-pseries.c
+++ b/drivers/crypto/nx/nx-common-pseries.c
@@ -9,6 +9,7 @@
  */
 
 #include 
+#include 
 #include 
 
 #include "nx-842.h"
@@ -20,6 +21,24 @@ MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
 MODULE_ALIAS_CRYPTO("842");
 MODULE_ALIAS_CRYPTO("842-nx");
 
+struct nx_ct_capabs_be {
+   __be64  descriptor;
+   __be64  req_max_processed_len;  /* Max bytes in one GZIP request */
+   __be64  min_compress_len;   /* Min compression size in bytes */
+   __be64  min_decompress_len; /* Min decompression size in bytes */
+} __packed __aligned(0x1000);
+
+struct nx_ct_capabs {
+   char    name[VAS_DESCR_LEN + 1];
+   u64 descriptor;
+   u64 req_max_processed_len;  /* Max bytes in one GZIP request */
+   u64 min_compress_len;   /* Min compression in bytes */
+   u64 min_decompress_len; /* Min decompression in bytes */
+};
+
+u64 capab_feat = 0;
+struct nx_ct_capabs nx_ct_capab;
+
 static struct nx842_constraints nx842_pseries_constraints = {
.alignment =DDE_BUFFER_ALIGN,
.multiple = DDE_BUFFER_LAST_MULT,
@@ -1066,6 +1085,66 @@ static void nx842_remove(struct vio_dev *viodev)
kfree(old_devdata);
 }
 
+/*
+ * Get NX capabilities from pHyp.
+ * Only NXGZIP capabilities are available right now and these values
+ * are available through sysfs.
+ */
+static void __init nxct_get_capabilities(void)
+{
+   struct vas_all_capabs_be *capabs_be;
+   struct nx_ct_capabs_be *nxc_be;
+   int rc;
+
+   capabs_be = kmalloc(sizeof(*capabs_be), GFP_KERNEL);
+   if (!capabs_be)
+   return;
+   /*
+* Get NX overall capabilities with feature type=0
+*/
+   rc = plpar_vas_query_capabilities(H_QUERY_NX_CAPABILITIES, 0,
+ (u64)virt_to_phys(capabs_be));
+   if (rc)
+   goto out;
+
+   capab_feat = be64_to_cpu(capabs_be->feat_type);
+   /*
+* NX-GZIP feature available
+*/
+   if (capab_feat & VAS_NX_GZIP_FEAT_BIT) {
+   nxc_be = kmalloc(sizeof(*nxc_be), GFP_KERNEL);
+   if (!nxc_be)
+   goto out;
+   /*
+* Get capabilities for NX-GZIP feature
+*/
+   rc = plpar_vas_query_capabilities(H_QUERY_NX_CAPABILITIES,
+ VAS_NX_GZIP_FEAT,
+ (u64)virt_to_phys(nxc_be));
+   } else {
+   pr_err("NX-GZIP feature is not available\n");
+   rc = -EINVAL;
+   }
+
+   if (!rc) {
+   snprintf(nx_ct_capab.name, VAS_DESCR_LEN + 1, "%.8s",
+(char *)&nxc_be->descriptor);
+   nx_ct_capab.descriptor = be64_to_cpu(nxc_be->descriptor);
+   nx_ct_capab.req_max_processed_len =
+   be64_to_cpu(nxc_be->req_max_processed_len);
+   nx_ct_capab.min_compress_len =
+   be64_to_cpu(nxc_be->min_compress_len);
+   nx_ct_capab.min_decompress_len =
+   be64_to_cpu(nxc_be->min_decompress_len);
+   } else {
+   capab_feat = 0;
+   }
+
+   kfree(nxc_be);
+out:
+   kfree(capabs_be);
+}
+
 static const struct vio_device_id nx842_vio_driver_ids[] = {
{"ibm,compression-v1", "ibm,compression"},
{"", ""},
@@ -1093,6 +1172,10 @@ static int __init nx842_pseries_init(void)
return -ENOMEM;
 
RCU_INIT_POINTER(devdata, new_devdata);
+   /*
+* Get NX capabilities from pHyp which is used for NX-GZIP.
+*/
+   nxct_get_capabilities();
 
ret = vio_register_driver(&nx842_vio_driver);
if (ret) {
-- 
2.18.2
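
The capability buffer returned by the hypervisor is big-endian, which is why every field goes through be64_to_cpu() before use. A portable user-space sketch of that decoding step is below. The layout (descriptor followed by three 64-bit fields) follows struct nx_ct_capabs_be in the patch; the demo_* names and the byte-array input are assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one big-endian 64-bit value, independent of host byte order. */
static uint64_t demo_be64_to_cpu(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

/* Host-endian copy of the three length fields. */
struct demo_gzip_caps {
	uint64_t req_max_processed_len;
	uint64_t min_compress_len;
	uint64_t min_decompress_len;
};

/*
 * raw holds 32 bytes: descriptor, req_max_processed_len,
 * min_compress_len, min_decompress_len, each 8 bytes, big-endian.
 */
static void demo_decode_caps(const uint8_t raw[32], struct demo_gzip_caps *out)
{
	out->req_max_processed_len = demo_be64_to_cpu(raw + 8);
	out->min_compress_len = demo_be64_to_cpu(raw + 16);
	out->min_decompress_len = demo_be64_to_cpu(raw + 24);
}
```

Decoding byte by byte like this gives the same result on little-endian and big-endian hosts, which is the property be64_to_cpu() provides in the kernel.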




[V3 PATCH 14/16] crypto/nx: Register and unregister VAS interface

2021-04-17 Thread Haren Myneni


Create the /dev/crypto/nx-gzip interface when registering with VAS
and remove this interface when unregistering.

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/Kconfig | 1 +
 drivers/crypto/nx/nx-common-pseries.c | 9 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index 23e3d0160e67..2a35e0e785bd 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -29,6 +29,7 @@ if CRYPTO_DEV_NX_COMPRESS
 config CRYPTO_DEV_NX_COMPRESS_PSERIES
tristate "Compression acceleration support on pSeries platform"
depends on PPC_PSERIES && IBMVIO
+   depends on PPC_VAS
default y
help
  Support for PowerPC Nest (NX) compression acceleration. This
diff --git a/drivers/crypto/nx/nx-common-pseries.c b/drivers/crypto/nx/nx-common-pseries.c
index cc8dd3072b8b..9a40fca8a9e6 100644
--- a/drivers/crypto/nx/nx-common-pseries.c
+++ b/drivers/crypto/nx/nx-common-pseries.c
@@ -9,6 +9,7 @@
  */
 
 #include 
+#include 
 
 #include "nx-842.h"
 #include "nx_csbcpb.h" /* struct nx_csbcpb */
@@ -1101,6 +1102,12 @@ static int __init nx842_pseries_init(void)
return ret;
}
 
+   ret = vas_register_api_pseries(THIS_MODULE, VAS_COP_TYPE_GZIP,
+  "nx-gzip");
+
+   if (ret)
+   pr_err("NX-GZIP is not supported. Returned=%d\n", ret);
+
return 0;
 }
 
@@ -,6 +1118,8 @@ static void __exit nx842_pseries_exit(void)
struct nx842_devdata *old_devdata;
unsigned long flags;
 
+   vas_unregister_api_pseries();
+
crypto_unregister_alg(&nx842_pseries_alg);
 
spin_lock_irqsave(&devdata_mutex, flags);
-- 
2.18.2




[V3 PATCH 13/16] crypto/nx: Rename nx-842-pseries file name to nx-common-pseries

2021-04-17 Thread Haren Myneni


Rename nx-842-pseries.c to nx-common-pseries.c to add code for new
GZIP compression type. The actual functionality is not changed in
this patch.

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/Makefile  | 2 +-
 drivers/crypto/nx/{nx-842-pseries.c => nx-common-pseries.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename drivers/crypto/nx/{nx-842-pseries.c => nx-common-pseries.c} (100%)

diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index bc89a20e5d9d..d00181a26dd6 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -14,5 +14,5 @@ nx-crypto-objs := nx.o \
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_PSERIES) += nx-compress-pseries.o nx-compress.o
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV) += nx-compress-powernv.o nx-compress.o
 nx-compress-objs := nx-842.o
-nx-compress-pseries-objs := nx-842-pseries.o
+nx-compress-pseries-objs := nx-common-pseries.o
 nx-compress-powernv-objs := nx-common-powernv.o
diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-common-pseries.c
similarity index 100%
rename from drivers/crypto/nx/nx-842-pseries.c
rename to drivers/crypto/nx/nx-common-pseries.c
-- 
2.18.2




[V3 PATCH 12/16] powerpc/pseries/vas: sysfs interface to export capabilities

2021-04-17 Thread Haren Myneni


pHyp provides GZIP default and GZIP QoS capabilities, which give
the total number of credits available in the LPAR. This patch
creates sysfs entries and exports the LPAR credits, the currently used
credits, and the available credits for each feature.

/sys/kernel/vas/VasCaps/VDefGzip: (default GZIP capabilities)
avail_lpar_creds   /* Available credits to use */
target_lpar_creds  /* Total credits available, which can be changed with a DLPAR operation */
used_lpar_creds    /* Used credits */

/sys/kernel/vas/VasCaps/VQosGzip (QoS GZIP capabilities)
avail_lpar_creds
target_lpar_creds
used_lpar_creds

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/pseries/Makefile|   2 +-
 arch/powerpc/platforms/pseries/vas-sysfs.c | 173 +
 arch/powerpc/platforms/pseries/vas.c   |   6 +
 arch/powerpc/platforms/pseries/vas.h   |   2 +
 4 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/platforms/pseries/vas-sysfs.c

diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 4cda0ef87be0..e24093bebc0b 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -30,4 +30,4 @@ obj-$(CONFIG_PPC_SVM) += svm.o
 obj-$(CONFIG_FA_DUMP)  += rtas-fadump.o
 
 obj-$(CONFIG_SUSPEND)  += suspend.o
-obj-$(CONFIG_PPC_VAS)  += vas.o
+obj-$(CONFIG_PPC_VAS)  += vas.o vas-sysfs.o
diff --git a/arch/powerpc/platforms/pseries/vas-sysfs.c b/arch/powerpc/platforms/pseries/vas-sysfs.c
new file mode 100644
index ..5f01f8ba6806
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/vas-sysfs.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright 2016-17 IBM Corp.
+ */
+
+#define pr_fmt(fmt) "vas: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vas.h"
+
+#ifdef CONFIG_SYSFS
+static struct kobject *pseries_vas_kobj;
+static struct kobject *vas_capabs_kobj;
+
+struct vas_capabs_entry {
+   struct kobject kobj;
+   struct vas_ct_capabs *capabs;
+};
+
+#define to_capabs_entry(entry) container_of(entry, struct vas_capabs_entry, kobj)
+
+static ssize_t avail_lpar_creds_show(struct vas_ct_capabs *capabs, char *buf)
+{
+   int avail_creds = atomic_read(&capabs->target_lpar_creds) -
+   atomic_read(&capabs->used_lpar_creds);
+   return sprintf(buf, "%d\n", avail_creds);
+}
+
+#define sysfs_capbs_entry_read(_name)  \
+static ssize_t _name##_show(struct vas_ct_capabs *capabs, char *buf)   \
+{  \
+   return sprintf(buf, "%d\n", atomic_read(&capabs->_name));   \
+}
+
+struct vas_sysfs_entry {
+   struct attribute attr;
+   ssize_t (*show)(struct vas_ct_capabs *, char *);
+   ssize_t (*store)(struct vas_ct_capabs *, const char *, size_t);
+};
+
+#define VAS_ATTR_RO(_name) \
+   sysfs_capbs_entry_read(_name);  \
+   static struct vas_sysfs_entry _name##_attribute = __ATTR(_name, \
+   0444, _name##_show, NULL);
+
+VAS_ATTR_RO(target_lpar_creds);
+VAS_ATTR_RO(used_lpar_creds);
+
+static struct vas_sysfs_entry avail_lpar_creds_attribute =
+   __ATTR(avail_lpar_creds, 0444, avail_lpar_creds_show, NULL);
+
+static struct attribute *vas_capab_attrs[] = {
+   &target_lpar_creds_attribute.attr,
+   &used_lpar_creds_attribute.attr,
+   &avail_lpar_creds_attribute.attr,
+   NULL,
+};
+
+static ssize_t vas_type_show(struct kobject *kobj, struct attribute *attr,
+char *buf)
+{
+   struct vas_capabs_entry *centry;
+   struct vas_ct_capabs *capabs;
+   struct vas_sysfs_entry *entry;
+
+   centry = to_capabs_entry(kobj);
+   capabs = centry->capabs;
+   entry = container_of(attr, struct vas_sysfs_entry, attr);
+
+   if (!entry->show)
+   return -EIO;
+
+   return entry->show(capabs, buf);
+}
+
+static ssize_t vas_type_store(struct kobject *kobj, struct attribute *attr,
+ const char *buf, size_t count)
+{
+   struct vas_capabs_entry *centry;
+   struct vas_ct_capabs *capabs;
+   struct vas_sysfs_entry *entry;
+
+   centry = to_capabs_entry(kobj);
+   capabs = centry->capabs;
+   entry = container_of(attr, struct vas_sysfs_entry, attr);
+   if (!entry->store)
+   return -EIO;
+
+   return entry->store(capabs, buf, count);
+}
+
+static void vas_type_release(struct kobject *kobj)
+{
+   struct vas_capabs_entry *centry = to_capabs_entry(kobj);
+   kfree(centry);
+}
+
+static const struct sysfs_ops vas_sysfs_ops = {
+   .show   =   vas_type_show,
+   .store  =   vas_type_store,
+};
+
+static struct kobj_type vas_attr_type = {
+   .release=   vas_type_release,
+
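
The show/store dispatch in this patch depends on container_of() to get from an embedded member back to the structure that wraps it. The following user-space sketch shows that step on its own. demo_container_of() is a simplified local version built on offsetof(), and the demo_* types are hypothetical stand-ins for the attribute and entry structures.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the generic attribute that the core code hands around. */
struct demo_attr {
	const char *name;
};

/* Stand-in for the driver's wrapper carrying the per-attribute callback. */
struct demo_entry {
	int id;
	struct demo_attr attr;
	int (*show)(void);
};

static int demo_show_42(void)
{
	return 42;
}

/* Generic dispatcher: receives only the embedded attr, recovers the entry. */
static int demo_dispatch_show(struct demo_attr *attr)
{
	struct demo_entry *e = demo_container_of(attr, struct demo_entry, attr);

	if (!e->show)
		return -1;	/* the patch returns -EIO in this case */
	return e->show();
}
```

The dispatcher never sees struct demo_entry directly, which is the same arrangement as vas_type_show() receiving only a struct attribute pointer.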

[V3 PATCH 11/16] powerpc/pseries/vas: Setup IRQ and fault handling

2021-04-17 Thread Haren Myneni


When NX sees a fault on the user space buffer, it generates a fault
interrupt and pHyp forwards that interrupt to the OS. The kernel then
makes the H_GET_NX_FAULT HCALL to retrieve the fault CRB information.

This patch sets up an IRQ for each window and handles the fault by
updating the CSB.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/pseries/vas.c | 111 ++-
 1 file changed, 110 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
index 0ade0d6d728f..2106eca0862a 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -224,6 +224,62 @@ int plpar_vas_query_capabilities(const u64 hcall, u8 query_type,
 }
 EXPORT_SYMBOL_GPL(plpar_vas_query_capabilities);
 
+/*
+ * HCALL to get fault CRB from pHyp.
+ */
+static int plpar_get_nx_fault(u32 winid, u64 buffer)
+{
+   int64_t rc;
+
+   rc = plpar_hcall_norets(H_GET_NX_FAULT, winid, buffer);
+
+   switch (rc) {
+   case H_SUCCESS:
+   return 0;
+   case H_PARAMETER:
+   pr_err("HCALL(%x): Invalid window ID %u\n", H_GET_NX_FAULT,
+  winid);
+   return -EINVAL;
+   case H_STATE:
+   pr_err("HCALL(%x): No outstanding faults for window ID %u\n",
+  H_GET_NX_FAULT, winid);
+   return -EINVAL;
+   case H_PRIVILEGE:
+   pr_err("HCALL(%x): Window(%u): Invalid fault buffer 0x%llx\n",
+  H_GET_NX_FAULT, winid, buffer);
+   return -EACCES;
+   default:
+   pr_err("HCALL(%x): Unexpected error %lld for window(%u)\n",
+  H_GET_NX_FAULT, rc, winid);
+   return -EIO;
+   }
+}
+
+/*
+ * Handle the fault interrupt.
+ * When the fault interrupt is received for each window, query pHyp to get
+ * the fault CRB on the specific fault. Then process the CRB by updating
+ * CSB or send signal if the user space CSB is invalid.
+ * Note: pHyp forwards an interrupt for each fault request. So one fault
+ * CRB to process for each H_GET_NX_FAULT HCALL.
+ */
+irqreturn_t pseries_vas_fault_thread_fn(int irq, void *data)
+{
+   struct vas_window *txwin = data;
+   struct coprocessor_request_block crb;
+   struct vas_win_task *tsk;
+   int rc;
+
+   rc = plpar_get_nx_fault(txwin->winid, (u64)virt_to_phys(&crb));
+   if (!rc) {
+   tsk = &txwin->task;
+   vas_dump_crb(&crb);
+   vas_update_csb(&crb, tsk);
+   }
+
+   return IRQ_HANDLED;
+}
+
 /*
  * Allocate window and setup IRQ mapping.
  */
@@ -235,10 +291,51 @@ static int allocate_setup_window(struct vas_window *txwin,
rc = plpar_vas_allocate_window(txwin, domain, wintype, DEF_WIN_CREDS);
if (rc)
return rc;
+   /*
+* On powerVM, pHyp setup and forwards the fault interrupt per
+* window. So the IRQ setup and fault handling will be done for
+* each open window separately.
+*/
+   txwin->lpar.fault_virq = irq_create_mapping(NULL,
+   txwin->lpar.fault_irq);
+   if (!txwin->lpar.fault_virq) {
+   pr_err("Failed irq mapping %d\n", txwin->lpar.fault_irq);
+   rc = -EINVAL;
+   goto out_win;
+   }
+
+   txwin->lpar.name = kasprintf(GFP_KERNEL, "vas-win-%d", txwin->winid);
+   if (!txwin->lpar.name) {
+   rc = -ENOMEM;
+   goto out_irq;
+   }
+
+   rc = request_threaded_irq(txwin->lpar.fault_virq, NULL,
+ pseries_vas_fault_thread_fn, IRQF_ONESHOT,
+ txwin->lpar.name, txwin);
+   if (rc) {
+   pr_err("VAS-Window[%d]: Request IRQ(%u) failed with %d\n",
+  txwin->winid, txwin->lpar.fault_virq, rc);
+   goto out_free;
+   }
 
txwin->wcreds_max = DEF_WIN_CREDS;
 
return 0;
+out_free:
+   kfree(txwin->lpar.name);
+out_irq:
+   irq_dispose_mapping(txwin->lpar.fault_virq);
+out_win:
+   plpar_vas_deallocate_window(txwin->winid);
+   return rc;
+}
+
+static inline void free_irq_setup(struct vas_window *txwin)
+{
+   free_irq(txwin->lpar.fault_virq, txwin);
+   irq_dispose_mapping(txwin->lpar.fault_virq);
+   kfree(txwin->lpar.name);
 }
 
 static struct vas_window *vas_allocate_window(struct vas_tx_win_open_attr *uattr,
@@ -346,6 +443,11 @@ static struct vas_window *vas_allocate_window(struct vas_tx_win_open_attr *uattr
return txwin;
 
 out_free:
+   /*
+* Window is not operational. Free IRQ before closing
+* window so that do not have to hold mutex.
+*/
+   free_irq_setup(txwin);
plpar_vas_deallocate_window(txwin->winid);
 out:
atomic_dec(&ct_capab->used_lpar_creds);
@@ -364,9 +466,16 @@ static int deallocate_free_window(s
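
The error handling in allocate_setup_window() follows the usual goto-unwind pattern: each label releases one resource, and a failure jumps to the label that undoes everything acquired so far, in reverse order. A stand-alone sketch of that control flow is below. The demo_* names and the fail_at parameter are invented so the paths can be exercised without real IRQs or hypervisor calls.

```c
#include <assert.h>
#include <errno.h>

/* Which step should fail in this simulation. */
enum demo_step { DEMO_FAIL_NONE, DEMO_FAIL_MAP, DEMO_FAIL_NAME, DEMO_FAIL_IRQ };

/* 1 means the resource is currently held, 0 means it is released. */
struct demo_state {
	int window;
	int mapping;
	int name;
	int irq;
};

static int demo_setup(struct demo_state *s, enum demo_step fail_at)
{
	int rc;

	s->window = 1;			/* window allocated */

	if (fail_at == DEMO_FAIL_MAP) {
		rc = -EINVAL;
		goto out_win;
	}
	s->mapping = 1;			/* IRQ mapping created */

	if (fail_at == DEMO_FAIL_NAME) {
		rc = -ENOMEM;
		goto out_irq;
	}
	s->name = 1;			/* name string allocated */

	if (fail_at == DEMO_FAIL_IRQ) {
		rc = -EIO;
		goto out_free;
	}
	s->irq = 1;			/* handler registered */
	return 0;

out_free:
	s->name = 0;
out_irq:
	s->mapping = 0;
out_win:
	s->window = 0;
	return rc;
}
```

Because the labels fall through, a failure at the last step releases the name, the mapping and the window in that order, while an earlier failure skips the labels for resources that were never acquired.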

[V3 PATCH 10/16] powerpc/pseries/vas: Integrate API with open/close windows

2021-04-17 Thread Haren Myneni


This patch adds VAS window allocation and close with the corresponding
HCALLs. It also integrates with the existing user space VAS API and
provides register/unregister functions to the NX pseries driver.

The driver register function creates the user space interface
(/dev/crypto/nx-gzip) and the unregister function removes it.

A user space process opens this device node and issues an ioctl to
allocate a VAS window. The close interface deallocates the window.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/vas.h  |   5 +
 arch/powerpc/platforms/book3s/Kconfig   |   2 +-
 arch/powerpc/platforms/pseries/Makefile |   1 +
 arch/powerpc/platforms/pseries/vas.c| 212 
 4 files changed, 219 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index d15784506a54..aa1974aba27e 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -270,6 +270,11 @@ struct vas_all_capabs {
u64 feat_type;
 };
 
+int plpar_vas_query_capabilities(const u64 hcall, u8 query_type,
+u64 result);
+int vas_register_api_pseries(struct module *mod,
+enum vas_cop_type cop_type, const char *name);
+void vas_unregister_api_pseries(void);
 #endif
 
 /*
diff --git a/arch/powerpc/platforms/book3s/Kconfig b/arch/powerpc/platforms/book3s/Kconfig
index 51e14db83a79..bed21449e8e5 100644
--- a/arch/powerpc/platforms/book3s/Kconfig
+++ b/arch/powerpc/platforms/book3s/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 config PPC_VAS
bool "IBM Virtual Accelerator Switchboard (VAS)"
-   depends on PPC_POWERNV && PPC_64K_PAGES
+   depends on (PPC_POWERNV || PPC_PSERIES) && PPC_64K_PAGES
default y
help
  This enables support for IBM Virtual Accelerator Switchboard (VAS).
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index c8a2b0b05ac0..4cda0ef87be0 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -30,3 +30,4 @@ obj-$(CONFIG_PPC_SVM) += svm.o
 obj-$(CONFIG_FA_DUMP)  += rtas-fadump.o
 
 obj-$(CONFIG_SUSPEND)  += suspend.o
+obj-$(CONFIG_PPC_VAS)  += vas.o
diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
index 35946fb02995..0ade0d6d728f 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -222,6 +222,218 @@ int plpar_vas_query_capabilities(const u64 hcall, u8 query_type,
return -EIO;
}
 }
+EXPORT_SYMBOL_GPL(plpar_vas_query_capabilities);
+
+/*
+ * Allocate window and setup IRQ mapping.
+ */
+static int allocate_setup_window(struct vas_window *txwin,
+u64 *domain, u8 wintype)
+{
+   int rc;
+
+   rc = plpar_vas_allocate_window(txwin, domain, wintype, DEF_WIN_CREDS);
+   if (rc)
+   return rc;
+
+   txwin->wcreds_max = DEF_WIN_CREDS;
+
+   return 0;
+}
+
+static struct vas_window *vas_allocate_window(struct vas_tx_win_open_attr *uattr,
+ enum vas_cop_type cop_type)
+{
+   long domain[PLPAR_HCALL9_BUFSIZE] = {VAS_DEFAULT_DOMAIN_ID};
+   struct vas_ct_capabs *ct_capab;
+   struct vas_capabs *capabs;
+   struct vas_window *txwin;
+   int rc;
+
+   txwin = kzalloc(sizeof(*txwin), GFP_KERNEL);
+   if (!txwin)
+   return ERR_PTR(-ENOMEM);
+
+   /*
+* A VAS window can have many credits which means that many
+* requests can be issued simultaneously. But phyp restricts
+* one credit per window.
+* phyp introduces 2 different types of credits:
+* Default credit type (Uses normal priority FIFO):
+*  A limited number of credits are assigned to partitions
+*  based on processor entitlement. But these credits may be
+*  over-committed on a system depends on whether the CPUs
+*  are in shared or dedicated modes - that is, more requests
+*  may be issued across the system than NX can service at
+*  once which can result in paste command failure (RMA_busy).
+*  Then the process has to resend requests or fall-back to
+*  SW compression.
+* Quality of Service (QoS) credit type (Uses high priority FIFO):
+*  To avoid NX HW contention, the system admins can assign
+*  QoS credits for each LPAR so that this partition is
+*  guaranteed access to NX resources. These credits are
+*  assigned to partitions via the HMC.
+*  Refer PAPR for more information.
+*
+* Allocate window with QoS credits if user requested. Otherwise
+* default credits are used.
+*/
+   if (uattr->flags & VAS_WIN_QOS_CREDITS)
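
The credit scheme described in the comment above, together with the target/used/available counters from the sysfs patch, can be sketched in user space with C11 atomics. This is an assumption about how such accounting might look, not the code from the patch (which is cut off here): the demo_* names are hypothetical, and the sketch reserves a credit first and rolls back if the target is exceeded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-feature credit counters. */
struct demo_credits {
	atomic_int target;	/* total credits assigned to the partition */
	atomic_int used;	/* credits currently held by open windows */
};

/* Available credits are the target minus what is in use. */
static int demo_avail(struct demo_credits *c)
{
	return atomic_load(&c->target) - atomic_load(&c->used);
}

/* Reserve one credit; undo the reservation if that exceeds the target. */
static bool demo_take_credit(struct demo_credits *c)
{
	int used = atomic_fetch_add(&c->used, 1) + 1;

	if (used > atomic_load(&c->target)) {
		atomic_fetch_sub(&c->used, 1);
		return false;
	}
	return true;
}

/* Return one credit, for example when a window is closed. */
static void demo_release_credit(struct demo_credits *c)
{
	atomic_fetch_sub(&c->used, 1);
}
```

Incrementing before checking means two racing callers cannot both take the last credit; the loser sees a count above the target and backs out.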
+

[V3 PATCH 09/16] powerpc/pseries/vas: Implement to get all capabilities

2021-04-17 Thread Haren Myneni


pHyp provides various VAS capabilities, such as GZIP default and QoS
capabilities, which are used to determine the total number of credits
available in the LPAR, maximum window credits, maximum LPAR credits,
whether user mode copy/paste is supported, and so on.

So first retrieve the overall VAS capabilities using the
H_QUERY_VAS_CAPABILITIES HCALL, which tells which specific features
are available. Then retrieve the specific capabilities by passing the
feature type to the H_QUERY_VAS_CAPABILITIES HCALL.

pHyp supports only GZIP default and GZIP QoS capabilities right now.
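
As a rough illustration of the first query step, the sketch below decodes the
feature bits from a big-endian feat_type value. The bit masks follow the
VAS_GZIP_*_FEAT_BIT definitions from patch 06 of this series; the helper names
(be64_bytes_to_host, vas_feat_mask) and the returned mask layout are
hypothetical and exist only for this example, which runs in user space rather
than in the kernel.

```c
#include <stdint.h>

/* Feature numbers and masks as defined in patch 06 (IBM bit numbering). */
#define VAS_GZIP_QOS_FEAT      0x1
#define VAS_GZIP_DEF_FEAT      0x2
#define VAS_GZIP_QOS_FEAT_BIT  (UINT64_C(1) << (63 - VAS_GZIP_QOS_FEAT))
#define VAS_GZIP_DEF_FEAT_BIT  (UINT64_C(1) << (63 - VAS_GZIP_DEF_FEAT))

/* Hypothetical helper: build a host-order u64 from 8 big-endian bytes. */
static uint64_t be64_bytes_to_host(const unsigned char b[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

/*
 * Hypothetical helper: report which GZIP features are advertised.
 * Bit 0 of the result means QoS, bit 1 means default credits.
 */
static unsigned int vas_feat_mask(const unsigned char feat_type_be[8])
{
	uint64_t feat = be64_bytes_to_host(feat_type_be);
	unsigned int mask = 0;

	if (feat & VAS_GZIP_QOS_FEAT_BIT)
		mask |= 1u;
	if (feat & VAS_GZIP_DEF_FEAT_BIT)
		mask |= 2u;
	return mask;
}
```

A caller would then issue the per-feature query only for the features whose
bit is set, which is what pseries_vas_init() does in the diff below.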

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/pseries/vas.c | 130 +++
 1 file changed, 130 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
index 06960151477c..35946fb02995 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -30,6 +30,13 @@
 /* phyp allows one credit per window right now */
 #define DEF_WIN_CREDS  1
 
+static struct vas_all_capabs capabs_all;
+static int copypaste_feat;
+
+struct vas_capabs vcapabs[VAS_MAX_FEAT_TYPE];
+
+DEFINE_MUTEX(vas_pseries_mutex);
+
 static int64_t hcall_return_busy_check(int64_t rc)
 {
/* Check if we are stalled for some time */
@@ -215,3 +222,126 @@ int plpar_vas_query_capabilities(const u64 hcall, u8 query_type,
return -EIO;
}
 }
+
+/*
+ * Get the specific capabilities based on the feature type.
+ * Right now supports GZIP default and GZIP QoS capabilities.
+ */
+static int get_vas_capabilities(u8 feat, enum vas_cop_feat_type type,
+   struct vas_ct_capabs_be *capab_be)
+{
+   struct vas_ct_capabs *capab;
+   struct vas_capabs *vcapab;
+   int rc = 0;
+
+   vcapab = &vcapabs[type];
+   memset(vcapab, 0, sizeof(*vcapab));
+   INIT_LIST_HEAD(&vcapab->list);
+
+   capab = &vcapab->capab;
+
+   rc = plpar_vas_query_capabilities(H_QUERY_VAS_CAPABILITIES, feat,
+ (u64)virt_to_phys(capab_be));
+   if (rc)
+   return rc;
+
+   capab->user_mode = capab_be->user_mode;
+   if (!(capab->user_mode & VAS_COPY_PASTE_USER_MODE)) {
+   pr_err("User space COPY/PASTE is not supported\n");
+   return -ENOTSUPP;
+   }
+
+   snprintf(capab->name, VAS_DESCR_LEN + 1, "%.8s",
+(char *)&capab_be->descriptor);
+   capab->descriptor = be64_to_cpu(capab_be->descriptor);
+   capab->win_type = capab_be->win_type;
+   if (capab->win_type >= VAS_MAX_FEAT_TYPE) {
+   pr_err("Unsupported window type %u\n", capab->win_type);
+   return -EINVAL;
+   }
+   capab->max_lpar_creds = be16_to_cpu(capab_be->max_lpar_creds);
+   capab->max_win_creds = be16_to_cpu(capab_be->max_win_creds);
+   atomic_set(&capab->target_lpar_creds,
+  be16_to_cpu(capab_be->target_lpar_creds));
+   if (feat == VAS_GZIP_DEF_FEAT) {
+   capab->def_lpar_creds = be16_to_cpu(capab_be->def_lpar_creds);
+
+   if (capab->max_win_creds < DEF_WIN_CREDS) {
+   pr_err("Window creds(%u) > max allowed window creds(%u)\n",
+  DEF_WIN_CREDS, capab->max_win_creds);
+   return -EINVAL;
+   }
+   }
+
+   copypaste_feat = 1;
+
+   return 0;
+}
+
+static int __init pseries_vas_init(void)
+{
+   struct vas_ct_capabs_be *ct_capabs_be;
+   struct vas_all_capabs_be *capabs_be;
+   int rc;
+
+   /*
+* Linux supports user space COPY/PASTE only with Radix
+*/
+   if (!radix_enabled()) {
+   pr_err("API is supported only with radix page tables\n");
+   return -ENOTSUPP;
+   }
+
+   capabs_be = kmalloc(sizeof(*capabs_be), GFP_KERNEL);
+   if (!capabs_be)
+   return -ENOMEM;
+   /*
+* Get VAS overall capabilities by passing 0 to feature type.
+*/
+   rc = plpar_vas_query_capabilities(H_QUERY_VAS_CAPABILITIES, 0,
+ (u64)virt_to_phys(capabs_be));
+   if (rc)
+   goto out;
+
+   snprintf(capabs_all.name, VAS_DESCR_LEN, "%.7s",
+(char *)&capabs_be->descriptor);
+   capabs_all.descriptor = be64_to_cpu(capabs_be->descriptor);
+   capabs_all.feat_type = be64_to_cpu(capabs_be->feat_type);
+
+   ct_capabs_be = kmalloc(sizeof(*ct_capabs_be), GFP_KERNEL);
+   if (!ct_capabs_be) {
+   rc = -ENOMEM;
+   goto out;
+   }
+   /*
+* QOS capabilities available
+*/
+   if (capabs_all.feat_type & VAS_GZIP_QOS_FEAT_BIT) {
+   rc = get_vas_capabilities(VAS_GZIP_QOS_FEAT,
+ VAS_GZIP_QOS_FEAT_TYPE, ct_capabs_be);
+
+   if (rc)
+   goto out_ct;
+   }
+   /*
+* Def

[V3 PATCH 08/16] powerpc/pseries/VAS: Implement allocate/modify/deallocate HCALLS

2021-04-17 Thread Haren Myneni


This patch adds the following HCALLs which are used to allocate,
modify and deallocate VAS windows.

H_ALLOCATE_VAS_WINDOW: Allocate VAS window
H_DEALLOCATE_VAS_WINDOW: Close VAS window
H_MODIFY_VAS_WINDOW: Setup window before using

Also adds phyp call (H_QUERY_VAS_CAPABILITIES) to get all VAS
capabilities that phyp provides.
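
Each HCALL wrapper in this patch re-issues the call while the hypervisor
reports busy. A minimal user-space sketch of that retry pattern follows. The
status codes, the callback type and the fake call are stand-ins invented for
this example, and unlike the kernel loop (which retries until the hypervisor
stops reporting busy) the sketch caps the number of attempts.

```c
#include <stdint.h>

/* Stand-in status codes for this sketch only, not the kernel's values. */
enum { DEMO_SUCCESS = 0, DEMO_BUSY = 1 };

typedef int64_t (*demo_hcall_fn)(void *ctx);

/* Re-issue the call while it reports busy, up to max_tries attempts. */
static int64_t demo_retry_while_busy(demo_hcall_fn fn, void *ctx, int max_tries)
{
	int64_t rc = DEMO_BUSY;

	for (int i = 0; i < max_tries && rc == DEMO_BUSY; i++)
		rc = fn(ctx);
	return rc;
}

/* Fake hypervisor call: reports busy until the counter reaches zero. */
static int64_t demo_fake_hcall(void *ctx)
{
	int *busy_left = ctx;

	if (*busy_left > 0) {
		(*busy_left)--;
		return DEMO_BUSY;
	}
	return DEMO_SUCCESS;
}
```

In the patch itself, hcall_return_busy_check() additionally sleeps for the
hinted interval on a long-busy return and calls cond_resched() on a plain
H_BUSY before the loop retries.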

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/pseries/vas.c | 217 +++
 1 file changed, 217 insertions(+)
 create mode 100644 arch/powerpc/platforms/pseries/vas.c

diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
new file mode 100644
index ..06960151477c
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -0,0 +1,217 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright 2020-21 IBM Corp.
+ */
+
+#define pr_fmt(fmt) "vas: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "vas.h"
+
+#define VAS_INVALID_WIN_ADDRESS 0xul
+#define VAS_DEFAULT_DOMAIN_ID   0xul
+/* Authority Mask Register (AMR) value is not supported in */
+/* linux implementation. So pass '0' to modify window HCALL */
+#define VAS_AMR_VALUE   0
+/* phyp allows one credit per window right now */
+#define DEF_WIN_CREDS  1
+
+static int64_t hcall_return_busy_check(int64_t rc)
+{
+   /* Check if we are stalled for some time */
+   if (H_IS_LONG_BUSY(rc)) {
+   msleep(get_longbusy_msecs(rc));
+   rc = H_BUSY;
+   } else if (rc == H_BUSY) {
+   cond_resched();
+   }
+
+   return rc;
+}
+
+/*
+ * Allocate VAS window HCALL
+ */
+static int plpar_vas_allocate_window(struct vas_window *win, u64 *domain,
+u8 wintype, u16 credits)
+{
+   long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
+   int64_t rc;
+
+   do {
+   rc = plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, wintype,
+ credits, domain[0], domain[1], domain[2],
+ domain[3], domain[4], domain[5]);
+
+   rc = hcall_return_busy_check(rc);
+   } while (rc == H_BUSY);
+
+   switch (rc) {
+   case H_SUCCESS:
+   win->winid = retbuf[0];
+   win->lpar.win_addr = retbuf[1];
+   win->lpar.complete_irq = retbuf[2];
+   win->lpar.fault_irq = retbuf[3];
+   if (win->lpar.win_addr == VAS_INVALID_WIN_ADDRESS) {
+   pr_err("HCALL(%x): COPY/PASTE is not supported\n",
+   H_ALLOCATE_VAS_WINDOW);
+   return -ENOTSUPP;
+   }
+   return 0;
+   case H_PARAMETER:
+   pr_err("HCALL(%x): Invalid window type (%u)\n",
+   H_ALLOCATE_VAS_WINDOW, wintype);
+   return -EINVAL;
+   case H_P2:
+   pr_err("HCALL(%x): Credits(%u) exceed maximum window credits\n",
+   H_ALLOCATE_VAS_WINDOW, credits);
+   return -EINVAL;
+   case H_COP_HW:
+   pr_err("HCALL(%x): User-mode COPY/PASTE is not supported\n",
+   H_ALLOCATE_VAS_WINDOW);
+   return -ENOTSUPP;
+   case H_RESOURCE:
+   pr_err("HCALL(%x): LPAR credit limit exceeds window limit\n",
+   H_ALLOCATE_VAS_WINDOW);
+   return -EPERM;
+   case H_CONSTRAINED:
+   pr_err("HCALL(%x): Credits (%u) are not available\n",
+   H_ALLOCATE_VAS_WINDOW, credits);
+   return -EPERM;
+   default:
+   pr_err("HCALL(%x): Unexpected error %lld\n",
+   H_ALLOCATE_VAS_WINDOW, rc);
+   return -EIO;
+   }
+}
+
+/*
+ * Deallocate VAS window HCALL.
+ */
+static int plpar_vas_deallocate_window(u64 winid)
+{
+   int64_t rc;
+
+   do {
+   rc = plpar_hcall_norets(H_DEALLOCATE_VAS_WINDOW, winid);
+
+   rc = hcall_return_busy_check(rc);
+   } while (rc == H_BUSY);
+
+   switch (rc) {
+   case H_SUCCESS:
+   return 0;
+   case H_PARAMETER:
+   pr_err("HCALL(%x): Invalid window ID %llu\n",
+   H_DEALLOCATE_VAS_WINDOW, winid);
+   return -EINVAL;
+   case H_STATE:
+   pr_err("HCALL(%x): Window(%llu): Invalid page table entries\n",
+   H_DEALLOCATE_VAS_WINDOW, winid);
+   return -EPERM;
+   default:
+   pr_err("HCALL(%x): Unexpected error %lld for window(%llu)\n",
+   H_DEALLOCATE_VAS_WINDOW, rc, winid);
+   return -EIO;
+   }
+}
+
+/*
+ * Modify VAS window.
+ * After the window is opened with allocate window HCALL, configure it
+ * with flags and LPAR PID before using.

[V3 PATCH 07/16] powerpc/vas: Define QoS credit flag to allocate window

2021-04-17 Thread Haren Myneni


pHyp introduces two different types of credits: Default and Quality
of Service (QoS).

The total number of default credits available on each LPAR depends
on the CPU resources configured. But these credits can be shared or
over-committed across LPARs in shared mode, which can result in
paste command failure (RMA_busy). To avoid NX HW contention, phyp
introduces the QoS credit type, which guarantees access to NX
resources. The system admins can assign QoS credits for each LPAR
via HMC.

The default credit type is used to allocate a VAS window by default,
as in the powerVM implementation. But the process can pass the
VAS_WIN_QOS_CREDITS flag with the VAS_TX_WIN_OPEN ioctl to open a QoS
type window.
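
From user space, requesting a QoS window might look roughly like the sketch
below. The struct layout, VAS_MAGIC and the ioctl number mirror the uapi
header in the diff that follows, and the flag value is copied as it appears
there. The helper name open_gzip_window, the use of version 1 and the choice
of vas_id -1 are assumptions made for illustration. The device path is passed
in by the caller; patch 02 of this series names the interface as
/dev/crypto/nx-gzip.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local mirror of the uapi definitions quoted in this patch. */
#define VAS_MAGIC            'v'
#define VAS_WIN_QOS_CREDITS  0x0001  /* value as shown in the diff */

struct vas_tx_win_open_attr {
	uint32_t version;
	int16_t  vas_id;        /* specific VAS instance, or -1 for default */
	uint16_t reserved1;
	uint64_t flags;
	uint64_t reserved2[6];
};

#define VAS_TX_WIN_OPEN  _IOW(VAS_MAGIC, 0x20, struct vas_tx_win_open_attr)

/*
 * Hypothetical helper: open the device and request a send window.
 * Returns the open file descriptor on success, -1 on any failure.
 */
static int open_gzip_window(const char *path, int want_qos)
{
	struct vas_tx_win_open_attr attr;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;

	memset(&attr, 0, sizeof(attr));
	attr.version = 1;       /* assumption: current attr version */
	attr.vas_id = -1;       /* let the kernel pick the instance */
	if (want_qos)
		attr.flags |= VAS_WIN_QOS_CREDITS;

	if (ioctl(fd, VAS_TX_WIN_OPEN, &attr) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Leaving flags at zero keeps the existing behaviour, so callers written against
the earlier header continue to get a default-credit window.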

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/uapi/asm/vas-api.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/uapi/asm/vas-api.h b/arch/powerpc/include/uapi/asm/vas-api.h
index ebd4b2424785..eb7c8694174f 100644
--- a/arch/powerpc/include/uapi/asm/vas-api.h
+++ b/arch/powerpc/include/uapi/asm/vas-api.h
@@ -13,11 +13,15 @@
 #define VAS_MAGIC  'v'
 #define VAS_TX_WIN_OPEN _IOW(VAS_MAGIC, 0x20, struct vas_tx_win_open_attr)
 
+/* Flags to VAS TX open window ioctl */
+/* To allocate a window with QoS credit, otherwise default credit is used */
+#define VAS_WIN_QOS_CREDITS 0x0001
+
 struct vas_tx_win_open_attr {
__u32   version;
__s16   vas_id; /* specific instance of vas or -1 for default */
__u16   reserved1;
-   __u64   flags;  /* Future use */
+   __u64   flags;
__u64   reserved2[6];
 };
 
-- 
2.18.2




[V3 PATCH 06/16] powerpc/pseries/vas: Define VAS/NXGZIP HCALLs and structs

2021-04-17 Thread Haren Myneni


This patch adds HCALLs and other definitions. It also defines structs
that are used in the VAS implementation on powerVM.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/hvcall.h|  7 ++
 arch/powerpc/include/asm/vas.h   | 28 
 arch/powerpc/platforms/pseries/vas.h | 96 
 3 files changed, 131 insertions(+)
 create mode 100644 arch/powerpc/platforms/pseries/vas.h

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index ed6086d57b22..accbb7f6f272 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -294,6 +294,13 @@
 #define H_RESIZE_HPT_COMMIT0x370
 #define H_REGISTER_PROC_TBL0x37C
 #define H_SIGNAL_SYS_RESET 0x380
+#define H_ALLOCATE_VAS_WINDOW   0x388
+#define H_MODIFY_VAS_WINDOW     0x38C
+#define H_DEALLOCATE_VAS_WINDOW 0x390
+#define H_QUERY_VAS_WINDOW      0x394
+#define H_QUERY_VAS_CAPABILITIES        0x398
+#define H_QUERY_NX_CAPABILITIES 0x39C
+#define H_GET_NX_FAULT          0x3A0
 #define H_INT_GET_SOURCE_INFO   0x3A8
 #define H_INT_SET_SOURCE_CONFIG 0x3AC
 #define H_INT_GET_SOURCE_CONFIG 0x3B0
diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index f928bf4c7e98..d15784506a54 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -179,6 +179,7 @@ struct vas_tx_win_attr {
bool rx_win_ord_mode;
 };
 
+#ifdef CONFIG_PPC_POWERNV
 /*
  * Helper to map a chip id to VAS id.
  * For POWER9, this is a 1:1 mapping. In the future this maybe a 1:N
@@ -243,6 +244,33 @@ int vas_paste_crb(struct vas_window *win, int offset, bool re);
 int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type,
 const char *name);
 void vas_unregister_api_powernv(void);
+#endif
+
+#ifdef CONFIG_PPC_PSERIES
+
+/* VAS Capabilities */
+#define VAS_GZIP_QOS_FEAT  0x1
+#define VAS_GZIP_DEF_FEAT  0x2
+#define VAS_GZIP_QOS_FEAT_BIT  (1UL << (63 - VAS_GZIP_QOS_FEAT)) /* Bit 1 */
+#define VAS_GZIP_DEF_FEAT_BIT  (1UL << (63 - VAS_GZIP_DEF_FEAT)) /* Bit 2 */
+
+/* NX Capabilities */
+#define VAS_NX_GZIP_FEAT        0x1
+#define VAS_NX_GZIP_FEAT_BIT    (1UL << (63 - VAS_NX_GZIP_FEAT)) /* Bit 1 */
+#define VAS_DESCR_LEN           8
+
+struct vas_all_capabs_be {
+   __be64  descriptor;
+   __be64  feat_type;
+} __packed __aligned(0x1000);
+
+struct vas_all_capabs {
+   char    name[VAS_DESCR_LEN + 1];
+   u64 descriptor;
+   u64 feat_type;
+};
+
+#endif
 
 /*
  * Register / unregister coprocessor type to VAS API which will be exported
diff --git a/arch/powerpc/platforms/pseries/vas.h b/arch/powerpc/platforms/pseries/vas.h
new file mode 100644
index ..208682fffa57
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/vas.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright 2020-21 IBM Corp.
+ */
+
+#ifndef _VAS_H
+#define _VAS_H
+#include 
+#include 
+#include 
+
+/*
+ * VAS window modify flags
+ */
+#define VAS_MOD_WIN_CLOSE       (1UL << 63)
+#define VAS_MOD_WIN_JOBS_KILL   (1UL << (63 - 1))
+#define VAS_MOD_WIN_DR          (1UL << (63 - 3))
+#define VAS_MOD_WIN_PR          (1UL << (63 - 4))
+#define VAS_MOD_WIN_SF          (1UL << (63 - 5))
+#define VAS_MOD_WIN_TA          (1UL << (63 - 6))
+#define VAS_MOD_WIN_FLAGS       (VAS_MOD_WIN_JOBS_KILL | VAS_MOD_WIN_DR | \
+                                VAS_MOD_WIN_PR | VAS_MOD_WIN_SF)
+
+#define VAS_WIN_ACTIVE          0x0
+#define VAS_WIN_CLOSED          0x1
+#define VAS_WIN_INACTIVE        0x2 /* Inactive due to HW failure */
+/* Process of being modified, deallocated, or quiesced */
+#define VAS_WIN_MOD_IN_PROCESS  0x3
+
+#define VAS_COPY_PASTE_USER_MODE        0x0001
+#define VAS_COP_OP_USER_MODE            0x0010
+
+/*
+ * Co-processor feature - GZIP QoS windows or GZIP default windows
+ */
+enum vas_cop_feat_type {
+   VAS_GZIP_QOS_FEAT_TYPE,
+   VAS_GZIP_DEF_FEAT_TYPE,
+   VAS_MAX_FEAT_TYPE,
+};
+
+struct vas_ct_capabs_be {
+   __be64  descriptor;
+   u8  win_type;   /* Default or QoS type */
+   u8  user_mode;
+   __be16  max_lpar_creds;
+   __be16  max_win_creds;
+   union {
+   __be16  reserved;
+   __be16  def_lpar_creds; /* Used for default capabilities */
+   };
+   __be16  target_lpar_creds;
+} __packed __aligned(0x1000);
+
+struct vas_ct_capabs {
+   char    name[VAS_DESCR_LEN + 1];
+   u64 descriptor;
+   u8  win_type;   /* Default or QoS type */
+   u8  user_mode;  /* User mode copy/paste or COP HCALL */
+   u16 max_lpar_creds; /* Max credits available in LPAR */
+   /* Max credits can be assigned per win

[V3 PATCH 05/16] powerpc/vas: Define and use common vas_window struct

2021-04-17 Thread Haren Myneni



[V3 PATCH 03/16] powerpc/vas: Create take/drop task reference functions

2021-04-17 Thread Haren Myneni


Take a task reference when each window is opened and drop it on close.
This functionality is needed for both powerNV and pseries, so this patch
moves the existing code into functions in the common book3s platform
file vas-api.c.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/vas.h  | 20 
 arch/powerpc/platforms/book3s/vas-api.c | 51 ++
 arch/powerpc/platforms/powernv/vas-fault.c  | 10 ++--
 arch/powerpc/platforms/powernv/vas-window.c | 57 ++---
 arch/powerpc/platforms/powernv/vas.h|  6 +--
 5 files changed, 83 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index 6bbade60d8f4..2daaa1a2a9a9 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -5,6 +5,9 @@
 
 #ifndef _ASM_POWERPC_VAS_H
 #define _ASM_POWERPC_VAS_H
+#include 
+#include 
+#include 
 #include 
 
 
@@ -60,6 +63,22 @@ struct vas_user_win_ops {
int (*close_win)(void *);
 };
 
+struct vas_win_task {
+   struct pid *pid;        /* Linux process id of owner */
+   struct pid *tgid;       /* Thread group ID of owner */
+   struct mm_struct *mm;   /* Linux process mm_struct */
+};
+
+static inline void vas_drop_reference_task(struct vas_win_task *task)
+{
+   /* Drop references to pid and mm */
+   put_pid(task->pid);
+   if (task->mm) {
+   mm_context_remove_vas_window(task->mm);
+   mmdrop(task->mm);
+   }
+}
+
 /*
  * Receive window attributes specified by the (in-kernel) owner of window.
  */
@@ -190,4 +209,5 @@ int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type,
struct vas_user_win_ops *vops);
 void vas_unregister_coproc_api(void);
 
+int vas_reference_task(struct vas_win_task *vtask);
 #endif /* __ASM_POWERPC_VAS_H */
diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c
index 05d7b99acf41..d98caa734154 100644
--- a/arch/powerpc/platforms/book3s/vas-api.c
+++ b/arch/powerpc/platforms/book3s/vas-api.c
@@ -60,6 +60,57 @@ static char *coproc_devnode(struct device *dev, umode_t *mode)
return kasprintf(GFP_KERNEL, "crypto/%s", dev_name(dev));
 }
 
+/*
+ * Take reference to pid and mm
+ */
+int vas_reference_task(struct vas_win_task *vtask)
+{
+   /*
+* Window opened by a child thread may not be closed when
+* it exits. So take reference to its pid and release it
+* when the window is free by parent thread.
+* Acquire a reference to the task's pid to make sure
+* pid will not be re-used - needed only for multithread
+* applications.
+*/
+   vtask->pid = get_task_pid(current, PIDTYPE_PID);
+   /*
+* Acquire a reference to the task's mm.
+*/
+   vtask->mm = get_task_mm(current);
+   if (!vtask->mm) {
+   put_pid(vtask->pid);
+   pr_err("VAS: pid(%d): mm_struct is not found\n",
+   current->pid);
+   return -EPERM;
+   }
+
+   mmgrab(vtask->mm);
+   mmput(vtask->mm);
+   mm_context_add_vas_window(vtask->mm);
+   /*
+* Process closes window during exit. In the case of
+* multithread application, the child thread can open
+* window and can exit without closing it. Expects parent
+* thread to use and close the window. So do not need
+* to take pid reference for parent thread.
+*/
+   vtask->tgid = find_get_pid(task_tgid_vnr(current));
+   /*
+* Even a process that has no foreign real address mapping can
+* use an unpaired COPY instruction (to no real effect). Issue
+* CP_ABORT to clear any pending COPY and prevent a covert
+* channel.
+*
+* __switch_to() will issue CP_ABORT on future context switches
+* if process / thread has any open VAS window (Use
+* current->mm->context.vas_windows).
+*/
+   asm volatile(PPC_CP_ABORT);
+
+   return 0;
+}
+
 static int coproc_open(struct inode *inode, struct file *fp)
 {
struct coproc_instance *cp_inst;
diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c
index 3d21fce254b7..a4835cb82c09 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -73,7 +73,7 @@ static void update_csb(struct vas_window *window,
 * NX user space windows can not be opened for task->mm=NULL
 * and faults will not be generated for kernel requests.
 */
-   if (WARN_ON_ONCE(!window->mm || !window->user_win))
+   if (WARN_ON_ONCE(!window->task.mm || !window->user_win))
return;
 
csb_addr = (void __user *)be64_to_cpu(crb->csb_addr);
@@ -92,7 +92,7 @@ static void update_csb(struct vas_window *window,
csb.address = crb->stamp.nx.fault_storage_addr;
csb.flags = 0

[V3 PATCH 04/16] powerpc/vas: Move update_csb/dump_crb to common book3s platform

2021-04-17 Thread Haren Myneni


NX issues an interrupt when it sees a fault on a user space buffer. The
kernel processes the fault by updating the CSB. This functionality is
the same for both powerNV and pseries, so this patch moves these
functions to the common vas-api.c. The actual functionality is not
changed.
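
The CSB update moved by this patch writes the body of the status block first
and sets the valid flag last, because user space polls on the flags byte. The
sketch below shows the same publish-then-flag ordering with C11 atomics in
plain user space. It is an analogy rather than the kernel code: the struct
layout, DEMO_CSB_VALID and both function names are hypothetical, and the
kernel uses copy_to_user() with smp_mb() instead of release/acquire atomics.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_CSB_VALID 0x80     /* hypothetical stand-in for CSB_V */

struct demo_csb {
	_Atomic uint8_t flags;  /* polled by the consumer */
	uint8_t cc;             /* completion code */
	uint64_t address;       /* fault address */
};

/* Producer: fill in the payload, then publish it by setting the flag. */
static void demo_csb_publish(struct demo_csb *csb, uint8_t cc, uint64_t addr)
{
	csb->cc = cc;
	csb->address = addr;
	/* Release store: the payload writes above become visible first. */
	atomic_store_explicit(&csb->flags, DEMO_CSB_VALID, memory_order_release);
}

/* Consumer: returns 1 and copies the payload once the flag is set. */
static int demo_csb_poll(struct demo_csb *csb, uint8_t *cc, uint64_t *addr)
{
	uint8_t f = atomic_load_explicit(&csb->flags, memory_order_acquire);

	if (!(f & DEMO_CSB_VALID))
		return 0;
	*cc = csb->cc;
	*addr = csb->address;
	return 1;
}
```

Writing the flag last means a poller that observes the valid bit never reads a
half-written status block.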

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/vas.h |   3 +
 arch/powerpc/platforms/book3s/vas-api.c| 146 ++-
 arch/powerpc/platforms/powernv/vas-fault.c | 155 ++---
 3 files changed, 157 insertions(+), 147 deletions(-)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index 2daaa1a2a9a9..66bf8fb1a1be 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -210,4 +210,7 @@ int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type,
 void vas_unregister_coproc_api(void);
 
 int vas_reference_task(struct vas_win_task *vtask);
+void vas_update_csb(struct coprocessor_request_block *crb,
+   struct vas_win_task *vtask);
+void vas_dump_crb(struct coprocessor_request_block *crb);
 #endif /* __ASM_POWERPC_VAS_H */
diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c
index d98caa734154..dc131b2e4acd 100644
--- a/arch/powerpc/platforms/book3s/vas-api.c
+++ b/arch/powerpc/platforms/book3s/vas-api.c
@@ -111,6 +111,150 @@ int vas_reference_task(struct vas_win_task *vtask)
return 0;
 }
 
+/*
+ * Update the CSB to indicate a translation error.
+ *
+ * User space will be polling on CSB after the request is issued.
+ * If NX can handle the request without any issues, it updates CSB.
+ * Whereas if NX encounters page fault, the kernel will handle the
+ * fault and update CSB with translation error.
+ *
+ * If we are unable to update the CSB means copy_to_user failed due to
+ * invalid csb_addr, send a signal to the process.
+ */
+void vas_update_csb(struct coprocessor_request_block *crb,
+   struct vas_win_task *vtask)
+{
+   struct coprocessor_status_block csb;
+   struct kernel_siginfo info;
+   struct task_struct *tsk;
+   void __user *csb_addr;
+   struct pid *pid;
+   int rc;
+
+   /*
+* NX user space windows can not be opened for task->mm=NULL
+* and faults will not be generated for kernel requests.
+*/
+   if (WARN_ON_ONCE(!vtask->mm))
+   return;
+
+   csb_addr = (void __user *)be64_to_cpu(crb->csb_addr);
+
+   memset(&csb, 0, sizeof(csb));
+   csb.cc = CSB_CC_FAULT_ADDRESS;
+   csb.ce = CSB_CE_TERMINATION;
+   csb.cs = 0;
+   csb.count = 0;
+
+   /*
+* NX operates and returns in BE format as defined CRB struct.
+* So saves fault_storage_addr in BE as NX pastes in FIFO and
+* expects user space to convert to CPU format.
+*/
+   csb.address = crb->stamp.nx.fault_storage_addr;
+   csb.flags = 0;
+
+   pid = vtask->pid;
+   tsk = get_pid_task(pid, PIDTYPE_PID);
+   /*
+* Process closes send window after all pending NX requests are
+* completed. In multi-thread applications, a child thread can
+* open a window and can exit without closing it. May be some
+* requests are pending or this window can be used by other
+* threads later. We should handle faults if NX encounters
+* pages faults on these requests. Update CSB with translation
+* error and fault address. If csb_addr passed by user space is
+* invalid, send SEGV signal to pid saved in window. If the
+* child thread is not running, send the signal to tgid.
+* Parent thread (tgid) will close this window upon its exit.
+*
+* pid and mm references are taken when window is opened by
+* process (pid). So tgid is used only when child thread opens
+* a window and exits without closing it.
+*/
+   if (!tsk) {
+   pid = vtask->tgid;
+   tsk = get_pid_task(pid, PIDTYPE_PID);
+   /*
+* Parent thread (tgid) will be closing window when it
+* exits. So should not get here.
+*/
+   if (WARN_ON_ONCE(!tsk))
+   return;
+   }
+
+   /* Return if the task is exiting. */
+   if (tsk->flags & PF_EXITING) {
+   put_task_struct(tsk);
+   return;
+   }
+
+   kthread_use_mm(vtask->mm);
+   rc = copy_to_user(csb_addr, &csb, sizeof(csb));
+   /*
+* User space polls on csb.flags (first byte). So add barrier
+* then copy first byte with csb flags update.
+*/
+   if (!rc) {
+   csb.flags = CSB_V;
+   /* Make sure update to csb.flags is visible now */
+   smp_mb();
+   rc = copy_to_user(csb_addr, &csb, sizeof(u8));
+   }
+   kthread_unuse_mm(vtask->mm);
+   put_task_struct(tsk);
+
+   /* Success */

[PATCH V3 02/16] powerpc/vas: Move VAS API to common book3s platform

2021-04-17 Thread Haren Myneni


The same /dev/crypto/nx-gzip interface is used for both powerNV and
pseries, so this patch creates platforms/book3s/ and moves the VAS API
to that directory. The actual functionality is not changed.

Common interface functions such as open, window open ioctl, mmap
and close are moved to arch/powerpc/platforms/book3s/vas-api.c.
Hooks are added to call platform specific code, but the underlying
powerNV code in these functions is not changed.
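
The hook mechanism is a table of function pointers that each platform hands to
the common layer at registration time. A simplified, self-contained sketch of
that dispatch is shown below. It only loosely mirrors struct vas_user_win_ops
from the diff: every demo_* name, the simplified signatures and the fake
backend are hypothetical and exist only so that the example compiles on its
own.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the per-platform operations table. */
struct demo_win_ops {
	void *(*open_win)(int cop_type);
	uint64_t (*paste_addr)(void *win);
	int (*close_win)(void *win);
};

static const struct demo_win_ops *demo_registered_ops;

/* Platform code registers its hooks once; all three are required here. */
static int demo_register_ops(const struct demo_win_ops *ops)
{
	if (!ops || !ops->open_win || !ops->paste_addr || !ops->close_win)
		return -1;
	demo_registered_ops = ops;
	return 0;
}

/* Common layer: open a window via the hooks, return its paste address. */
static uint64_t demo_common_open(int cop_type, void **win_out)
{
	void *win;

	if (!demo_registered_ops)
		return 0;
	win = demo_registered_ops->open_win(cop_type);
	if (!win)
		return 0;
	*win_out = win;
	return demo_registered_ops->paste_addr(win);
}

/* Fake platform backend used to exercise the dispatch. */
struct demo_fake_win { uint64_t paste; int is_open; };
static struct demo_fake_win demo_fake = { 0x1000, 0 };

static void *demo_fake_open(int cop_type)
{
	(void)cop_type;
	demo_fake.is_open = 1;
	return &demo_fake;
}

static uint64_t demo_fake_paste(void *win)
{
	return ((struct demo_fake_win *)win)->paste;
}

static int demo_fake_close(void *win)
{
	((struct demo_fake_win *)win)->is_open = 0;
	return 0;
}

static const struct demo_win_ops demo_fake_ops = {
	demo_fake_open, demo_fake_paste, demo_fake_close,
};
```

Keeping the file operations in one place and swapping only the ops table is
what lets powerNV and pseries share the same character device code.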

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/vas.h| 22 ++-
 arch/powerpc/platforms/Kconfig|  1 +
 arch/powerpc/platforms/Makefile   |  1 +
 arch/powerpc/platforms/book3s/Kconfig | 15 +
 arch/powerpc/platforms/book3s/Makefile|  2 +
 .../platforms/{powernv => book3s}/vas-api.c   | 64 ++
 arch/powerpc/platforms/powernv/Kconfig| 14 
 arch/powerpc/platforms/powernv/Makefile   |  2 +-
 arch/powerpc/platforms/powernv/vas-window.c   | 66 +++
 9 files changed, 143 insertions(+), 44 deletions(-)
 create mode 100644 arch/powerpc/platforms/book3s/Kconfig
 create mode 100644 arch/powerpc/platforms/book3s/Makefile
 rename arch/powerpc/platforms/{powernv => book3s}/vas-api.c (83%)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index 41f73fae7ab8..6bbade60d8f4 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -5,6 +5,8 @@
 
 #ifndef _ASM_POWERPC_VAS_H
 #define _ASM_POWERPC_VAS_H
+#include 
+
 
 struct vas_window;
 
@@ -48,6 +50,16 @@ enum vas_cop_type {
VAS_COP_TYPE_MAX,
 };
 
+/*
+ * User space window operations used for powernv and powerVM
+ */
+struct vas_user_win_ops {
+   struct vas_window * (*open_win)(struct vas_tx_win_open_attr *,
+   enum vas_cop_type);
+   u64 (*paste_addr)(void *);
+   int (*close_win)(void *);
+};
+
 /*
  * Receive window attributes specified by the (in-kernel) owner of window.
  */
@@ -161,6 +173,9 @@ int vas_copy_crb(void *crb, int offset);
  * assumed to be true for NX windows.
  */
 int vas_paste_crb(struct vas_window *win, int offset, bool re);
+int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type,
+const char *name);
+void vas_unregister_api_powernv(void);
 
 /*
  * Register / unregister coprocessor type to VAS API which will be exported
@@ -170,8 +185,9 @@ int vas_paste_crb(struct vas_window *win, int offset, bool re);
  * Only NX GZIP coprocessor type is supported now, but this API can be
  * used for others in future.
  */
-int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type,
-const char *name);
-void vas_unregister_api_powernv(void);
+int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type,
+   const char *name,
+   struct vas_user_win_ops *vops);
+void vas_unregister_coproc_api(void);
 
 #endif /* __ASM_POWERPC_VAS_H */
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index 7a5e8f4541e3..594544a65b02 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -20,6 +20,7 @@ source "arch/powerpc/platforms/embedded6xx/Kconfig"
 source "arch/powerpc/platforms/44x/Kconfig"
 source "arch/powerpc/platforms/40x/Kconfig"
 source "arch/powerpc/platforms/amigaone/Kconfig"
+source "arch/powerpc/platforms/book3s/Kconfig"
 
 config KVM_GUEST
bool "KVM Guest support"
diff --git a/arch/powerpc/platforms/Makefile b/arch/powerpc/platforms/Makefile
index 143d4417f6cc..0e75d7df387b 100644
--- a/arch/powerpc/platforms/Makefile
+++ b/arch/powerpc/platforms/Makefile
@@ -22,3 +22,4 @@ obj-$(CONFIG_PPC_CELL)+= cell/
 obj-$(CONFIG_PPC_PS3)  += ps3/
 obj-$(CONFIG_EMBEDDED6xx)  += embedded6xx/
 obj-$(CONFIG_AMIGAONE) += amigaone/
+obj-$(CONFIG_PPC_BOOK3S)   += book3s/
diff --git a/arch/powerpc/platforms/book3s/Kconfig b/arch/powerpc/platforms/book3s/Kconfig
new file mode 100644
index ..51e14db83a79
--- /dev/null
+++ b/arch/powerpc/platforms/book3s/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0
+config PPC_VAS
+   bool "IBM Virtual Accelerator Switchboard (VAS)"
+   depends on PPC_POWERNV && PPC_64K_PAGES
+   default y
+   help
+ This enables support for IBM Virtual Accelerator Switchboard (VAS).
+
+ VAS allows accelerators in co-processors like NX-GZIP and NX-842
+ to be accessible to kernel subsystems and user processes.
+ VAS adapters are found in POWER9 and later based systems.
+ The user mode NX-GZIP support is added on P9 for powerNV and on
+ P10 for powerVM.
+
+ If unsure, say "N".
diff --git a/arch/powerpc/platforms/book3s/Makefile b/arch/powerpc/platforms/book3s/Makefile
new file mode 100644
index ..e790f1910f61
--- /dev/null
+++ b/arch/powerpc/platforms/book3s

[V3 PATCH 01/16] powerpc/powernv/vas: Rename register/unregister functions

2021-04-17 Thread Haren Myneni


powerNV and pseries drivers register / unregister to the corresponding
VAS code separately. So rename powerNV VAS API register/unregister
functions.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/vas.h   |  6 +++---
 arch/powerpc/platforms/powernv/vas-api.c | 10 +-
 drivers/crypto/nx/nx-common-powernv.c|  6 +++---
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index e33f80b0ea81..41f73fae7ab8 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -170,8 +170,8 @@ int vas_paste_crb(struct vas_window *win, int offset, bool re);
  * Only NX GZIP coprocessor type is supported now, but this API can be
  * used for others in future.
  */
-int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type,
-   const char *name);
-void vas_unregister_coproc_api(void);
+int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type,
+const char *name);
+void vas_unregister_api_powernv(void);
 
 #endif /* __ASM_POWERPC_VAS_H */
diff --git a/arch/powerpc/platforms/powernv/vas-api.c b/arch/powerpc/platforms/powernv/vas-api.c
index 98ed5d8c5441..72d8ce39e56c 100644
--- a/arch/powerpc/platforms/powernv/vas-api.c
+++ b/arch/powerpc/platforms/powernv/vas-api.c
@@ -207,8 +207,8 @@ static struct file_operations coproc_fops = {
  * Supporting only nx-gzip coprocessor type now, but this API code
  * extended to other coprocessor types later.
  */
-int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type,
-   const char *name)
+int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type,
+const char *name)
 {
int rc = -EINVAL;
dev_t devno;
@@ -262,9 +262,9 @@ int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type,
unregister_chrdev_region(coproc_device.devt, 1);
return rc;
 }
-EXPORT_SYMBOL_GPL(vas_register_coproc_api);
+EXPORT_SYMBOL_GPL(vas_register_api_powernv);
 
-void vas_unregister_coproc_api(void)
+void vas_unregister_api_powernv(void)
 {
dev_t devno;
 
@@ -275,4 +275,4 @@ void vas_unregister_coproc_api(void)
class_destroy(coproc_device.class);
unregister_chrdev_region(coproc_device.devt, 1);
 }
-EXPORT_SYMBOL_GPL(vas_unregister_coproc_api);
+EXPORT_SYMBOL_GPL(vas_unregister_api_powernv);
diff --git a/drivers/crypto/nx/nx-common-powernv.c b/drivers/crypto/nx/nx-common-powernv.c
index 13c65deda8e9..88d728415bb2 100644
--- a/drivers/crypto/nx/nx-common-powernv.c
+++ b/drivers/crypto/nx/nx-common-powernv.c
@@ -1090,8 +1090,8 @@ static __init int nx_compress_powernv_init(void)
 * normal FIFO priority is assigned for userspace.
 * 842 compression is supported only in kernel.
 */
-   ret = vas_register_coproc_api(THIS_MODULE, VAS_COP_TYPE_GZIP,
-   "nx-gzip");
+   ret = vas_register_api_powernv(THIS_MODULE, VAS_COP_TYPE_GZIP,
+  "nx-gzip");
 
/*
 * GZIP is not supported in kernel right now.
@@ -1127,7 +1127,7 @@ static void __exit nx_compress_powernv_exit(void)
 * use. So delete this API use for GZIP engine.
 */
if (!nx842_ct)
-   vas_unregister_coproc_api();
+   vas_unregister_api_powernv();
 
crypto_unregister_alg(&nx842_powernv_alg);
 
-- 
2.18.2




[V3 PATCH 00/16] Enable VAS and NX-GZIP support on powerVM

2021-04-17 Thread Haren Myneni


This patch series enables VAS / NX-GZIP on powerVM which allows
the user space to do copy/paste with the same existing interface
that is available on powerNV.

VAS Enablement:
- Get all VAS capabilities that are available in the hypervisor using
  H_QUERY_VAS_CAPABILITIES. These capabilities tell the OS which
  types of features are available (credit types such as Default and
  Quality of Service (QoS)). They also give specific capabilities for
  each credit type: maximum window credits, maximum LPAR credits,
  target credits in that partition (varies from max LPAR credits
  based on DLPAR operations), whether user mode COPY/PASTE is
  supported, and so on.
- Register LPAR VAS operations such as open window, get paste
  address and close window with the current VAS user space API.
- Open window operation - Use H_ALLOCATE_VAS_WINDOW HCALL to open
  window and H_MODIFY_VAS_WINDOW HCALL to set up the window with the
  LPAR PID and so on.
- mmap to paste address returned in H_ALLOCATE_VAS_WINDOW HCALL
- To close window, H_DEALLOCATE_VAS_WINDOW HCALL is used to close in
  the hypervisor.

NX Enablement:
- Get NX capabilities from the hypervisor which provides Maximum
  buffer length in a single GZIP request, recommended minimum
  compression / decompression lengths.
- Register to VAS to enable user space VAS API

Main feature differences with powerNV implementation:
- Each VAS window will be configured with a number of credits which
  means that many requests can be issued simultaneously on that
  window. On powerNV, 1K credits are configured per window.
  Whereas on powerVM, the hypervisor allows 1 credit per window
  at present.
- The hypervisor introduced 2 different types of credits: Default -
  Uses normal priority FIFO and Quality of Service (QoS) - Uses high
  priority FIFO. On powerVM, VAS/NX HW resources are shared across
  LPARs. The total number of credits available on a system depends
  on cores configured. We may see more credits are assigned across
  the system than the NX HW resources can handle. So to avoid NX HW
  contention, pHyp introduced QoS credits which can be configured
  by system administration with HMC API. Then the total number of
  available default credits on LPAR varies based on QoS credits
  configured.
- On powerNV, windows are allocated on a specific VAS instance
  and the user space can select VAS instance with the open window
  ioctl. Since VAS instances can be shared across partitions on
  powerVM, the hypervisor manages window allocations on different
  VAS instances. So H_ALLOCATE_VAS_WINDOW allows selection by domain
  identifiers (H_HOME_NODE_ASSOCIATIVITY values by cpu). By default
  the hypervisor selects the VAS instance closest to the CPU resources
  that the partition uses. So vas_id in the ioctl interface is ignored
  on powerVM except vas_id=-1, which is used to allocate a window based
  on the CPU that the process is executing on. This option is needed
  for process affinity to a NUMA node.

  The existing applications that linked with libnxz should work as
  long as the job request length is restricted to
  req_max_processed_len.

  Tested the following patches on P10 successfully with test cases
  given: https://github.com/libnxz/power-gzip

  Note: The hypervisor supports user mode NX from p10 onwards. Linux
supports user mode VAS/NX on P10 only with radix page tables.

Patches 1 - 4:  Move the code that is needed for both powerNV and
                powerVM to powerpc book3s platform directory
Patch 5:        Modify vas-window struct to support both and the
                related changes.
Patch 6:        Define HCALL and the related VAS/NXGZIP specific
                structs.
Patch 7:        Define QoS credit flag in window open ioctl
Patch 8:        Implement Allocate, Modify and Deallocate HCALLs
Patch 9:        Retrieve VAS capabilities from the hypervisor
Patch 10:       Implement window operations and integrate with API
Patch 11:       Setup IRQ and NX fault handling
Patch 12:       Add sysfs interface to expose VAS capabilities
Patch 13 - 14:  Make the code common to add NX-GZIP enablement
Patch 15:       Get NX capabilities from the hypervisor
Patch 16:       Add sysfs interface to expose NX capabilities

Changes in V2:
  - Rebase on 5.12-rc6
  - Moved VAS Kconfig changes to arch/powerpc/platform as suggested
by Christophe Leroy
  - build fix with allyesconfig (reported by kernel test build)

Changes in V3:
  - Rebase on 5.12-rc7
  - Moved vas-api.c and VAS Kconfig changes to
arch/powerpc/platform/book3s as Michael Ellerman suggested

Haren Myneni (16):
  powerpc/powernv/vas: Rename register/unregister functions
  powerpc/vas: Make VAS API powerpc platform independent
  powerpc/vas: Create take/drop task reference functions
  powerpc/vas: Move update_csb/dump_crb to common book3s platform
  powerpc/vas:  Define and use common vas_window struct
  powerpc/pseries/vas: Define VAS/NXGZIP HCALLs and structs
  powerpc/vas: Define QoS credit flag to allocate window
  powerpc/pseries/VAS: Implement allo

Re: Bogus struct page layout on 32-bit

2021-04-17 Thread Grygorii Strashko

Hi Ilias, All,

On 10/04/2021 11:52, Ilias Apalodimas wrote:

+CC Grygorii for the cpsw part as Ivan's email is not valid anymore

Thanks for catching this. Interesting indeed...

On Sat, 10 Apr 2021 at 09:22, Jesper Dangaard Brouer  wrote:


On Sat, 10 Apr 2021 03:43:13 +0100
Matthew Wilcox  wrote:


On Sat, Apr 10, 2021 at 06:45:35AM +0800, kernel test robot wrote:

include/linux/mm_types.h:274:1: error: static_assert failed due to requirement 
'__builtin_offsetof(struct page, lru) == __builtin_offsetof(struct folio, lru)' 
"offsetof(struct page, lru) == offsetof(struct folio, lru)"

FOLIO_MATCH(lru, lru);
include/linux/mm_types.h:272:2: note: expanded from macro 'FOLIO_MATCH'
static_assert(offsetof(struct page, pg) == offsetof(struct folio, 
fl))


Well, this is interesting.  pahole reports:

struct page {
 long unsigned int  flags;/* 0 4 */
 /* XXX 4 bytes hole, try to pack */
 union {
 struct {
 struct list_head lru;/* 8 8 */
...
struct folio {
 union {
 struct {
 long unsigned int flags; /* 0 4 */
 struct list_head lru;/* 4 8 */

so this assert has absolutely done its job.

But why has this assert triggered?  Why is struct page layout not what
we thought it was?  Turns out it's the dma_addr added in 2019 by commit
c25fff7171be ("mm: add dma_addr_t to struct page").  On this particular
config, it's 64-bit, and ppc32 requires alignment to 64-bit.  So
the whole union gets moved out by 4 bytes.
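
The effect can be reproduced outside the kernel with a minimal sketch.
The types below are hypothetical stand-ins, not the real struct page or
struct folio: fixed-width fields emulate a 32-bit long, and _Alignas(8)
emulates an ABI that aligns 64-bit integers to 8 bytes.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical 32-bit list_head: two 4-byte pointers. */
struct list_head32 {
	uint32_t next, prev;
};

/* 'flags' followed by a union that contains an 8-byte-aligned member. */
struct page_like {
	uint32_t flags;
	union {
		struct list_head32 lru;
		struct {
			_Alignas(8) uint64_t dma_addr;
		} pp;
	} u;
};

/* What the folio side expects: lru directly after flags. */
struct folio_like {
	uint32_t flags;
	struct list_head32 lru;
};

/* Same as page_like, but the address is held in two 32-bit words. */
struct page_fixed {
	uint32_t flags;
	union {
		struct list_head32 lru;
		struct {
			uint32_t dma_addr[2];
		} pp;
	} u;
};

/* The aligned member drags the whole union out to offset 8. */
static_assert(offsetof(struct page_like, u.lru) == 8, "union pushed out");
static_assert(offsetof(struct folio_like, lru) == 4, "folio expectation");
/* Without an 8-byte-aligned member the union sits at offset 4 again. */
static_assert(offsetof(struct page_fixed, u.lru) == 4, "layouts match");
```

Only the alignment of the widest union member matters here; the member
does not have to be the one that is actually in use.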


Argh, good that you are catching this!


Unfortunately, we can't just fix this by putting an 'unsigned long pad'
in front of it.  It still aligns the entire union to 8 bytes, and then
it skips another 4 bytes after the pad.

We can fix it like this ...

+++ b/include/linux/mm_types.h
@@ -96,11 +96,12 @@ struct page {
 unsigned long private;
 };
 struct {/* page_pool used by netstack */
+   unsigned long _page_pool_pad;


I'm fine with this pad.  Matteo is currently proposing[1] to add a 32-bit
value after @dma_addr, and he could use this area instead.

[1] 
https://lore.kernel.org/netdev/20210409223801.104657-3-mcr...@linux.microsoft.com/

When adding/changing this, we need to make sure that it doesn't overlap
member @index, because the network stack uses/checks page_is_pfmemalloc().
As far as my calculations go, this is safe to add.  I always try to keep an
eye out for this, but I wonder if we could have a build check like yours.



 /**
  * @dma_addr: might require a 64-bit value even on
  * 32-bit architectures.
  */
-   dma_addr_t dma_addr;
+   dma_addr_t dma_addr __packed;
 };
 struct {/* slab, slob and slub */
 union {

but I don't know if GCC is smart enough to realise that dma_addr is now
on an 8 byte boundary and it can use a normal instruction to access it,
or whether it'll do something daft like use byte loads to access it.

We could also do:

+   dma_addr_t dma_addr __packed __aligned(sizeof(void *));

and I see that pahole, at least, sees this correctly:

 struct {
 long unsigned int _page_pool_pad; /* 4 4 */
 dma_addr_t dma_addr __attribute__((__aligned__(4))); 
/* 8 8 */
 } __attribute__((__packed__)) __attribute__((__aligned__(4)));

This presumably affects any 32-bit architecture with a 64-bit phys_addr_t
/ dma_addr_t.  Advice, please?


I'm not sure what the 32-bit behavior is with 64-bit (dma) addrs.

I don't have any 32-bit boards with 64-bit DMA.  Cc. Ivan, wasn't your
board (572x ?) 32-bit with driver 'cpsw' this case (where Ivan added
XDP+page_pool) ?


Sorry for the delayed reply.

The TI platforms am3/4/5 (cpsw) and Keystone 2 (netcp) can do only 32bit DMA
even in case of LPAE (dma-ranges are used).
Originally, as I remember, CONFIG_ARCH_DMA_ADDR_T_64BIT was not selected
for the LPAE case on TI platforms, and the fact that it became set is the
result of multi-platform/allXXXconfig/DMA optimizations and unification.
(just checked - not set in 4.14)

Probable commit 4965a68780c5 ("arch: define the ARCH_DMA_ADDR_T_64BIT config symbol
in lib/Kconfig").

The TI drivers have finally been updated to accept ARCH_DMA_ADDR_T_64BIT=y by
using things like (__force u32), for example.

Honestly, I did a sanity check of CPSW with LPAE=y (ARCH_DMA_ADDR_T_64BIT=y)
a very long time ago.

--
Best regards,
grygorii


Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-17 Thread Matthew Wilcox
On Sat, Apr 17, 2021 at 09:32:06PM +0300, Ilias Apalodimas wrote:
> > +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t 
> > addr)
> > +{
> > +   page->dma_addr[0] = addr;
> > +   if (sizeof(dma_addr_t) > sizeof(unsigned long))
> > +   page->dma_addr[1] = addr >> 16 >> 16;
> 
> The 'error' that was reported will never trigger right?
> I assume this was compiled with dma_addr_t as 32bits (so it triggered the
> compilation error), but the if check will never allow this codepath to run.
> If so can we add a comment explaining this, since none of us will remember why
> in 6 months from now?

That's right.  I compiled it all three ways: 32-bit; 64-bit dma with 32-bit
long; and 64-bit.  The plain 32-bit and 64-bit cases turn into:

if (0)
page->dma_addr[1] = addr >> 16 >> 16;

which gets elided.  So the only case that has to work is 64-bit dma and
32-bit long.

I can replace this with upper_32_bits().
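
For anyone following along, here is a userspace sketch of the split and
recombine, assuming the one case that has to work (64-bit dma, 32-bit
long).  The *_sim names are hypothetical stand-ins; upper_32_bits_sim is
a local macro modelled on the kernel's upper_32_bits() helper, not the
helper itself.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

typedef uint64_t dma_addr_sim_t;  /* assumed: 64-bit DMA address */
typedef uint32_t ulong_sim_t;     /* assumed: 32-bit unsigned long */

/* This sketch models the configuration where the 'if' below is true. */
static_assert(sizeof(dma_addr_sim_t) > sizeof(ulong_sim_t),
	      "64-bit dma, 32-bit long");

/*
 * Two 16-bit shifts instead of one 32-bit shift: the expression stays
 * well defined even if the operand is only 32 bits wide.
 */
#define upper_32_bits_sim(n) ((uint32_t)(((n) >> 16) >> 16))

struct pp_page_sim {
	ulong_sim_t dma_addr[2];
};

static inline void pp_set_dma_addr(struct pp_page_sim *page,
				   dma_addr_sim_t addr)
{
	page->dma_addr[0] = (ulong_sim_t)addr;  /* low word */
	/* Constant condition: the compiler drops the branch when false. */
	if (sizeof(dma_addr_sim_t) > sizeof(ulong_sim_t))
		page->dma_addr[1] = upper_32_bits_sim(addr);
}

static inline dma_addr_sim_t pp_get_dma_addr(const struct pp_page_sim *page)
{
	dma_addr_sim_t ret = page->dma_addr[0];

	if (sizeof(dma_addr_sim_t) > sizeof(ulong_sim_t))
		ret |= (dma_addr_sim_t)page->dma_addr[1] << 16 << 16;
	return ret;
}
```

When both types have the same width the condition is a compile-time
constant 0, so the second word is never touched.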



PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr

2021-04-17 Thread Randy Dunlap
Hi,

kernel test robot reports:

>> drivers/cpufreq/pmac32-cpufreq.c:262:2: error: implicit declaration of 
>> function 'enable_kernel_fp' [-Werror,-Wimplicit-function-declaration]
   enable_kernel_fp();
   ^

when
# CONFIG_PPC_FPU is not set
CONFIG_ALTIVEC=y

I see at least one other place that does not handle that
combination well, here:

../arch/powerpc/lib/sstep.c: In function 'do_vec_load':
../arch/powerpc/lib/sstep.c:637:3: error: implicit declaration of function 
'put_vr' [-Werror=implicit-function-declaration]
  637 |   put_vr(rn, &u.v);
  |   ^~
../arch/powerpc/lib/sstep.c: In function 'do_vec_store':
../arch/powerpc/lib/sstep.c:660:3: error: implicit declaration of function 
'get_vr'; did you mean 'get_oc'? [-Werror=implicit-function-declaration]
  660 |   get_vr(rn, &u.v);
  |   ^~


Should the code + Kconfigs/Makefiles handle that kind of
kernel config or should ALTIVEC always mean PPC_FPU as well?

I have patches to fix the build errors with the config as
reported but I don't know if that's the right thing to do...

thanks.
-- 
~Randy



Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-17 Thread Ilias Apalodimas
Hi Matthew,

On Sat, Apr 17, 2021 at 03:45:22AM +0100, Matthew Wilcox wrote:
> 
> Replacement patch to fix compiler warning.
> 
> From: "Matthew Wilcox (Oracle)" 
> Date: Fri, 16 Apr 2021 16:34:55 -0400
> Subject: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
> To: bro...@redhat.com
> Cc: linux-ker...@vger.kernel.org,
> linux...@kvack.org,
> net...@vger.kernel.org,
> linuxppc-dev@lists.ozlabs.org,
> linux-arm-ker...@lists.infradead.org,
> linux-m...@vger.kernel.org,
> ilias.apalodi...@linaro.org,
> mcr...@linux.microsoft.com,
> grygorii.stras...@ti.com,
> a...@kernel.org,
> h...@lst.de,
> linux-snps-...@lists.infradead.org,
> mho...@kernel.org,
> mgor...@suse.de
> 
> 32-bit architectures which expect 8-byte alignment for 8-byte integers
> and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct
> page inadvertently expanded in 2019.  When the dma_addr_t was added,
> it forced the alignment of the union to 8 bytes, which inserted a 4 byte
> gap between 'flags' and the union.
> 
> Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
> This restores the alignment to that of an unsigned long, and also fixes a
> potential problem where (on a big endian platform), the bit used to denote
> PageTail could inadvertently get set, and a racing get_user_pages_fast()
> could dereference a bogus compound_head().
> 
> Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---
>  include/linux/mm_types.h |  4 ++--
>  include/net/page_pool.h  | 12 +++-
>  net/core/page_pool.c | 12 +++-
>  3 files changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6613b26a8894..5aacc1c10a45 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -97,10 +97,10 @@ struct page {
>   };
>   struct {/* page_pool used by netstack */
>   /**
> -  * @dma_addr: might require a 64-bit value even on
> +  * @dma_addr: might require a 64-bit value on
>* 32-bit architectures.
>*/
> - dma_addr_t dma_addr;
> + unsigned long dma_addr[2];
>   };
>   struct {/* slab, slob and slub */
>   union {
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index b5b195305346..ad6154dc206c 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct 
> page_pool *pool,
>  
>  static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>  {
> - return page->dma_addr;
> + dma_addr_t ret = page->dma_addr[0];
> + if (sizeof(dma_addr_t) > sizeof(unsigned long))
> + ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
> + return ret;
> +}
> +
> +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> +{
> + page->dma_addr[0] = addr;
> + if (sizeof(dma_addr_t) > sizeof(unsigned long))
> + page->dma_addr[1] = addr >> 16 >> 16;


The 'error' that was reported will never trigger right?
I assume this was compiled with dma_addr_t as 32bits (so it triggered the
compilation error), but the if check will never allow this codepath to run.
If so can we add a comment explaining this, since none of us will remember why
in 6 months from now?

>  }
>  
>  static inline bool is_page_pool_compiled_in(void)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index ad8b0707af04..f014fd8c19a6 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct 
> page_pool *pool,
> struct page *page,
> unsigned int dma_sync_size)
>  {
> + dma_addr_t dma_addr = page_pool_get_dma_addr(page);
> +
>   dma_sync_size = min(dma_sync_size, pool->p.max_len);
> - dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
> + dma_sync_single_range_for_device(pool->p.dev, dma_addr,
>pool->p.offset, dma_sync_size,
>pool->p.dma_dir);
>  }
> @@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct 
> page_pool *pool,
>   put_page(page);
>   return NULL;
>   }
> - page->dma_addr = dma;
> + page_pool_set_dma_addr(page, dma);
>  
>   if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>   page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
> @@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, 
> struct page *page)
>*/
>   goto skip_dma_unmap;
>  
> - dma = page->dma_addr;
> + dma = page_po

Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-17 Thread Arnd Bergmann
On Sat, Apr 17, 2021 at 3:58 PM Matthew Wilcox  wrote:
> I wouldn't like to make that assumption.  I've come across IOMMUs (maybe
> on parisc?  powerpc?) that like to encode fun information in the top
> few bits.  So we could get it down to 52 bits, but I don't think we can
> get all the way down to 32 bits.  Also, we need to keep the bottom bit
> clear for PageTail, so that further constrains us.

I'd be surprised to find such an IOMMU on a 32-bit machine, given that
the main reason for using an IOMMU on these is to avoid the 32-bit
address limit in DMA masters.

I see that parisc32 does not enable 64-bit dma_addr_t, while powerpc32
does not support any IOMMU, so it wouldn't be either of those two.

I do remember some powerpc systems that encode additional flags
(transaction ordering, caching, ...) into the high bits of the physical
address in the IOTLB, but not the virtual address used for looking
them up.

> Anyway, I like the "two unsigned longs" approach I posted yesterday,
> but thanks for the suggestion.

Ok, fair enough. As long as there are enough bits in this branch of
'struct page', I suppose it is the safe choice.

Arnd


Re: swiotlb cleanups v3

2021-04-17 Thread Tom Lendacky
> Hi Konrad,
>
> this series contains a bunch of swiotlb cleanups, mostly to reduce the
> amount of internals exposed to code outside of swiotlb.c, which should
> help to prepare for supporting multiple different bounce buffer pools.

Somewhere between the 1st and 2nd patch, specifying a specific swiotlb
size for an SEV guest is no longer honored. For example, if I start an SEV
guest with 16GB of memory and specify swiotlb=131072 I used to get a
256MB SWIOTLB. However, after the 2nd patch, the swiotlb=131072 is no
longer honored and I get a 982MB SWIOTLB (as set via sev_setup_arch() in
arch/x86/mm/mem_encrypt.c).
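
For reference, the 256MB figure follows from the swiotlb= value being a
slab count.  A small sketch of the arithmetic, assuming a 2 KiB slab
(IO_TLB_SHIFT of 11); the *_SIM macro and the helper names are
hypothetical, not kernel symbols:

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

/* Assumption: one swiotlb slab is 2 KiB, i.e. a shift of 11. */
#define IO_TLB_SHIFT_SIM 11
static_assert((1u << IO_TLB_SHIFT_SIM) == 2048, "one slab is 2 KiB");

/* Convert a swiotlb= slab count into a size in bytes. */
static inline uint64_t swiotlb_slabs_to_bytes(uint64_t nslabs)
{
	return nslabs << IO_TLB_SHIFT_SIM;
}

/* Convert bytes to whole MiB, rounding down. */
static inline uint64_t bytes_to_mib(uint64_t bytes)
{
	return bytes >> 20;
}
```

So swiotlb=131072 works out to 131072 * 2048 bytes, which is 256 MiB.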

I can't be sure which patch caused the issue since an SEV guest fails to
boot with the 1st patch but can boot with the 2nd patch, at which point
the SWIOTLB comes in at 982MB (I haven't had a chance to debug it and so
I'm hoping you might be able to quickly spot what's going on).

Thanks,
Tom

>
> Changes since v2:
>  - fix a bisection hazard that did not allocate the alloc_size array
>  - dropped all patches already merged
>
> Changes since v1:
>  - rebased to v5.12-rc1
>  - a few more cleanups
>  - merge and forward port the patch from Claire to move all the global
>variables into a struct to prepare for multiple instances



Re: [PATCH v2] tools: do not include scripts/Kbuild.include

2021-04-17 Thread Masahiro Yamada
On Fri, Apr 16, 2021 at 10:01 PM Masahiro Yamada  wrote:
>
> Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to
> scripts/Makefile.compiler"), some kselftests fail to build.
>
> The tools/ directory opted out Kbuild, and went in a different
> direction. They copy any kind of files to the tools/ directory
> in order to do whatever they want in their world.
>
> tools/build/Build.include mimics scripts/Kbuild.include, but some
> tool Makefiles included the Kbuild one to import a feature that is
> missing in tools/build/Build.include:
>
>  - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector"
>only if supported") included scripts/Kbuild.include from
>tools/thermal/tmon/Makefile to import the cc-option macro.
>
>  - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do
>not support -no-pie") included scripts/Kbuild.include from
>tools/testing/selftests/kvm/Makefile to import the try-run macro.
>
>  - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang
>failures") included scripts/Kbuild.include from
>tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR
>target.
>
>  - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for
>unrecognized option") included scripts/Kbuild.include from
>tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the
>try-run macro.
>
> Copy what they need into tools/build/Build.include, and make them
> include it instead of scripts/Kbuild.include.
>
> Link: 
> https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/
> Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to 
> scripts/Makefile.compiler")
> Reported-by: Janosch Frank 
> Reported-by: Christian Borntraeger 
> Signed-off-by: Masahiro Yamada 


Applied to linux-kbuild.




> ---
>
> Changes in v2:
>   - copy macros to tools/build/Build.include
>
>  tools/build/Build.include | 24 +++
>  tools/testing/selftests/bpf/Makefile  |  2 +-
>  tools/testing/selftests/kvm/Makefile  |  2 +-
>  .../selftests/powerpc/pmu/ebb/Makefile|  2 +-
>  tools/thermal/tmon/Makefile   |  2 +-
>  5 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/tools/build/Build.include b/tools/build/Build.include
> index 585486e40995..2cf3b1bde86e 100644
> --- a/tools/build/Build.include
> +++ b/tools/build/Build.include
> @@ -100,3 +100,27 @@ cxx_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CXXFLAGS) 
> -D"BUILD_STR(s)=\#s" $(CXX
>  ## HOSTCC C flags
>
>  host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(KBUILD_HOSTCFLAGS) 
> -D"BUILD_STR(s)=\#s" $(HOSTCFLAGS_$(basetarget).o) $(HOSTCFLAGS_$(obj))
> +
> +# output directory for tests below
> +TMPOUT = .tmp_
> +
> +# try-run
> +# Usage: option = $(call try-run, $(CC)...-o "$$TMP",option-ok,otherwise)
> +# Exit code chooses option. "$$TMP" serves as a temporary file and is
> +# automatically cleaned up.
> +try-run = $(shell set -e;  \
> +   TMP=$(TMPOUT)/tmp;  \
> +   mkdir -p $(TMPOUT); \
> +   trap "rm -rf $(TMPOUT)" EXIT;   \
> +   if ($(1)) >/dev/null 2>&1;  \
> +   then echo "$(2)";   \
> +   else echo "$(3)";   \
> +   fi)
> +
> +# cc-option
> +# Usage: cflags-y += $(call cc-option,-march=winchip-c6,-march=i586)
> +cc-option = $(call try-run, \
> +   $(CC) -Werror $(1) -c -x c /dev/null -o "$$TMP",$(1),$(2))
> +
> +# delete partially updated (i.e. corrupted) files on error
> +.DELETE_ON_ERROR:
> diff --git a/tools/testing/selftests/bpf/Makefile 
> b/tools/testing/selftests/bpf/Makefile
> index 044bfdcf5b74..17a5cdf48d37 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -include ../../../../scripts/Kbuild.include
> +include ../../../build/Build.include
>  include ../../../scripts/Makefile.arch
>  include ../../../scripts/Makefile.include
>
> diff --git a/tools/testing/selftests/kvm/Makefile 
> b/tools/testing/selftests/kvm/Makefile
> index a6d61f451f88..5ef141f265bd 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -include ../../../../scripts/Kbuild.include
> +include ../../../build/Build.include
>
>  all:
>
> diff --git a/tools/testing/selftests/powerpc/pmu/ebb/Makefile 
> b/tools/testing/selftests/powerpc/pmu/ebb/Makefile
> index af3df79d8163..c5ecb4634094 100644
> --- a/tools/testing/selftests/powerpc/pmu/ebb/Makefile
> +++ b/tools/testing/selftests/powerpc/pmu/ebb/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -include ../../../../../../scripts/Kbuild.include
> +include ../../../../../build/Build.include
>
>  noarg:
> $(MAKE) -C ../../
> diff --git a/tools/thermal/tmon/Makefile b/tools/thermal/tmon/Makefile
> index 59e417ec3e13..9db867df7679 100644
> ---

Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-17 Thread Matthew Wilcox
On Sat, Apr 17, 2021 at 12:31:37PM +0200, Arnd Bergmann wrote:
> On Fri, Apr 16, 2021 at 5:27 PM Matthew Wilcox  wrote:
> > diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> > index b5b195305346..db7c7020746a 100644
> > --- a/include/net/page_pool.h
> > +++ b/include/net/page_pool.h
> > @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct 
> > page_pool *pool,
> >
> >  static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
> >  {
> > -   return page->dma_addr;
> > +   dma_addr_t ret = page->dma_addr[0];
> > +   if (sizeof(dma_addr_t) > sizeof(unsigned long))
> > +   ret |= (dma_addr_t)page->dma_addr[1] << 32;
> > +   return ret;
> > +}
> 
> Have you considered using a PFN type address here? I suspect you
> can prove that shifting the DMA address by PAGE_BITS would
> make it fit into an 'unsigned long' on all 32-bit architectures with
> 64-bit dma_addr_t. This requires that page->dma_addr to be
> page aligned, as well as fit into 44 bits. I recently went through the
> maximum address space per architecture to define a
> MAX_POSSIBLE_PHYSMEM_BITS, and none of them have more than
> 40 here, presumably the same is true for dma address space.

I wouldn't like to make that assumption.  I've come across IOMMUs (maybe
on parisc?  powerpc?) that like to encode fun information in the top
few bits.  So we could get it down to 52 bits, but I don't think we can
get all the way down to 32 bits.  Also, we need to keep the bottom bit
clear for PageTail, so that further constrains us.

Anyway, I like the "two unsigned longs" approach I posted yesterday,
but thanks for the suggestion.


Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode

2021-04-17 Thread Christophe Leroy




On 16/04/2021 at 01:49, Alexei Starovoitov wrote:

On Thu, Apr 15, 2021 at 8:41 AM Quentin Monnet  wrote:


2021-04-15 16:37 UTC+0200 ~ Daniel Borkmann 

On 4/15/21 11:32 AM, Jianlin Lv wrote:

For debugging JITs, dumping the JITed image to the kernel log is discouraged;
"bpftool prog dump jited" is a much better way to examine JITed dumps.
This patch gets rid of the code related to bpf_jit_enable=2 mode and
updates the proc handler of bpf_jit_enable. It also adds auxiliary
information to explain how to use the bpf_jit_disasm tool after this change.

Signed-off-by: Jianlin Lv 


Hello,

For what it's worth, I have already seen people dump the JIT image in
kernel logs in Qemu VMs running with just a busybox, not for kernel
development, but in a context where building/using bpftool was not
possible.

If building/using bpftool is not possible then the majority of selftests won't
be exercised. I don't think such an environment is suitable for any kind
of bpf development, much less for JIT debugging.
Meanwhile, bpf_jit_enable=2 is nothing but a debugging tool for JIT developers.
I'd rather nuke that code instead of carrying it from kernel to kernel.



When I implemented JIT for PPC32, it was extremely helpful.

As far as I understand, for the time being bpftool is not usable in my environment because it
doesn't support cross compilation when the target's endianness differs from the building host
endianness, see discussion at
https://lore.kernel.org/bpf/21e66a09-514f-f426-b9e2-13baab0b9...@csgroup.eu/


It's true that selftests can't be exercised because they don't build.

The question might be naive as I didn't investigate much about the replacement of "bpf_jit_enable=2
debugging mode" by bpftool, but how exactly do we use bpftool for that? Especially when using the BPF
test module?




RE: Bogus struct page layout on 32-bit

2021-04-17 Thread David Laight
From: Grygorii Strashko
> Sent: 16 April 2021 10:27
...
> Sry, for delayed reply.
> 
> The TI platforms am3/4/5 (cpsw) and Keystone 2 (netcp) can do only 32bit DMA 
> even in case of LPAE
> (dma-ranges are used).
> Originally, as I remember, CONFIG_ARCH_DMA_ADDR_T_64BIT has not been selected 
> for the LPAE case
> on TI platforms and the fact that it became set is the result of 
> multi-platform/allXXXconfig/DMA
> optimizations and unification.
> (just checked - not set in 4.14)
> 
> Probable commit 4965a68780c5 ("arch: define the ARCH_DMA_ADDR_T_64BIT config 
> symbol in lib/Kconfig").
> 
> The TI drivers have been updated, finally to accept ARCH_DMA_ADDR_T_64BIT=y 
> by using things like
> (__force u32)
> for example.

Hmmm, using (__force u32) is probably wrong.
If an address + length >= 2**32 can get passed then the IO request
needs to be errored (or a bounce buffer used).

Otherwise you can get particularly horrid corruptions.
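
A minimal sketch of the kind of check meant here, assuming the request
is described by a 64-bit address and length.  The helper name is
hypothetical and this is not taken from any TI driver:

```c
#include <assert.h>   /* static_assert */
#include <stdbool.h>
#include <stdint.h>

/* First address that a 32-bit DMA engine cannot reach. */
#define DMA32_LIMIT_SIM (1ULL << 32)
static_assert(DMA32_LIMIT_SIM == 4294967296ULL, "limit is 4 GiB");

/*
 * Returns true only if addr + len stays below 2**32.  Written without
 * computing addr + len so the sum cannot wrap around.  A buffer ending
 * exactly at the 4 GiB boundary is rejected, matching the ">=" wording
 * above; that is the conservative choice.
 */
static inline bool dma_range_fits_32bit(uint64_t addr, uint64_t len)
{
	if (addr >= DMA32_LIMIT_SIM)
		return false;
	return len < DMA32_LIMIT_SIM - addr;
}
```

Only when this returns true is it safe to truncate the address to u32;
otherwise the request has to be failed or bounced.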

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH] powerpc/pseries/mce: Fix a typo in error type assignment

2021-04-17 Thread Michael Ellerman
Ganesh Goudar  writes:
> The error type is ICACHE and DCACHE, for case MCE_ERROR_TYPE_ICACHE.

Do you mean "is ICACHE not DCACHE" ?

cheers

> Signed-off-by: Ganesh Goudar 
> ---
>  arch/powerpc/platforms/pseries/ras.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c 
> b/arch/powerpc/platforms/pseries/ras.c
> index f8b390a9d9fb..9d4ef65da7f3 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -699,7 +699,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs,
>   mce_err.error_type = MCE_ERROR_TYPE_DCACHE;
>   break;
>   case MC_ERROR_TYPE_I_CACHE:
> - mce_err.error_type = MCE_ERROR_TYPE_DCACHE;
> + mce_err.error_type = MCE_ERROR_TYPE_ICACHE;
>   break;
>   case MC_ERROR_TYPE_UNKNOWN:
>   default:
> -- 
> 2.26.2


Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus

2021-04-17 Thread Michael Ellerman
Tyrel Datwyler  writes:
> On 4/1/21 5:13 PM, Tyrel Datwyler wrote:
>> Currently, neither the vio_bus nor the vio_driver structure provides
>> support for a shutdown() routine.
>> 
>> Add support for shutdown() by allowing drivers to provide an
>> implementation via function pointer in their vio_driver struct and
>> provide a proper implementation in the driver template for the vio_bus
>> that calls a vio driver's shutdown() if defined.
>> 
>> In the case that no shutdown() is defined by a vio driver and a kexec is
>> in progress we implement a big hammer that calls remove() to ensure no
>> further DMA for the devices is possible.
>> 
>> Signed-off-by: Tyrel Datwyler 
>> ---
>
> Ping... any comments, problems with this approach?

The kexec part seems like a bit of a hack.

It also doesn't help for kdump, when none of the shutdown code is run.

How many drivers do we have? Can we just implement a proper shutdown for
them?

cheers


RE: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-17 Thread David Laight
From: Matthew Wilcox
> Sent: 16 April 2021 16:28
> 
> On Thu, Apr 15, 2021 at 08:08:32PM +0200, Jesper Dangaard Brouer wrote:
> > See below patch.  Where I swap32 the dma address to satisfy
> > page->compound having bit zero cleared. (It is the simplest fix I could
> > come up with).
> 
> I think this is slightly simpler, and as a bonus code that assumes the
> old layout won't compile.

Always a good plan.

...
>  static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>  {
> - return page->dma_addr;
> + dma_addr_t ret = page->dma_addr[0];
> + if (sizeof(dma_addr_t) > sizeof(unsigned long))
> + ret |= (dma_addr_t)page->dma_addr[1] << 32;
> + return ret;
> +}

Won't some compiler/option combinations generate an
error for the '<< 32' when dma_addr_t is 32bit?

You might need to use a (u64) cast.

David




Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-17 Thread Arnd Bergmann
On Fri, Apr 16, 2021 at 5:27 PM Matthew Wilcox  wrote:
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index b5b195305346..db7c7020746a 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct 
> page_pool *pool,
>
>  static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>  {
> -   return page->dma_addr;
> +   dma_addr_t ret = page->dma_addr[0];
> +   if (sizeof(dma_addr_t) > sizeof(unsigned long))
> +   ret |= (dma_addr_t)page->dma_addr[1] << 32;
> +   return ret;
> +}

Have you considered using a PFN type address here? I suspect you
can prove that shifting the DMA address by PAGE_BITS would
make it fit into an 'unsigned long' on all 32-bit architectures with
64-bit dma_addr_t. This requires that page->dma_addr to be
page aligned, as well as fit into 44 bits. I recently went through the
maximum address space per architecture to define a
MAX_POSSIBLE_PHYSMEM_BITS, and none of them have more than
40 here, presumably the same is true for dma address space.
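
To make the idea concrete, a rough sketch of the encoding, assuming 4 KiB
pages and a 32-bit unsigned long.  The *_SIM macros and both helpers are
hypothetical names for illustration only:

```c
#include <assert.h>   /* static_assert */
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_SIM   12                     /* assumed: 4 KiB pages */
#define MAX_DMA_BITS_SIM (32 + PAGE_SHIFT_SIM)  /* bits that still fit */
static_assert(MAX_DMA_BITS_SIM == 44, "32-bit PFN covers 44 address bits");

/* Store a page-aligned DMA address of at most 44 bits as a 32-bit PFN. */
static inline bool dma_addr_to_pfn32(uint64_t addr, uint32_t *pfn)
{
	if (addr & ((1ULL << PAGE_SHIFT_SIM) - 1))
		return false;	/* not page aligned */
	if (addr >> MAX_DMA_BITS_SIM)
		return false;	/* needs more than 44 bits */
	*pfn = (uint32_t)(addr >> PAGE_SHIFT_SIM);
	return true;
}

/* Recover the full DMA address from the stored PFN. */
static inline uint64_t pfn32_to_dma_addr(uint32_t pfn)
{
	return (uint64_t)pfn << PAGE_SHIFT_SIM;
}
```

The stored value can have any bit set, including bit 0, so this sketch
does nothing about fields that overlay the same word.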

Arnd


[PATCH] perf vendor events: Initial json/events list for power10 platform

2021-04-17 Thread Kajol Jain
Patch adds initial json/events for POWER10.

Signed-off-by: Kajol Jain 
---
 .../perf/pmu-events/arch/powerpc/mapfile.csv  |   1 +
 .../arch/powerpc/power10/cache.json   |  47 +++
 .../arch/powerpc/power10/floating_point.json  |   7 +
 .../arch/powerpc/power10/frontend.json| 217 +
 .../arch/powerpc/power10/locks.json   |  12 +
 .../arch/powerpc/power10/marked.json  | 147 +
 .../arch/powerpc/power10/memory.json  | 192 +++
 .../arch/powerpc/power10/others.json  | 297 ++
 .../arch/powerpc/power10/pipeline.json| 297 ++
 .../pmu-events/arch/powerpc/power10/pmc.json  |  22 ++
 .../arch/powerpc/power10/translation.json |  57 
 11 files changed, 1296 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/cache.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/frontend.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/locks.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/marked.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/memory.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/others.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pipeline.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pmc.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/translation.json

diff --git a/tools/perf/pmu-events/arch/powerpc/mapfile.csv b/tools/perf/pmu-events/arch/powerpc/mapfile.csv
index 229150e7ab7d..4abdfc3f9692 100644
--- a/tools/perf/pmu-events/arch/powerpc/mapfile.csv
+++ b/tools/perf/pmu-events/arch/powerpc/mapfile.csv
@@ -15,3 +15,4 @@
 # Power8 entries
 004[bcd][[:xdigit:]]{4},1,power8,core
 004e[[:xdigit:]]{4},1,power9,core
+0080[[:xdigit:]]{4},1,power10,core
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/cache.json b/tools/perf/pmu-events/arch/powerpc/power10/cache.json
new file mode 100644
index ..95e33531fbc6
--- /dev/null
+++ b/tools/perf/pmu-events/arch/powerpc/power10/cache.json
@@ -0,0 +1,47 @@
+[
+  {
+"EventCode": "1003C",
+"EventName": "PM_EXEC_STALL_DMISS_L2L3",
+"BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from either the local L2 or local L3."
+  },
+  {
+"EventCode": "34056",
+"EventName": "PM_EXEC_STALL_LOAD_FINISH",
+"BriefDescription": "Cycles in which the oldest instruction in the pipeline was finishing a load after its data was reloaded from a data source beyond the local L1; cycles in which the LSU was processing an L1-hit; cycles in which the NTF instruction merged with another load in the LMQ."
+  },
+  {
+"EventCode": "3006C",
+"EventName": "PM_RUN_CYC_SMT2_MODE",
+"BriefDescription": "Cycles when this thread's run latch is set and the core is in SMT2 mode"
+  },
+  {
+"EventCode": "300F4",
+"EventName": "PM_RUN_INST_CMPL_CONC",
+"BriefDescription": "PowerPC instructions completed by this thread when all threads in the core had the run-latch set"
+  },
+  {
+"EventCode": "4C016",
+"EventName": "PM_EXEC_STALL_DMISS_L2L3_CONFLICT",
+"BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local L2 or local L3, with a dispatch conflict."
+  },
+  {
+"EventCode": "4D014",
+"EventName": "PM_EXEC_STALL_LOAD",
+"BriefDescription": "Cycles in which the oldest instruction in the pipeline was a load instruction executing in the Load Store Unit."
+  },
+  {
+"EventCode": "4D016",
+"EventName": "PM_EXEC_STALL_PTESYNC",
+"BriefDescription": "Cycles in which the oldest instruction in the pipeline was a PTESYNC instruction executing in the Load Store Unit."
+  },
+  {
+"EventCode": "401EA",
+"EventName": "PM_THRESH_EXC_128",
+"BriefDescription": "Threshold counter exceeded a value of 128"
+  },
+  {
+"EventCode": "400F6",
+"EventName": "PM_BR_MPRED_CMPL",
+"BriefDescription": "A mispredicted branch completed. Includes direction and target."
+  }
+]
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json b/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
new file mode 100644
index ..e9b92f282d3c
--- /dev/null
+++ b/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
@@ -0,0 +1,7 @@
+[
+  {
+"EventCode": "4016E",
+"EventName": "PM_THRESH_NOT_MET",
+"BriefDescription": "Threshold counter did not meet threshold"
+  }
+]
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
new file mode 100644
index ..aebaf94bfdfe
--- /dev/null
+++ b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
@@ -0,0 +1,217 @@
+[
+

Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-17 Thread Jesper Dangaard Brouer
On Sat, 17 Apr 2021 00:07:23 +0100
"Matthew Wilcox (Oracle)"  wrote:

> 32-bit architectures which expect 8-byte alignment for 8-byte integers
> and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct
> page inadvertently expanded in 2019.  When the dma_addr_t was added,
> it forced the alignment of the union to 8 bytes, which inserted a 4 byte
> gap between 'flags' and the union.
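
The gap can be reproduced in user space with a small model. This is only a sketch: the struct and member names are invented, uint32_t stands in for a 32-bit 'unsigned long', and _Alignas(8) models an ABI that aligns 8-byte integers to 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Old layout (model): a 64-bit, 8-byte-aligned member inside the union. */
struct demo_page_old {
	uint32_t flags;
	union {
		struct { uint32_t next, prev; } lru;
		_Alignas(8) uint64_t dma_addr;	/* forces 8-byte union alignment */
	} u;
};

/* New layout (model): the address is kept in two adjacent 32-bit words. */
struct demo_page_new {
	uint32_t flags;
	union {
		struct { uint32_t next, prev; } lru;
		uint32_t dma_addr[2];		/* union stays 4-byte aligned */
	} u;
};
```

In this model, offsetof(struct demo_page_old, u) is 8 and the struct is 16 bytes, whereas offsetof(struct demo_page_new, u) is 4 and the struct is 12 bytes.
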
> 
> Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
> This restores the alignment to that of an unsigned long, and also fixes a
> potential problem where (on a big endian platform), the bit used to denote
> PageTail could inadvertently get set, and a racing get_user_pages_fast()
> could dereference a bogus compound_head().
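
The PageTail point can be sketched as follows, assuming the tail marker is bit 0 of the first word of the union, which is where compound_head lives. The demo_* helpers are illustrative names only. On a big-endian 32-bit system a 64-bit dma_addr_t places its high half in that first word, so a DMA address with bit 32 set would look like a tail page; with the new layout the first word holds the low half, whose bit 0 is clear for a page-aligned address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Word that overlays the first union slot when a 64-bit value is stored
 * directly on a big-endian 32-bit system: the high half. */
static uint32_t demo_first_word_old_be(uint64_t dma)
{
	return (uint32_t)(dma >> 32);
}

/* Word that overlays the first union slot with the two-word layout: the
 * low half is always stored in element 0. */
static uint32_t demo_first_word_new(uint64_t dma)
{
	return (uint32_t)dma;
}

/* Model of the tail test: bit 0 of the first word. */
static bool demo_looks_like_tail(uint32_t first_word)
{
	return (first_word & 1u) != 0;
}
```

For the page-aligned address 0x100001000, the old big-endian view yields a first word of 1, which looks like a tail page, while the new view yields 0x1000, which does not.
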
> 
> Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---

Acked-by: Jesper Dangaard Brouer 

Thank you, Matthew, for working on a fix for this.  It's been a pleasure
working with you and exchanging crazy ideas with you for solving this.
Most of them didn't work out, especially those that came to me during
restless nights ;-).

Having worked through the other solutions, some very intrusive and some
that could even be considered ugly, I think we have a good and
non-intrusive solution/workaround in this patch.  Thanks!


>  include/linux/mm_types.h |  4 ++--
>  include/net/page_pool.h  | 12 +++-
>  net/core/page_pool.c | 12 +++-
>  3 files changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6613b26a8894..5aacc1c10a45 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -97,10 +97,10 @@ struct page {
>   };
>   struct {/* page_pool used by netstack */
>   /**
> -  * @dma_addr: might require a 64-bit value even on
> +  * @dma_addr: might require a 64-bit value on
>* 32-bit architectures.
>*/
> - dma_addr_t dma_addr;
> + unsigned long dma_addr[2];
>   };
>   struct {/* slab, slob and slub */
>   union {
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index b5b195305346..db7c7020746a 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
>  
>  static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>  {
> -	return page->dma_addr;
> +	dma_addr_t ret = page->dma_addr[0];
> +	if (sizeof(dma_addr_t) > sizeof(unsigned long))
> +		ret |= (dma_addr_t)page->dma_addr[1] << 32;
> +	return ret;
> +}
> +
> +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> +{
> +	page->dma_addr[0] = addr;
> +	if (sizeof(dma_addr_t) > sizeof(unsigned long))
> +		page->dma_addr[1] = addr >> 32;
>  }
>  
>  static inline bool is_page_pool_compiled_in(void)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index ad8b0707af04..f014fd8c19a6 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct page_pool *pool,
> struct page *page,
> unsigned int dma_sync_size)
>  {
> + dma_addr_t dma_addr = page_pool_get_dma_addr(page);
> +
>   dma_sync_size = min(dma_sync_size, pool->p.max_len);
> - dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
> + dma_sync_single_range_for_device(pool->p.dev, dma_addr,
>pool->p.offset, dma_sync_size,
>pool->p.dma_dir);
>  }
> @@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>   put_page(page);
>   return NULL;
>   }
> - page->dma_addr = dma;
> + page_pool_set_dma_addr(page, dma);
>  
>   if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>   page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
> @@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, struct page *page)
>*/
>   goto skip_dma_unmap;
>  
> - dma = page->dma_addr;
> + dma = page_pool_get_dma_addr(page);
>  
> - /* When page is unmapped, it cannot be returned our pool */
> + /* When page is unmapped, it cannot be returned to our pool */
>   dma_unmap_page_attrs(pool->p.dev, dma,
>PAGE_SIZE << pool->p.order, pool->p.dma_dir,
>DMA_ATTR_SKIP_CPU_SYNC);
> - page->dma_addr = 0;
> + page_pool_set_dma_addr(page, 0);
>  skip_dma_unmap:
>   /* This may be the last page returned, releasing the pool, so
>* it is not safe to re