Re: [PATCH v3 6/6] PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling

2023-04-13 Thread Ira Weiny
Jonathan Cameron wrote:
> On Wed, 12 Apr 2023 16:29:01 -0500
> Bjorn Helgaas  wrote:
> 
> > On Tue, Apr 11, 2023 at 01:03:02PM -0500, Terry Bowman wrote:
> > > From: Robert Richter 
> > > 
> > > RCEC AER corrected and uncorrectable internal errors (CIE/UIE) are
> > > disabled by default.  
> > 
> > "Disabled by default" just means "the power-up state of CIE/UIC is
> > that they are masked", right?  It doesn't mean that Linux normally
> > masks them.
> > 
> > > [1][2] Enable them to receive CXL downstream port
> > > errors of a Restricted CXL Host (RCH).
> > > 
> > > [1] CXL 3.0 Spec, 12.2.1.1 - RCH Downstream Port Detected Errors
> > > [2] PCIe Base Spec 6.0, 7.8.4.3 Uncorrectable Error Mask Register,
> > > 7.8.4.6 Correctable Error Mask Register
> > > 
> > > Co-developed-by: Terry Bowman 
> > > Signed-off-by: Robert Richter 
> > > Signed-off-by: Terry Bowman 
> > > Cc: "Oliver O'Halloran" 
> > > Cc: Bjorn Helgaas 
> > > Cc: Mahesh J Salgaonkar 
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Cc: linux-...@vger.kernel.org
> > > ---
> > >  drivers/pci/pcie/aer.c | 73 ++
> > >  1 file changed, 73 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index 171a08fd8ebd..3973c731e11d 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -1000,7 +1000,79 @@ static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> > >   pcie_walk_rcec(dev, cxl_handle_error_iter, info);
> > >  }
> > >  
> > > +static bool cxl_error_is_native(struct pci_dev *dev)
> > > +{
> > > + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
> > > +
> > > + if (pcie_ports_native)
> > > + return true;
> > > +
> > > + return host->native_aer && host->native_cxl_error;
> > > +}
> > > +
> > > +static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
> > > +{
> > > + int *handles_cxl = data;
> > > +
> > > + *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
> > > +
> > > + return *handles_cxl;
> > > +}
> > > +
> > > +static bool handles_cxl_errors(struct pci_dev *rcec)
> > > +{
> > > + int handles_cxl = 0;
> > > +
> > > + if (!rcec->aer_cap)
> > > + return false;
> > > +
> > > + if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC)
> > > + pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
> > > +
> > > + return !!handles_cxl;
> > > +}
> > > +
> > > +static int __cxl_unmask_internal_errors(struct pci_dev *rcec)
> > > +{
> > > + int aer, rc;
> > > + u32 mask;
> > > +
> > > + /*
> > > +  * Internal errors are masked by default, unmask RCEC's here
> > > +  * PCI6.0 7.8.4.3 Uncorrectable Error Mask Register (Offset 08h)
> > > +  * PCI6.0 7.8.4.6 Correctable Error Mask Register (Offset 14h)
> > > +  */  
> > 
> > Unmasking internal errors doesn't have anything specific to do with
> > CXL, so I don't think it should have "cxl" in the function name.
> > Maybe something like "pci_aer_unmask_internal_errors()".
> 
> This reminds me.  Not sure we resolved earlier discussion on changing
> the system wide policy to turn these on 
> https://lore.kernel.org/linux-cxl/20221229172731.GA611562@bhelgaas/
> which needs pretty much the same thing.
> 
> Ira, I think you were picking this one up?
> https://lore.kernel.org/linux-cxl/63e5fb533f304_13244829412@iweiny-mobl.notmuch/

After this discussion I posted an RFC to enable those errors.

https://lore.kernel.org/all/20230209-cxl-pci-aer-v1-1-f9a817fa4...@intel.com/

Unfortunately the prevailing opinion was that this was unsafe.  And no one
piped up with a reason to pursue the alternative of a pci core call to enable
them as needed.

So I abandoned the work.

I think the direction things were headed was to have a call like:

int pci_enable_pci_internal_errors(struct pci_dev *dev)
{
	int pos_cap_err;
	u32 reg;

	if (!pcie_aer_is_native(dev))
		return -EIO;

	pos_cap_err = dev->aer_cap;

	/* Unmask correctable and uncorrectable (non-fatal) internal errors */
	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, &reg);
	reg &= ~PCI_ERR_COR_INTERNAL;
	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, reg);

	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, &reg);
	reg &= ~PCI_ERR_UNC_INTN;
	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, reg);

	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, &reg);
	reg &= ~PCI_ERR_UNC_INTN;
	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, reg);

	return 0;
}

... and call this from the cxl code where it is needed.

Is this an acceptable direction?  Terry is welcome to steal the above from my
patch and throw it into the PCI core.
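
For illustration, the call site could be as small as this (a minimal
sketch; the wrapper name and the best-effort dev_dbg() policy are mine,
not from any posted patch):

static void cxl_pci_unmask_internal_errors(struct pci_dev *pdev)
{
	int rc;

	/* Best effort: the device still functions with these errors masked */
	rc = pci_enable_pci_internal_errors(pdev);
	if (rc)
		dev_dbg(&pdev->dev, "internal AER errors left masked: %d\n", rc);
}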

Looking at the current state of things I think cxl_pci_ras_unmask() may
actually be broken now without calling something like the above.  For that I
dropped 

Re: [PATCH 0/3] COVER: Remove memcpy_page_flushcache()

2023-03-16 Thread Ira Weiny
+ Konstantin

Michael Ellerman wrote:
> Ira Weiny  writes:
> > Dave Hansen wrote:
> >> On 3/15/23 16:20, Ira Weiny wrote:
> >> > Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray 
> >> > callbacks") removed the calls to memcpy_page_flushcache().
> >> > 
> >> > kmap_atomic() is deprecated and used in the x86 version of
> >> > memcpy_page_flushcache().
> >> > 
> >> > Remove the unnecessary memcpy_page_flushcache() call from all arch's.
> >> 
> >> Hi Ira,
> >> 
> >> Since the common code user is already gone these three patches seem
> >> quite independent.  It seems like the right thing to do is have
> >> individual arch maintainers cherry pick their arch patch and carry it
> >> independently.
> >
> > Yes.
> >
> >> 
> >> Is there a compelling reason to have someone pick up and carry these all
> >> together that I'm missing?
> >
> > No reason.  Would you like me to submit them individually?
> 
> I'll just grab the powerpc one from the thread, no need to resend.

Thanks.

> 
> > Sorry, submitting them separately crossed my mind when I wrote them but I
> > kind of forgot as they were all on the same branch and I was waiting for
> > after the merge window to submit them.
> 
> It's also much easier to run git-send-email HEAD^^^, rather than running
> it three separate times, let alone if it's a 20 patch series.

Exactly.  And I'm using b4 which would have forced me to create a separate
branch for each of the patches to track.  So I was keeping them around in
a single branch to let 0day run after the merge window.  Then I forgot
about the idea of splitting them because b4 had it all packaged up nice!

> 
> I wonder if we could come up with some convention to indicate that a
> series is made up of independent patches, and maintainers are free to
> pick them individually - but still sent as a single series.

Maybe.  But perhaps b4 could have a send option which would split them
out?  I'll see about adding an option to b4 but I've Cc'ed Konstantin as
well for the idea.

Thanks for picking this up!
Ira

> 
> cheers


Re: [PATCH 0/3] COVER: Remove memcpy_page_flushcache()

2023-03-15 Thread Ira Weiny
Dave Hansen wrote:
> On 3/15/23 16:20, Ira Weiny wrote:
> > Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray 
> > callbacks") removed the calls to memcpy_page_flushcache().
> > 
> > kmap_atomic() is deprecated and used in the x86 version of
> > memcpy_page_flushcache().
> > 
> > Remove the unnecessary memcpy_page_flushcache() call from all arch's.
> 
> Hi Ira,
> 
> Since the common code user is already gone these three patches seem
> quite independent.  It seems like the right thing to do is have
> individual arch maintainers cherry pick their arch patch and carry it
> independently.

Yes.

> 
> Is there a compelling reason to have someone pick up and carry these all
> together that I'm missing?

No reason.  Would you like me to submit them individually?

Sorry, submitting them separately crossed my mind when I wrote them but I
kind of forgot as they were all on the same branch and I was waiting for
after the merge window to submit them.

Ira


[PATCH 3/3] arm: uaccess: Remove memcpy_page_flushcache()

2023-03-15 Thread Ira Weiny
Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray
callbacks") removed the calls to memcpy_page_flushcache().

Remove the unnecessary memcpy_page_flushcache() call.

Cc: Al Viro 
Cc: "Dan Williams" 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Ira Weiny 
---
 arch/arm64/include/asm/uaccess.h| 2 --
 arch/arm64/lib/uaccess_flushcache.c | 6 --
 2 files changed, 8 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 5c7b2f9d5913..4bf2c0975a82 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -449,8 +449,6 @@ extern long strncpy_from_user(char *dest, const char __user *src, long count);
 extern __must_check long strnlen_user(const char __user *str, long n);
 
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
-struct page;
-void memcpy_page_flushcache(char *to, struct page *page, size_t offset, size_t len);
 extern unsigned long __must_check __copy_user_flushcache(void *to, const void __user *from, unsigned long n);
 
 static inline int __copy_from_user_flushcache(void *dst, const void __user *src, unsigned size)
diff --git a/arch/arm64/lib/uaccess_flushcache.c b/arch/arm64/lib/uaccess_flushcache.c
index baee22961bdb..7510d1a23124 100644
--- a/arch/arm64/lib/uaccess_flushcache.c
+++ b/arch/arm64/lib/uaccess_flushcache.c
@@ -19,12 +19,6 @@ void memcpy_flushcache(void *dst, const void *src, size_t cnt)
 }
 EXPORT_SYMBOL_GPL(memcpy_flushcache);
 
-void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
-			    size_t len)
-{
-	memcpy_flushcache(to, page_address(page) + offset, len);
-}
-
 unsigned long __copy_user_flushcache(void *to, const void __user *from,
 				     unsigned long n)
 {

-- 
2.39.2



[PATCH 2/3] powerpc: Remove memcpy_page_flushcache()

2023-03-15 Thread Ira Weiny
Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray
callbacks") removed the calls to memcpy_page_flushcache().

Remove the unnecessary memcpy_page_flushcache() call.

Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Al Viro 
Cc: "Dan Williams" 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ira Weiny 
---
 arch/powerpc/include/asm/uaccess.h | 2 --
 arch/powerpc/lib/pmem.c| 7 ---
 2 files changed, 9 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 3ddc65c63a49..52378e641d38 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -361,8 +361,6 @@ copy_mc_to_user(void __user *to, const void *from, unsigned long n)
 
 extern long __copy_from_user_flushcache(void *dst, const void __user *src,
 					unsigned size);
-extern void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
-				   size_t len);
 
 static __must_check inline bool user_access_begin(const void __user *ptr, size_t len)
 {
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index eb2919ddf9b9..4e724c4c01ad 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -85,10 +85,3 @@ void memcpy_flushcache(void *dest, const void *src, size_t size)
 	clean_pmem_range(start, start + size);
 }
 EXPORT_SYMBOL(memcpy_flushcache);
-
-void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
-			    size_t len)
-{
-	memcpy_flushcache(to, page_to_virt(page) + offset, len);
-}
-EXPORT_SYMBOL(memcpy_page_flushcache);

-- 
2.39.2



[PATCH 1/3] x86, uaccess: Remove memcpy_page_flushcache()

2023-03-15 Thread Ira Weiny
Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray
callbacks") removed the calls to memcpy_page_flushcache().

kmap_atomic() is deprecated and used in memcpy_page_flushcache().

Remove the unnecessary memcpy_page_flushcache() call.

Cc: Al Viro 
Cc: "Dan Williams" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Signed-off-by: Ira Weiny 
---
 arch/x86/include/asm/uaccess_64.h | 2 --
 arch/x86/lib/usercopy_64.c| 9 -
 2 files changed, 11 deletions(-)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index d13d71af5cf6..c6b1dcded364 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -62,8 +62,6 @@ extern long __copy_user_nocache(void *dst, const void __user *src,
 				unsigned size, int zerorest);
 
 extern long __copy_user_flushcache(void *dst, const void __user *src, unsigned size);
-extern void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
-				   size_t len);
 
 static inline int
 __copy_from_user_inatomic_nocache(void *dst, const void __user *src,
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 6c1f8ac5e721..f515542f017f 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -136,13 +136,4 @@ void __memcpy_flushcache(void *_dst, const void *_src, size_t size)
 	}
 }
 EXPORT_SYMBOL_GPL(__memcpy_flushcache);
-
-void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
-			    size_t len)
-{
-	char *from = kmap_atomic(page);
-
-	memcpy_flushcache(to, from + offset, len);
-	kunmap_atomic(from);
-}
 #endif

-- 
2.39.2



[PATCH 0/3] COVER: Remove memcpy_page_flushcache()

2023-03-15 Thread Ira Weiny
Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray 
callbacks") removed the calls to memcpy_page_flushcache().

kmap_atomic() is deprecated and used in the x86 version of
memcpy_page_flushcache().

Remove the unnecessary memcpy_page_flushcache() call from all arch's.

Signed-off-by: Ira Weiny 
---
Ira Weiny (3):
  x86, uaccess: Remove memcpy_page_flushcache()
  powerpc: Remove memcpy_page_flushcache()
  arm: uaccess: Remove memcpy_page_flushcache()

 arch/arm64/include/asm/uaccess.h| 2 --
 arch/arm64/lib/uaccess_flushcache.c | 6 --
 arch/powerpc/include/asm/uaccess.h  | 2 --
 arch/powerpc/lib/pmem.c | 7 ---
 arch/x86/include/asm/uaccess_64.h   | 2 --
 arch/x86/lib/usercopy_64.c  | 9 -
 6 files changed, 28 deletions(-)
---
base-commit: 6015b1aca1a233379625385feb01dd014aca60b5
change-id: 20221230-kmap-x86-bfda7e1f07ee

Best regards,
-- 
Ira Weiny 



Re: [PATCH RFC] PCI/AER: Enable internal AER errors by default

2023-02-14 Thread Ira Weiny
Bjorn Helgaas wrote:
> On Fri, Feb 10, 2023 at 02:33:23PM -0800, Ira Weiny wrote:
> > The CXL driver expects internal error reporting to be enabled via
> > pci_enable_pcie_error_reporting().  It is likely other drivers expect the 
> > same.
> > Dave submitted a patch to enable the CXL side[1] but the PCI AER registers
> > still mask errors.
> > 
> > PCIe v6.0 Uncorrectable Mask Register (7.8.4.3) and Correctable Mask
> > Register (7.8.4.6) default to masking internal errors.  The
> > Uncorrectable Error Severity Register (7.8.4.4) defaults internal errors
> > as fatal.
> > 
> > Enable internal errors to be reported via the standard
> > pci_enable_pcie_error_reporting() call.  Ensure uncorrectable errors are set
> > non-fatal to limit any impact to other drivers.
> 
> Do you have any background on why the spec makes these errors masked
> by default?  I'm sympathetic to wanting to learn about all the errors
> we can, but I'm a little wary if the spec authors thought it was
> important to mask these by default.
> 

I don't have any idea of the history.

To me 'internal errors' is a pretty wide net and was likely a catch-all
that the authors felt was mostly unneeded.

CXL is different because it further divides the errors.

I've enlisted some help internal to Intel to hopefully find some answers.
But in the event no one knows, it would be safe to go with my alternate
suggestion and add a new PCIe call to enable this specifically for the
drivers who need it.

Ira


[PATCH RFC] PCI/AER: Enable internal AER errors by default

2023-02-10 Thread Ira Weiny
The CXL driver expects internal error reporting to be enabled via
pci_enable_pcie_error_reporting().  It is likely other drivers expect the same.
Dave submitted a patch to enable the CXL side[1] but the PCI AER registers
still mask errors.

PCIe v6.0 Uncorrectable Mask Register (7.8.4.3) and Correctable Mask
Register (7.8.4.6) default to masking internal errors.  The
Uncorrectable Error Severity Register (7.8.4.4) defaults internal errors
as fatal.

Enable internal errors to be reported via the standard
pci_enable_pcie_error_reporting() call.  Ensure uncorrectable errors are set
non-fatal to limit any impact to other drivers.

[1] 
https://lore.kernel.org/all/167604864163.2392965.5102660329807283871.stgit@djiang5-mobl3.local/

Cc: Bjorn Helgaas 
Cc: Jonathan Cameron 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Stefan Roese 
Cc: "Kuppuswamy Sathyanarayanan" 
Cc: Mahesh J Salgaonkar 
Cc: Oliver O'Halloran 
Cc: linux-...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ira Weiny 
---
This is RFC to see if it is acceptable to be part of the standard
pci_enable_pcie_error_reporting() call or perhaps a separate pci core
call should be introduced.  It is anticipated that enabling this error
reporting is what existing drivers are expecting.  The errors are marked
non-fatal therefore it should not adversely affect existing devices.
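
For context, the point of putting this in pci_enable_pcie_error_reporting()
is that no driver-side change is needed.  A driver already doing the usual
enable (hypothetical fragment; `pdev` is illustrative):

	rc = pci_enable_pcie_error_reporting(pdev);

would, with this RFC applied, also get correctable and non-fatal
uncorrectable internal errors unmasked by that same call.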
---
 drivers/pci/pcie/aer.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 625f7b2cafe4..9d3ed3a5fc23 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -229,11 +229,28 @@ int pcie_aer_is_native(struct pci_dev *dev)
 
 int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
+	int pos_cap_err;
+	u32 reg;
 	int rc;
 
 	if (!pcie_aer_is_native(dev))
 		return -EIO;
 
+	pos_cap_err = dev->aer_cap;
+
+	/* Unmask correctable and uncorrectable (non-fatal) internal errors */
+	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, &reg);
+	reg &= ~PCI_ERR_COR_INTERNAL;
+	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, reg);
+
+	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, &reg);
+	reg &= ~PCI_ERR_UNC_INTN;
+	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, reg);
+
+	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, &reg);
+	reg &= ~PCI_ERR_UNC_INTN;
+	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, reg);
+
 	rc = pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
 	return pcibios_err_to_errno(rc);
 }

---
base-commit: e5ab7f206ffc873160bd0f1a52cae17ab692a9d1
change-id: 20230209-cxl-pci-aer-18dda61c8239

Best regards,
-- 
Ira Weiny 



[PATCH] checkpatch: Add kmap and kmap_atomic to the deprecated list

2022-08-13 Thread ira . weiny
From: Ira Weiny 

kmap() and kmap_atomic() are being deprecated in favor of
kmap_local_page().

There are two main problems with kmap(): (1) It comes with an overhead
as mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when
the kmap’s pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.

kmap_local_page() is safe from any context and is therefore redundant
with kmap_atomic() with the exception of any pagefault or preemption
disable requirements.  However, using kmap_atomic() for these side
effects makes the code less clear.  So any requirement for pagefault or
preemption disable should be made explicit.

With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
the tasks can be preempted and, when they are scheduled to run again,
the kernel virtual addresses are restored.
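
For anyone picking up one of the tree-wide conversions mentioned below,
the mechanical pattern is small (a minimal sketch with hypothetical
variable names, not taken from any particular driver):

	/* Before: also disables pagefaults and preemption implicitly */
	addr = kmap_atomic(page);
	memcpy(buf, addr, len);
	kunmap_atomic(addr);

	/*
	 * After: a CPU-local mapping; disable pagefaults/preemption
	 * explicitly only where the code actually depends on it.
	 */
	addr = kmap_local_page(page);
	memcpy(buf, addr, len);
	kunmap_local(addr);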

Suggested-by: Thomas Gleixner 
Suggested-by: Fabio M. De Francesco 
Signed-off-by: Ira Weiny 

---
Suggested by credits.
Thomas: Idea to keep from growing more kmap/kmap_atomic calls.
Fabio: Stole some of his boiler plate commit message.

Notes on tree-wide conversions:

I've cc'ed mailing lists for subsystems which currently contains either kmap()
or kmap_atomic() calls.  As some of you already know Fabio and I have been
working through converting kmap() calls to kmap_local_page().  But there is a
lot more work to be done.  Help from the community is always welcome,
especially with kmap_atomic() conversions.  To keep from stepping on each
other's toes I've created a spreadsheet of the current calls[1].  Please let me
or Fabio know if you plan on tackling one of the conversions so we can mark it
off the list.

[1] 
https://docs.google.com/spreadsheets/d/1i_ckZ10p90bH_CkxD2bYNi05S2Qz84E2OFPv8zq__0w/edit#gid=1679714357

---
 scripts/checkpatch.pl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 79e759aac543..9ff219e0a9d5 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -807,6 +807,8 @@ our %deprecated_apis = (
"rcu_barrier_sched" => "rcu_barrier",
"get_state_synchronize_sched"   => "get_state_synchronize_rcu",
"cond_synchronize_sched"=> "cond_synchronize_rcu",
+   "kmap"  => "kmap_local_page",
+   "kmap_atomic"   => "kmap_local_page",
 );
 
 #Create a search pattern for all these strings to speed up a loop below
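
With these two entries, checkpatch should flag new call sites roughly as
follows (message format taken from the existing %deprecated_apis
handling; exact wording may vary by version):

	WARNING: Deprecated use of 'kmap_atomic', prefer 'kmap_local_page' instead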

base-commit: 4a9350597aff50bbd0f4b80ccf49d2e02df5
-- 
2.35.3



Re: [RFC PATCH 2/6] testing/pkeys: Don't use uninitialized variable

2022-06-13 Thread Ira Weiny
On Mon, Jun 13, 2022 at 03:48:56PM -0700, Mehta, Sohil wrote:
> On 6/10/2022 4:35 PM, ira.we...@intel.com wrote:
> > diff --git a/tools/testing/selftests/vm/protection_keys.c 
> > b/tools/testing/selftests/vm/protection_keys.c
> > index d0183c381859..43e47de19c0d 100644
> > --- a/tools/testing/selftests/vm/protection_keys.c
> > +++ b/tools/testing/selftests/vm/protection_keys.c
> > @@ -1225,9 +1225,9 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
> > int new_pkey;
> > dprintf1("%s() alloc loop: %d\n", __func__, i);
> > new_pkey = alloc_pkey();
> > -   dprintf4("%s()::%d, err: %d pkey_reg: 0x%016llx"
> > +   dprintf4("%s()::%d, errno: %d pkey_reg: 0x%016llx"
> 
> What is errno referring to over here?  There are a few things happening in
> alloc_pkey().

Good point, but the only system call in alloc_pkey() is pkey_alloc() so it will
be the errno from there.

In test_pkey_alloc_exhaust() we are expecting the errno to be from pkey_alloc()

...
if ((new_pkey == -1) && (errno == ENOSPC)) {
...


> I guess it would show the latest error that happened. Does
> errno need to be set to 0 before the call?

Maybe.  Now that I look again errno is printed just below at level 2.

dprintf2("%s() errno: %d ENOSPC: %d\n", __func__, errno, 
ENOSPC);

I missed that.

> 
> Also, would it be useful to print the return value (new_pkey) from
> alloc_pkey() here?

Yea that might be useful.  Perhaps change err to new_pkey instead since errno
is already printed.

Ira

> 
> > " shadow: 0x%016llx\n",
> > -   __func__, __LINE__, err, __read_pkey_reg(),
> > +   __func__, __LINE__, errno, __read_pkey_reg(),
> > shadow_pkey_reg);
> > read_pkey_reg(); /* for shadow checking */
> > dprintf2("%s() errno: %d ENOSPC: %d\n", __func__, errno, ENOSPC);
> 
> Sohil


Re: [RFC PATCH 1/6] testing/pkeys: Add command line options

2022-06-13 Thread Ira Weiny
On Mon, Jun 13, 2022 at 03:31:02PM -0700, Mehta, Sohil wrote:
> On 6/10/2022 4:35 PM, ira.we...@intel.com wrote:
> 
> > Add command line options for debug level and number of iterations.
> > 
> > $ ./protection_keys_64 -h
> > Usage: ./protection_keys_64 [-h,-d,-i ]
> >  --help,-h   This help
> > --debug,-d  Increase debug level for each -d
> 
> Is this mechanism (of counting d's) commonplace in other selftests as well?
> Looking at the test code for pkeys the debug levels run from 1-5. That feels
> like quite a few d's to input :)

I've seen (and used) it before yes.  See ibnetdiscover.

...
# Debugging flags
-d raise the IB debugging level. May be used several times (-ddd or -d -d -d).
...
-v increase the application verbosity level. May be used several times (-vv or 
-v -v -v)
...
- https://linux.die.net/man/8/ibnetdiscover

But a much more mainstream example I can think of is verbosity level with
lspci.

16:29:12 > lspci -h
...
Display options:
-v  Be verbose (-vv or -vvv for higher verbosity)
...

> 
> Would it be easier to input the number in the command line directly?
> 
> Either way it would be useful to know the debug range in the help.
> Maybe something like:
>   --debug,-d  Increase debug level for each -d (1-5)

I'm inclined not to do this because it would encode the max debug level.  On
the other hand I'm not sure why there are 5 levels now.  ;-)

Having the multiple options specified was an easy way to maintain the large
number of levels.
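
For reference, the counting idiom itself is tiny (a stand-alone sketch,
not the test's actual option handling):

	#include <stdio.h>
	#include <unistd.h>

	int main(int argc, char *argv[])
	{
		int debug_level = 0;
		int c;

		/* each -d raises the verbosity by one, like lspci -v/-vv/-vvv */
		while ((c = getopt(argc, argv, "d")) != -1)
			if (c == 'd')
				debug_level++;

		printf("debug level: %d\n", debug_level);
		return 0;
	}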

Ira

> 
> The patch seems fine to me otherwise.
> 
> > --iterations,-i   repeat test  times
> > default: 22
> > 
> 
> Thanks,
> Sohil


Re: [RFC PATCH 6/6] pkeys: Change mm_pkey_free() to void

2022-06-13 Thread Ira Weiny
On Mon, Jun 13, 2022 at 09:17:06AM +, Christophe Leroy wrote:
> 
> 
> Le 11/06/2022 à 01:35, ira.we...@intel.com a écrit :
> > From: Ira Weiny 
> > 
> > Now that the pkey arch support is no longer checked in mm_pkey_free()
> > there is no reason to have it return int.
> 
> Right, I see this is doing what I commented in previous patch.

Yes because it was suggested by Sohil I decided to make it a separate patch to
make the credit easier.

> > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > index 41458e729c27..e872bdd2e228 100644
> > --- a/mm/mprotect.c
> > +++ b/mm/mprotect.c
> > @@ -809,8 +809,10 @@ SYSCALL_DEFINE1(pkey_free, int, pkey)
> > return ret;
> >   
> > mmap_write_lock(current->mm);
> > -   if (mm_pkey_is_allocated(current->mm, pkey))
> > -   ret = mm_pkey_free(current->mm, pkey);
> > +   if (mm_pkey_is_allocated(current->mm, pkey)) {
> > +   mm_pkey_free(current->mm, pkey);
> > +   ret = 0;
> > +   }
> 
> Or you could have ret = 0 by default and do
> 
>   if (mm_pkey_is_allocated(current->mm, pkey))
>   mm_pkey_free(current->mm, pkey);
>   else
>   ret = -EINVAL;

Yes that fits the kernel style better.

Thanks for the review!
Ira

> 
> > mmap_write_unlock(current->mm);
> >   
> > /*


[RFC PATCH 6/6] pkeys: Change mm_pkey_free() to void

2022-06-10 Thread ira . weiny
From: Ira Weiny 

Now that the pkey arch support is no longer checked in mm_pkey_free()
there is no reason to have it return int.

Change the return value to void.

Cc: Dave Hansen 
Cc: Aneesh Kumar K.V 
Suggested-by: Sohil Mehta 
Signed-off-by: Ira Weiny 
---
 arch/powerpc/include/asm/pkeys.h | 4 +---
 arch/x86/include/asm/pkeys.h | 4 +---
 include/linux/pkeys.h| 5 +
 mm/mprotect.c| 6 --
 4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index e96aa91f817b..4d01a48ab941 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -105,11 +105,9 @@ static inline int mm_pkey_alloc(struct mm_struct *mm)
return ret;
 }
 
-static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
+static inline void mm_pkey_free(struct mm_struct *mm, int pkey)
 {
__mm_pkey_free(mm, pkey);
-
-   return 0;
 }
 
 /*
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index da02737cc4d1..1f408f46fa9a 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -105,11 +105,9 @@ int mm_pkey_alloc(struct mm_struct *mm)
 }
 
 static inline
-int mm_pkey_free(struct mm_struct *mm, int pkey)
+void mm_pkey_free(struct mm_struct *mm, int pkey)
 {
mm_set_pkey_free(mm, pkey);
-
-   return 0;
 }
 
 static inline int vma_pkey(struct vm_area_struct *vma)
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 86be8bf27b41..bf98c50a3437 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -30,10 +30,7 @@ static inline int mm_pkey_alloc(struct mm_struct *mm)
return -1;
 }
 
-static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
-{
-   return -EINVAL;
-}
+static inline void mm_pkey_free(struct mm_struct *mm, int pkey) { }
 
 static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 41458e729c27..e872bdd2e228 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -809,8 +809,10 @@ SYSCALL_DEFINE1(pkey_free, int, pkey)
return ret;
 
mmap_write_lock(current->mm);
-   if (mm_pkey_is_allocated(current->mm, pkey))
-   ret = mm_pkey_free(current->mm, pkey);
+   if (mm_pkey_is_allocated(current->mm, pkey)) {
+   mm_pkey_free(current->mm, pkey);
+   ret = 0;
+   }
mmap_write_unlock(current->mm);
 
/*
-- 
2.35.1



[RFC PATCH 4/6] pkeys: Lift pkey hardware check for pkey_alloc()

2022-06-10 Thread ira . weiny
From: Ira Weiny 

pkey_alloc() is documented to return ENOSPC when the hardware does not
support pkeys.  On x86, pkey_alloc() incorrectly returns EINVAL.

This is because mm_pkey_alloc() does not check for pkey support before
returning a key.  Therefore, if the keys are not exhausted pkey_alloc()
continues on to call arch_set_user_pkey_access().  Unfortunately, when
arch_set_user_pkey_access() detects the failed support it overwrites the
ENOSPC return value with EINVAL.

Ensure consistent behavior across architectures by lifting this check to
the core mm code.

Remove a couple of 'we' references in code comments as well.
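
In user-visible terms (an illustrative sketch of the errno change
described above, not code from the patch):

	/* on hardware without pkey support */
	int pkey = pkey_alloc(0, 0);
	/*
	 * before (x86): pkey == -1, errno == EINVAL  (wrong)
	 * after:        pkey == -1, errno == ENOSPC  (as documented)
	 */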

Cc: ah...@chromium.org
Cc: cleme...@chromium.org
Cc: gdee...@chromium.org
Cc: jkumme...@chromium.org
Cc: manosk...@chromium.org
Cc: thiba...@chromium.org
Cc: Florian Weimer 
Cc: Sohil Mehta 
Cc: Andrew Morton 
Cc: Dave Hansen 
Cc: Aneesh Kumar K.V 
Cc: linux-...@vger.kernel.org
Fixes: e8c24d3a23a4 ("x86/pkeys: Allocation/free syscalls")
Signed-off-by: Ira Weiny 

---
Thanks to Sohil for pointing out that the commit message could be more
clear WRT how EINVAL is returned incorrectly.
---
 arch/powerpc/include/asm/pkeys.h | 8 +++-
 mm/mprotect.c| 3 +++
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 59a2c7dbc78f..2c8351248793 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -85,18 +85,16 @@ static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
 static inline int mm_pkey_alloc(struct mm_struct *mm)
 {
/*
-* Note: this is the one and only place we make sure that the pkey is
+* Note: this is the one and only place to make sure that the pkey is
 * valid as far as the hardware is concerned. The rest of the kernel
 * trusts that only good, valid pkeys come out of here.
 */
u32 all_pkeys_mask = (u32)(~(0x0));
int ret;
 
-   if (!mmu_has_feature(MMU_FTR_PKEY))
-   return -1;
/*
-* Are we out of pkeys? We must handle this specially because ffz()
-* behavior is undefined if there are no zeros.
+* Out of pkeys?  Handle this specially because ffz() behavior is
+* undefined if there are no zeros.
 */
if (mm_pkey_allocation_map(mm) == all_pkeys_mask)
return -1;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ba5592655ee3..56d35de33725 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -773,6 +773,9 @@ SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
int pkey;
int ret;
 
+   if (!arch_pkeys_enabled())
+   return -ENOSPC;
+
/* No flags supported yet. */
if (flags)
return -EINVAL;
-- 
2.35.1



[RFC PATCH 3/6] testing/pkeys: Add additional test for pkey_alloc()

2022-06-10 Thread ira . weiny
From: Ira Weiny 

When pkeys are not available on the hardware pkey_alloc() has specific
behavior which was previously untested.

Add test for this.

Cc: Dave Hansen 
Cc: Aneesh Kumar K.V 
Signed-off-by: Ira Weiny 
---
 tools/testing/selftests/vm/protection_keys.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 43e47de19c0d..4b733a75606f 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1554,6 +1554,16 @@ void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey)
do_not_expect_pkey_fault("plain read on recently PROT_EXEC area");
 }
 
+void test_pkey_alloc_on_unsupported_cpu(void)
+{
+   int test_pkey = sys_pkey_alloc(0, 0);
+
+   dprintf1("pkey_alloc: %d (%d %s)\n", test_pkey, errno,
+strerror(errno));
+   pkey_assert(test_pkey < 0);
+   pkey_assert(errno == ENOSPC);
+}
+
 void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey)
 {
int size = PAGE_SIZE;
@@ -1688,6 +1698,8 @@ int main(int argc, char *argv[])
 
printf("running PKEY tests for unsupported CPU/OS\n");
 
+   test_pkey_alloc_on_unsupported_cpu();
+
ptr  = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
assert(ptr != (void *)-1);
test_mprotect_pkey_on_unsupported_cpu(ptr, 1);
-- 
2.35.1



[RFC PATCH 0/6] User pkey minor bug fixes

2022-06-10 Thread ira . weiny
From: Ira Weiny 


While evaluating the possibility of defining a new type for pkeys within the
kernel I found a couple of minor bugs.

Because these patches clean up the return codes from system calls I'm sending
this out RFC hoping that users will speak up if anything breaks.

I'm not too concerned about pkey_free() because it is unlikely that anyone is
checking the return code.  Interestingly enough, glibc recommends not calling
pkey_free() because it does not change the access rights to the key and may be
subsequently allocated again.[1][2]

The pkey_alloc() is more concerning.  However, I checked the Chrome source and
it does not differentiate among the return codes and maps all errors into
kNoMemoryProtectionKey.

glibc says it returns ENOSYS if the system does not support pkeys but I don't
see where ENOSYS is returned?  AFAICS it just returns what the kernel returns.
So it is probably up to user of glibc.

In addition I've enhanced the pkey tests to verify and test the changes.

Thanks to Rick Edgecombe and Sohil Mehta for internal review.


[1] Quote from manual/memory.texi:

Calling this function does not change the access rights of the freed
protection key.  The calling thread and other threads may retain access
to it, even if it is subsequently allocated again.  For this reason, it
is not recommended to call the @code{pkey_free} function.

[2] PKS had a similar issue and went to statically allocated keys instead.


Ira Weiny (6):
  testing/pkeys: Add command line options
  testing/pkeys: Don't use uninitialized variable
  testing/pkeys: Add additional test for pkey_alloc()
  pkeys: Lift pkey hardware check for pkey_alloc()
  pkeys: Up level pkey_free() checks
  pkeys: Change mm_pkey_free() to void

 arch/powerpc/include/asm/pkeys.h | 18 ++---
 arch/x86/include/asm/pkeys.h |  7 +-
 include/linux/pkeys.h|  5 +-
 mm/mprotect.c| 13 +++-
 tools/testing/selftests/vm/pkey-helpers.h|  7 +-
 tools/testing/selftests/vm/protection_keys.c | 75 +---
 6 files changed, 86 insertions(+), 39 deletions(-)


base-commit: 874c8ca1e60b2c564a48f7e7acc40d328d5c8733
-- 
2.35.1



[RFC PATCH 1/6] testing/pkeys: Add command line options

2022-06-10 Thread ira . weiny
From: Ira Weiny 

It is more convenient to use command line options for debug and
iterations vs changing the code and recompiling.

Add command line options for debug level and number of iterations.

$ ./protection_keys_64 -h
Usage: ./protection_keys_64 [-h,-d,-i ]
--help,-h   This help
--debug,-d  Increase debug level for each -d
--iterations,-i   repeat test  times
default: 22

Cc: Dave Hansen 
Cc: Aneesh Kumar K.V 
Signed-off-by: Ira Weiny 
---
 tools/testing/selftests/vm/pkey-helpers.h|  7 +--
 tools/testing/selftests/vm/protection_keys.c | 59 +---
 2 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
index 92f3be3dd8e5..7aaac1c8ebca 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -23,9 +23,8 @@
 
 #define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
 
-#ifndef DEBUG_LEVEL
-#define DEBUG_LEVEL 0
-#endif
+extern int debug_level;
+
 #define DPRINT_IN_SIGNAL_BUF_SIZE 4096
 extern int dprint_in_signal;
 extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
@@ -58,7 +57,7 @@ static inline void sigsafe_printf(const char *format, ...)
}
 }
 #define dprintf_level(level, args...) do { \
-   if (level <= DEBUG_LEVEL)   \
+   if (level <= debug_level)   \
sigsafe_printf(args);   \
 } while (0)
 #define dprintf0(args...) dprintf_level(0, args)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 291bc1e07842..d0183c381859 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -44,9 +44,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pkey-helpers.h"
 
+#define DEFAULT_ITERATIONS 22
+
+int debug_level;
 int iteration_nr = 1;
 int test_nr;
 
@@ -361,7 +365,7 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
 * here.
 */
dprintf1("pkey_reg_xstate_offset: %d\n", pkey_reg_xstate_offset());
-   if (DEBUG_LEVEL > 4)
+   if (debug_level > 4)
dump_mem(pkey_reg_ptr - 128, 256);
pkey_assert(*pkey_reg_ptr);
 #endif /* arch */
@@ -480,7 +484,7 @@ int sys_mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
dprintf2("SYS_mprotect_key sret: %d\n", sret);
dprintf2("SYS_mprotect_key prot: 0x%lx\n", orig_prot);
dprintf2("SYS_mprotect_key failed, errno: %d\n", errno);
-   if (DEBUG_LEVEL >= 2)
+   if (debug_level >= 2)
perror("SYS_mprotect_pkey");
}
return sret;
@@ -1116,7 +1120,7 @@ void test_kernel_write_of_write_disabled_region(int *ptr, u16 pkey)
pkey_write_deny(pkey);
ret = read(test_fd, ptr, 100);
dprintf1("read ret: %d\n", ret);
-   if (ret < 0 && (DEBUG_LEVEL > 0))
+   if (ret < 0 && (debug_level > 0))
perror("verbose read result (OK for this to be bad)");
pkey_assert(ret);
 }
@@ -1155,7 +1159,7 @@ void test_kernel_gup_write_to_write_disabled_region(int *ptr, u16 pkey)
pkey_write_deny(pkey);
futex_ret = syscall(SYS_futex, ptr, FUTEX_WAIT, some_int-1, NULL,
&ignored, ignored);
-   if (DEBUG_LEVEL > 0)
+   if (debug_level > 0)
perror("futex");
dprintf1("futex() ret: %d\n", futex_ret);
 }
@@ -1626,11 +1630,52 @@ void pkey_setup_shadow(void)
shadow_pkey_reg = __read_pkey_reg();
 }
 
-int main(void)
+static void print_help_and_exit(char *argv0)
+{
+   printf("Usage: %s [-h,-d,-i ]\n", argv0);
+   printf("--help,-h   This help\n");
+   printf("--debug,-d  Increase debug level for each -d\n");
+   printf("--iterations,-i   repeate test  times\n");
+   printf("default: %d\n", DEFAULT_ITERATIONS);
+   printf("\n");
+}
+
+int main(int argc, char *argv[])
 {
-   int nr_iterations = 22;
-   int pkeys_supported = is_pkeys_supported();
+   int nr_iterations = DEFAULT_ITERATIONS;
+   int pkeys_supported;
+
+   while (1) {
+   static struct option long_options[] = {
+   {"help",no_argument,0,  'h' },
+   {"debug",   no_argument,0,  'd' },
+   {"iterations",  required_argument,  0,  'i' },
+   {0, 0,  0,  0 }
+   };
+   int option_index = 0;
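
[The archive truncates the patch at this point.  A plausible completion
of the option loop, consistent with the help text above; the original
lines are not recoverable from this archive:]

		int c = getopt_long(argc, argv, "hdi:",
				    long_options, &option_index);

		if (c == -1)
			break;

		switch (c) {
		case 'd':
			debug_level++;
			break;
		case 'i':
			nr_iterations = atoi(optarg);
			break;
		case 'h':
		default:
			print_help_and_exit(argv[0]);
			exit(-1);
		}
	}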

[RFC PATCH 5/6] pkeys: Up level pkey_free() checks

2022-06-10 Thread ira . weiny
From: Ira Weiny 

x86 is missing a hardware check for pkey support in pkey_free().  While
the net result is the same (-EINVAL returned), pkey_free() has well
defined behavior which will be easier to maintain in one place.

For powerpc the return code is -1 rather than -EINVAL.  This changes
that behavior slightly but this is very unlikely to break any user
space.

Lift the checks for pkey_free() to the core mm code and ensure
consistency with returning -EINVAL.

Cc: ah...@chromium.org
Cc: cleme...@chromium.org
Cc: gdee...@chromium.org
Cc: jkumme...@chromium.org
Cc: manosk...@chromium.org
Cc: thiba...@chromium.org
Cc: Florian Weimer 
Cc: Andrew Morton 
Cc: linux-...@vger.kernel.org
Cc: Sohil Mehta 
Cc: Dave Hansen 
Cc: Aneesh Kumar K.V 
Signed-off-by: Ira Weiny 

---
Thanks to Sohil for suggesting I mention the powerpc return value in the
commit message.

Also Sohil suggested changing mm_pkey_free() from int to void.  This is
added as a separate patch with his Suggested-by.
---
 arch/powerpc/include/asm/pkeys.h | 6 --
 arch/x86/include/asm/pkeys.h | 3 ---
 mm/mprotect.c| 8 ++--
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 2c8351248793..e96aa91f817b 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -107,12 +107,6 @@ static inline int mm_pkey_alloc(struct mm_struct *mm)
 
 static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
 {
-   if (!mmu_has_feature(MMU_FTR_PKEY))
-   return -1;
-
-   if (!mm_pkey_is_allocated(mm, pkey))
-   return -EINVAL;
-
__mm_pkey_free(mm, pkey);
 
return 0;
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 2e6c04d8a45b..da02737cc4d1 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -107,9 +107,6 @@ int mm_pkey_alloc(struct mm_struct *mm)
 static inline
 int mm_pkey_free(struct mm_struct *mm, int pkey)
 {
-   if (!mm_pkey_is_allocated(mm, pkey))
-   return -EINVAL;
-
mm_set_pkey_free(mm, pkey);
 
return 0;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 56d35de33725..41458e729c27 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -803,10 +803,14 @@ SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
 
 SYSCALL_DEFINE1(pkey_free, int, pkey)
 {
-   int ret;
+   int ret = -EINVAL;
+
+   if (!arch_pkeys_enabled())
+   return ret;
 
mmap_write_lock(current->mm);
-   ret = mm_pkey_free(current->mm, pkey);
+   if (mm_pkey_is_allocated(current->mm, pkey))
+   ret = mm_pkey_free(current->mm, pkey);
mmap_write_unlock(current->mm);
 
/*
-- 
2.35.1



[RFC PATCH 2/6] testing/pkeys: Don't use uninitialized variable

2022-06-10 Thread ira . weiny
From: Ira Weiny 

err was being used in test_pkey_alloc_exhaust() prior to being assigned.
errno is useful to know after a failed alloc_pkey() call.

Change err to errno in the debug print.

Cc: Dave Hansen 
Cc: Aneesh Kumar K.V 
Signed-off-by: Ira Weiny 
---
 tools/testing/selftests/vm/protection_keys.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index d0183c381859..43e47de19c0d 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1225,9 +1225,9 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
int new_pkey;
dprintf1("%s() alloc loop: %d\n", __func__, i);
new_pkey = alloc_pkey();
-   dprintf4("%s()::%d, err: %d pkey_reg: 0x%016llx"
+   dprintf4("%s()::%d, errno: %d pkey_reg: 0x%016llx"
" shadow: 0x%016llx\n",
-   __func__, __LINE__, err, __read_pkey_reg(),
+   __func__, __LINE__, errno, __read_pkey_reg(),
shadow_pkey_reg);
read_pkey_reg(); /* for shadow checking */
dprintf2("%s() errno: %d ENOSPC: %d\n", __func__, errno, 
ENOSPC);
-- 
2.35.1



Re: [PATCH 5/5] x86/pkeys: Standardize on u8 for pkey type

2022-03-15 Thread Ira Weiny
On Tue, Mar 15, 2022 at 09:03:26AM -0700, Dave Hansen wrote:
> On 3/15/22 08:53, Ira Weiny wrote:
> > On Mon, Mar 14, 2022 at 04:49:12PM -0700, Dave Hansen wrote:
> >> On 3/10/22 16:57, ira.we...@intel.com wrote:
> >>> From: Ira Weiny 
> >>>
> >>> The number of pkeys supported on x86 and powerpc are much smaller than a
> >>> u16 value can hold.  It is desirable to standardize on the type for
> >>> pkeys.  powerpc currently supports the most pkeys at 32.  u8 is plenty
> >>> large for that.
> >>>
> >>> Standardize on the pkey types by changing u16 to u8.
> >>
> >> How widely was this intended to "standardize" things?  Looks like it may
> >> have missed a few spots.
> > 
> > Sorry I think the commit message is misleading you.  The justification of 
> > u8 as
> > the proper type is that no arch has a need for more than 255 pkeys.
> > 
> > This specific patch was intended to only change x86.  Per that goal I don't 
> > see
> > any other places in x86 which uses u16 after this patch.
> > 
> > $ git grep u16 arch/x86 | grep key
> > arch/x86/events/intel/uncore_discovery.c:   const u16 *type_id = key;
> > arch/x86/include/asm/intel_pconfig.h:   u16 keyid;
> > arch/x86/include/asm/mmu.h: u16 pkey_allocation_map;
> > arch/x86/include/asm/pkeys.h:   u16 all_pkeys_mask = ((1U << arch_max_pkey()) - 1);
> 
> I was also looking at the generic mm code.

Ah yea that needs to be sorted out too I think.

> 
> >> Also if we're worried about the type needing to change or with the wrong
> >> type being used, I guess we could just do a pkey_t typedef.
> > 
> > I'm not 'worried' about it.  But I do think it makes the code cleaner and 
> > more
> > self documenting.
> 
> Yeah, consistency is good.  Do you mind taking a look at how a pkey_t
> would look, and also seeing how much core mm code should use it?

I don't mind at all.

Ira
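
For reference, a pkey_t along the lines discussed might look like this
(a sketch only; nothing like it was posted, and the signatures are the
existing ones with the type swapped):

	/* include/linux/pkeys.h (hypothetical) */
	typedef u8 pkey_t;

	int arch_set_user_pkey_access(struct task_struct *tsk, pkey_t pkey,
				      unsigned long init_val);
	u64 pkey_to_vmflag_bits(pkey_t pkey);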


Re: [PATCH 5/5] x86/pkeys: Standardize on u8 for pkey type

2022-03-15 Thread Ira Weiny
On Mon, Mar 14, 2022 at 04:49:12PM -0700, Dave Hansen wrote:
> On 3/10/22 16:57, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > The number of pkeys supported on x86 and powerpc are much smaller than a
> > u16 value can hold.  It is desirable to standardize on the type for
> > pkeys.  powerpc currently supports the most pkeys at 32.  u8 is plenty
> > large for that.
> > 
> > Standardize on the pkey types by changing u16 to u8.
> 
> How widely was this intended to "standardize" things?  Looks like it may
> have missed a few spots.

Sorry I think the commit message is misleading you.  The justification of u8 as
the proper type is that no arch has a need for more than 255 pkeys.

This specific patch was intended to only change x86.  Per that goal I don't see
any other places in x86 which uses u16 after this patch.

$ git grep u16 arch/x86 | grep key
arch/x86/events/intel/uncore_discovery.c:   const u16 *type_id = key;
arch/x86/include/asm/intel_pconfig.h:   u16 keyid;
arch/x86/include/asm/mmu.h: u16 pkey_allocation_map;
arch/x86/include/asm/pkeys.h:   u16 all_pkeys_mask = ((1U << arch_max_pkey()) - 1);

> 
> Also if we're worried about the type needing to change or with the wrong
> type being used, I guess we could just do a pkey_t typedef.

I'm not 'worried' about it.  But I do think it makes the code cleaner and more
self documenting.

Ira


[PATCH 4/5] mm/pkeys: Make pkey unsigned in arch_set_user_pkey_access()

2022-03-10 Thread ira . weiny
From: Ira Weiny 

The WARN_ON check in arch_set_user_pkey_access() in the x86 architecture
fails to check for an invalid negative value.

A simple check for less than 0 would fix this issue; however, in the call
stack below arch_set_user_pkey_access() the pkey should never be
negative on any architecture.  x86 only supports 16 keys while ppc
supports 32, u8 is therefore large enough for all current architectures
and likely those in the future.

Change the type of the pkey passed to arch_set_user_pkey_access() to u8.

To: Dave Hansen 
To: Michael Ellerman 
Cc: Aneesh Kumar K.V 
Signed-off-by: Ira Weiny 

---
Changes from V1:
Make this part of a generic pkey clean up series.
---
 arch/powerpc/include/asm/pkeys.h | 4 ++--
 arch/powerpc/mm/book3s64/pkeys.c | 2 +-
 arch/x86/include/asm/pkeys.h | 4 ++--
 arch/x86/kernel/fpu/xstate.c | 2 +-
 include/linux/pkeys.h| 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 59a2c7dbc78f..e70615a1da9b 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -143,9 +143,9 @@ static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
return __arch_override_mprotect_pkey(vma, prot, pkey);
 }
 
-extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+extern int __arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
   unsigned long init_val);
-static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+static inline int arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
unsigned long init_val)
 {
if (!mmu_has_feature(MMU_FTR_PKEY))
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index d6456f8846de..310feb9efd57 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -333,7 +333,7 @@ static inline void init_iamr(u8 pkey, u8 init_bits)
  * Set the access rights in AMR IAMR and UAMOR registers for @pkey to that
  * specified in @init_val.
  */
-int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+int __arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
unsigned long init_val)
 {
u64 new_amr_bits = 0x0ul;
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 2e6c04d8a45b..3f5c236e34cd 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -9,8 +9,8 @@
  */
 #define arch_max_pkey() (cpu_feature_enabled(X86_FEATURE_OSPKE) ? 16 : 1)
 
-extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
-   unsigned long init_val);
+extern int arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
+unsigned long init_val);
 
 static inline bool arch_pkeys_enabled(void)
 {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 7c7824ae7862..db511bec57e5 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1068,7 +1068,7 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
  * This will go out and modify PKRU register to set the access
  * rights for @pkey to @init_val.
  */
-int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+int arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
  unsigned long init_val)
 {
u32 old_pkru, new_pkru_bits = 0;
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 86be8bf27b41..aa40ed2fb0fc 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -35,7 +35,7 @@ static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
return -EINVAL;
 }
 
-static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+static inline int arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
unsigned long init_val)
 {
return 0;
-- 
2.35.1



[PATCH 5/5] x86/pkeys: Standardize on u8 for pkey type

2022-03-10 Thread ira . weiny
From: Ira Weiny 

The number of pkeys supported on x86 and powerpc are much smaller than a
u16 value can hold.  It is desirable to standardize on the type for
pkeys.  powerpc currently supports the most pkeys at 32.  u8 is plenty
large for that.

Standardize on the pkey types by changing u16 to u8.

To: Dave Hansen 
Cc: Aneesh Kumar K.V 
Signed-off-by: Ira Weiny 
---
 arch/x86/include/asm/pgtable.h | 4 ++--
 arch/x86/include/asm/pkru.h| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 8a9432fb3802..cb89f1224d8a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1357,7 +1357,7 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
 }
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
-static inline u16 pte_flags_pkey(unsigned long pte_flags)
+static inline u8 pte_flags_pkey(unsigned long pte_flags)
 {
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
/* ifdef to avoid doing 59-bit shift on 32-bit values */
@@ -1367,7 +1367,7 @@ static inline u16 pte_flags_pkey(unsigned long pte_flags)
 #endif
 }
 
-static inline bool __pkru_allows_pkey(u16 pkey, bool write)
+static inline bool __pkru_allows_pkey(u8 pkey, bool write)
 {
u32 pkru = read_pkru();
 
diff --git a/arch/x86/include/asm/pkru.h b/arch/x86/include/asm/pkru.h
index 74f0a2d34ffd..06d088f06229 100644
--- a/arch/x86/include/asm/pkru.h
+++ b/arch/x86/include/asm/pkru.h
@@ -16,13 +16,13 @@ extern u32 init_pkru_value;
 #define pkru_get_init_value()  0
 #endif
 
-static inline bool __pkru_allows_read(u32 pkru, u16 pkey)
+static inline bool __pkru_allows_read(u32 pkru, u8 pkey)
 {
int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
return !(pkru & (PKRU_AD_BIT << pkru_pkey_bits));
 }
 
-static inline bool __pkru_allows_write(u32 pkru, u16 pkey)
+static inline bool __pkru_allows_write(u32 pkru, u8 pkey)
 {
int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
/*
-- 
2.35.1



[PATCH 3/5] powerpc/pkeys: Properly type pkey in init_{i}amr()

2022-03-10 Thread ira . weiny
From: Ira Weiny 

Negative values passed to pkeyshift() will cause an overflow of the amr
and iamr values.  Pkey should not be negative in this call path and u8
is large enough for the 32 pkeys available on powerpc.

Change pkey to u8 in init_amr() and init_iamr().
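
For reference (illustrative arithmetic, assuming the powerpc definition
pkeyshift(pkey) == (arch_max_pkey() - pkey - 1) * AMR_BITS_PER_PKEY,
with 32 keys and 2 bits per key): a pkey of -1 yields a shift of
(32 - (-1) - 1) * 2 == 64, and shifting a u64 by 64 bits is undefined,
so the computed AMR/IAMR masks end up as garbage.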

To: Michael Ellerman 
Cc: Aneesh Kumar K.V 
Cc: Dave Hansen 
Signed-off-by: Ira Weiny 
---
 arch/powerpc/mm/book3s64/pkeys.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 753e62ba67af..d6456f8846de 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -310,7 +310,7 @@ void pkey_mm_init(struct mm_struct *mm)
mm->context.execute_only_pkey = execute_only_key;
 }
 
-static inline void init_amr(int pkey, u8 init_bits)
+static inline void init_amr(u8 pkey, u8 init_bits)
 {
u64 new_amr_bits = (((u64)init_bits & 0x3UL) << pkeyshift(pkey));
u64 old_amr = current_thread_amr() & ~((u64)(0x3ul) << pkeyshift(pkey));
@@ -318,7 +318,7 @@ static inline void init_amr(int pkey, u8 init_bits)
current->thread.regs->amr = old_amr | new_amr_bits;
 }
 
-static inline void init_iamr(int pkey, u8 init_bits)
+static inline void init_iamr(u8 pkey, u8 init_bits)
 {
u64 new_iamr_bits = (((u64)init_bits & 0x1UL) << pkeyshift(pkey));
u64 old_iamr = current_thread_iamr() & ~((u64)(0x1ul) << pkeyshift(pkey));
-- 
2.35.1



[PATCH 2/5] x86/pkeys: Remove __arch_set_user_pkey_access() declaration

2022-03-10 Thread ira . weiny
From: Ira Weiny 

In the x86 code __arch_set_user_pkey_access() is not used and is not
defined.

Remove the dead declaration.

To: Dave Hansen 
Signed-off-by: Ira Weiny 

---
Changes from V1:
Make this part of a series of pkey clean ups
---
 arch/x86/include/asm/pkeys.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 9c530530b9a7..2e6c04d8a45b 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -41,9 +41,6 @@ static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
return __arch_override_mprotect_pkey(vma, prot, pkey);
 }
 
-extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
-   unsigned long init_val);
-
#define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | VM_PKEY_BIT3)
 
 #define mm_pkey_allocation_map(mm) (mm->context.pkey_allocation_map)
@@ -118,9 +115,6 @@ int mm_pkey_free(struct mm_struct *mm, int pkey)
return 0;
 }
 
-extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
-   unsigned long init_val);
-
 static inline int vma_pkey(struct vm_area_struct *vma)
 {
unsigned long vma_pkey_mask = VM_PKEY_BIT0 | VM_PKEY_BIT1 |
-- 
2.35.1



[PATCH 0/5] Pkey User clean up patches

2022-03-10 Thread ira . weiny
From: Ira Weiny 

I'm looking for acks that this is acceptable for official submission to the
maintainers.  I believe the code to be better than RFC quality but I realize
that the type changes may be more churn than is desired.

The following patches contain pkey cleanups and an attempt to standardize on
the type used for pkeys.

The PKS code is standardizing on u8 for the pkey type and for some of the call
paths in the user space code this should work as well.

Ira Weiny (5):
x86/pkeys: Clean up arch_set_user_pkey_access() declaration
x86/pkeys: Remove __arch_set_user_pkey_access() declaration
powerpc/pkeys: Properly type pkey in init_{i}amr()
mm/pkeys: Make pkey unsigned in arch_set_user_pkey_access()
x86/pkeys: Standardize on u8 for pkey type

arch/powerpc/include/asm/pkeys.h | 4 ++--
arch/powerpc/mm/book3s64/pkeys.c | 6 +++---
arch/x86/include/asm/pgtable.h | 4 ++--
arch/x86/include/asm/pkeys.h | 12 ++--
arch/x86/include/asm/pkru.h | 4 ++--
arch/x86/kernel/fpu/xstate.c | 2 +-
include/linux/pkeys.h | 2 +-
7 files changed, 13 insertions(+), 21 deletions(-)

--
2.35.1



[PATCH 1/5] x86/pkeys: Clean up arch_set_user_pkey_access() declaration

2022-03-10 Thread ira . weiny
From: Ira Weiny 

arch_set_user_pkey_access() was declared two times in the header.

Remove the 2nd declaration.

Suggested-by: "Edgecombe, Rick P" 
Signed-off-by: Ira Weiny 
---
 arch/x86/include/asm/pkeys.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 1d5f14aff5f6..9c530530b9a7 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -118,8 +118,6 @@ int mm_pkey_free(struct mm_struct *mm, int pkey)
return 0;
 }
 
-extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
-   unsigned long init_val);
 extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val);
 
-- 
2.35.1



Re: [PATCH] pkeys: Make pkey unsigned in arch_set_user_pkey_access()

2022-03-07 Thread Ira Weiny
On Mon, Mar 07, 2022 at 12:30:03PM +0530, Aneesh Kumar K.V wrote:
> ira.we...@intel.com writes:
> 
> > From: Ira Weiny 
> >
> > The WARN_ON check in arch_set_user_pkey_access() in the x86 architecture
> > fails to check for an invalid negative value.
> >
> > A simple check for less than 0 would fix this issue; however, in the call
> > stack below arch_set_user_pkey_access() the pkey should never be
> > negative on any architecture.  It is always best to use correct types
> > when possible.  x86 only supports 16 keys while ppc supports 32; u8 is
> > therefore large enough for all current architectures and likely those in
> > the future.
> 
> Should we do that as a separate patch? ie, now convert the variable to
> unsigned int and later switch all the variables to u8?

Maybe.

> because what we
> now have is confusing.
> 
> static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
>   unsigned long pkey)
> static inline u64 pkey_to_vmflag_bits(u16 pkey)
> 

This looks like a good cleanup as well.  Why not convert
arch_calc_vm_prot_bits() and pkey_to_vmflag_bits() to u8?  (In another patch.)

This is all a result of this PKS conversation:

https://lore.kernel.org/lkml/Yg8C6UkgfBmQlPSq@iweiny-desk3/

That started me down the path of trying to figure out why 'int' was used for
PKRU and I realized that negative values had meaning there which did not apply
to me with PKS.  So at some point a conversion needs to be made between a
'conceptual pkey' (int) and a real pkey (unsigned) IMHO.

It's no big deal to split this patch into one which converts to unsigned and
then another to u8 (or u16 if there is some arch which may need it that big).

However, digging more:

Is there a reason u16 was used in pkey_to_vmflag_bits()?  How about in
__pkru_allows_read() in the x86 code?  If possible I think u8 should be
standardized but I'm ok with u16 if that is preferred.

Also, am I missing something in init_amr() and init_iamr()?  I think I could
have gone farther and changed init_amr() and init_iamr(), right?

From what I can see the argument to use unsigned long vs u8 (or u16) is some
expectation that pkeys will grow beyond 256 in number.  From what I can see I
don't think that is going to happen.

So do we need to do this in two steps?

Ira

> 
> >
> > Change the type of the pkey passed to arch_set_user_pkey_access() to u8.
> >
> > To: Dave Hansen 
> > To: Michael Ellerman 
> > Cc: Aneesh Kumar K.V 
> > Signed-off-by: Ira Weiny 
> > ---
> >  arch/powerpc/include/asm/pkeys.h | 4 ++--
> >  arch/powerpc/mm/book3s64/pkeys.c | 2 +-
> >  arch/x86/include/asm/pkeys.h | 4 ++--
> >  arch/x86/kernel/fpu/xstate.c | 2 +-
> >  include/linux/pkeys.h| 2 +-
> >  5 files changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
> > index 59a2c7dbc78f..e70615a1da9b 100644
> > --- a/arch/powerpc/include/asm/pkeys.h
> > +++ b/arch/powerpc/include/asm/pkeys.h
> > @@ -143,9 +143,9 @@ static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
> > return __arch_override_mprotect_pkey(vma, prot, pkey);
> >  }
> >  
> > -extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
> > +extern int __arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
> >unsigned long init_val);
> 
> 
> > -static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
> > +static inline int arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
> > unsigned long init_val)
> >  {
> > if (!mmu_has_feature(MMU_FTR_PKEY))
> > diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
> > index 753e62ba67af..c048467669df 100644
> > --- a/arch/powerpc/mm/book3s64/pkeys.c
> > +++ b/arch/powerpc/mm/book3s64/pkeys.c
> > @@ -333,7 +333,7 @@ static inline void init_iamr(int pkey, u8 init_bits)
> >   * Set the access rights in AMR IAMR and UAMOR registers for @pkey to that
> >   * specified in @init_val.
> >   */
> > -int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
> > +int __arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
> > unsigned long init_val)
> >  {
> > u64 new_amr_bits = 0x0ul;
> > diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
> > index 5292e6dfe2a7..48efb81f6cc6 100644
> > --- a/arch/x86/include/asm/pkeys.h
> > ++

[PATCH] pkeys: Make pkey unsigned in arch_set_user_pkey_access()

2022-03-04 Thread ira . weiny
From: Ira Weiny 

The WARN_ON check in arch_set_user_pkey_access() in the x86 architecture
fails to check for an invalid negative value.

A simple check for less than 0 would fix this issue; however, in the call
stack below arch_set_user_pkey_access() the pkey should never be
negative on any architecture.  It is always best to use correct types
when possible.  x86 only supports 16 keys while ppc supports 32; u8 is
therefore large enough for all current architectures and likely those in
the future.

Change the type of the pkey passed to arch_set_user_pkey_access() to u8.
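
To illustrate, a contrived sketch (not the kernel code; names made up) of how
a signed pkey slips past an upper-bound-only check, while u8 cannot:

static u32 pkey_bits(int pkey)
{
	if (WARN_ON_ONCE(pkey >= 16))	/* pkey == -1 passes this check */
		return 0;
	return 0x3u << (pkey * 2);	/* negative shift count: undefined */
}

static u32 pkey_bits_u8(u8 pkey)
{
	if (WARN_ON_ONCE(pkey >= 16))	/* now a complete range check */
		return 0;
	return 0x3u << (pkey * 2);
}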

To: Dave Hansen 
To: Michael Ellerman 
Cc: Aneesh Kumar K.V 
Signed-off-by: Ira Weiny 
---
 arch/powerpc/include/asm/pkeys.h | 4 ++--
 arch/powerpc/mm/book3s64/pkeys.c | 2 +-
 arch/x86/include/asm/pkeys.h | 4 ++--
 arch/x86/kernel/fpu/xstate.c | 2 +-
 include/linux/pkeys.h| 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 59a2c7dbc78f..e70615a1da9b 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -143,9 +143,9 @@ static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
return __arch_override_mprotect_pkey(vma, prot, pkey);
 }
 
-extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+extern int __arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
   unsigned long init_val);
-static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+static inline int arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
unsigned long init_val)
 {
if (!mmu_has_feature(MMU_FTR_PKEY))
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 753e62ba67af..c048467669df 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -333,7 +333,7 @@ static inline void init_iamr(int pkey, u8 init_bits)
  * Set the access rights in AMR IAMR and UAMOR registers for @pkey to that
  * specified in @init_val.
  */
-int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+int __arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
unsigned long init_val)
 {
u64 new_amr_bits = 0x0ul;
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 5292e6dfe2a7..48efb81f6cc6 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -9,7 +9,7 @@
  */
 #define arch_max_pkey() (cpu_feature_enabled(X86_FEATURE_OSPKE) ? 16 : 1)
 
-extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+extern int arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
unsigned long init_val);
 
 static inline bool arch_pkeys_enabled(void)
@@ -115,7 +115,7 @@ int mm_pkey_free(struct mm_struct *mm, int pkey)
return 0;
 }
 
-extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+extern int arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
unsigned long init_val);
 
 static inline int vma_pkey(struct vm_area_struct *vma)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 7c7824ae7862..db511bec57e5 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1068,7 +1068,7 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
  * This will go out and modify PKRU register to set the access
  * rights for @pkey to @init_val.
  */
-int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+int arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
  unsigned long init_val)
 {
u32 old_pkru, new_pkru_bits = 0;
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 86be8bf27b41..aa40ed2fb0fc 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -35,7 +35,7 @@ static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
return -EINVAL;
 }
 
-static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+static inline int arch_set_user_pkey_access(struct task_struct *tsk, u8 pkey,
unsigned long init_val)
 {
return 0;
-- 
2.35.1



Re: [PATCH v3] powerpc/papr_scm: Implement initial support for injecting smart errors

2022-01-18 Thread Ira Weiny
On Thu, Jan 13, 2022 at 05:32:52PM +0530, Vaibhav Jain wrote:
[snip]

>  
> +/* Inject a smart error. Add the dirty-shutdown-counter value to the pdsm */
> +static int papr_pdsm_smart_inject(struct papr_scm_priv *p,
> +   union nd_pdsm_payload *payload)
> +{
> + int rc;
> + u32 supported_flags = 0;
> + u64 mask = 0, override = 0;
> +
> + /* Check for individual smart error flags and update mask and override */
> + if (payload->smart_inject.flags & PDSM_SMART_INJECT_HEALTH_FATAL) {
> + supported_flags |= PDSM_SMART_INJECT_HEALTH_FATAL;
> + mask |= PAPR_PMEM_HEALTH_FATAL;
> + override |= payload->smart_inject.fatal_enable ?
> + PAPR_PMEM_HEALTH_FATAL : 0;
> + }
> +
> + if (payload->smart_inject.flags & PDSM_SMART_INJECT_BAD_SHUTDOWN) {
> + supported_flags |= PDSM_SMART_INJECT_BAD_SHUTDOWN;
> + mask |= PAPR_PMEM_SHUTDOWN_DIRTY;
> + override |= payload->smart_inject.unsafe_shutdown_enable ?
> + PAPR_PMEM_SHUTDOWN_DIRTY : 0;
> + }
> +

I'm struggling to see why there is a need for both a flag and an 8-bit
'enable' value.

Ira



Re: [PATCH 01/18] mm: add a kunmap_local_dirty helper

2021-06-18 Thread Ira Weiny
On Fri, Jun 18, 2021 at 11:37:28AM +0800, Herbert Xu wrote:
> On Thu, Jun 17, 2021 at 08:01:57PM -0700, Ira Weiny wrote:
> >
> > > + flush_kernel_dcache_page(__page);   \
> > 
> > Is this required on 32bit systems?  Why is kunmap_flush_on_unmap() not
> > sufficient on 64bit systems?  The normal kunmap_local() path does that.
> > 
> > I'm sorry but I did not see a conclusion to my query on V1.  Herbert implied
> > that he just copied from the crypto code.[1]  I'm concerned that this
> > _dirty() call is just going to confuse the users of kmap even more.  So why
> > can't we get to the bottom of why flush_kernel_dcache_page() needs so much
> > logic around it before complicating the general kernel users?
> > 
> > I would like to see it go away if possible.
> 
> This thread may be related:
> 
> https://lwn.net/Articles/240249/

Interesting!  Thanks!

Digging around a bit more I found:

https://lore.kernel.org/patchwork/patch/439637/

Auditing all the flush_dcache_page() arch code reveals that the mapping field
is either unused, or is checked for NULL.  Furthermore, all the implementations
call page_mapping_file() which further limits the page to not be a swap page.

All flush_kernel_dcache_page() implementations appear to operate the same way
in all arches which define that call.

So I'm confident now that additional !PageSlab(__page) checks are not needed
and this patch is unnecessary.   Christoph, can we leave this out of the kmap
API and just fold the flush_kernel_dcache_page() calls back into the bvec code?

Unfortunately, I'm not convinced this can be handled completely by either
kunmap_local() or the mem*_page() calls because there is a difference between
flush_dcache_page() and flush_kernel_dcache_page() in most arches...  [parisc
being an exception which falls back to flush_kernel_dcache_page()]...

It seems like the generic unmap path _should_ be able to determine which call
to make based on the page but I'd have to look at that more.
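
As a rough sketch of that direction (my assumption, modeled on the parisc
hook mentioned above, and completely untested), an architecture could opt in
with something like this in its asm/cacheflush.h:

#define ARCH_HAS_FLUSH_ON_KUNMAP
static inline void kunmap_flush_on_unmap(void *addr)
{
	/* write back the kernel alias before the mapping goes away */
	flush_kernel_dcache_page(kmap_to_page(addr));
}

The generic kunmap_local() path would then flush on every unmap for such
arches, removing the need for callers to decide.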

Ira


Re: [PATCH 01/18] mm: add a kunmap_local_dirty helper

2021-06-17 Thread Ira Weiny
On Tue, Jun 15, 2021 at 03:24:39PM +0200, Christoph Hellwig wrote:
> Add a helper that calls flush_kernel_dcache_page before unmapping the
> local mapping.  flush_kernel_dcache_page is required for all pages
> potentially mapped into userspace that were written to using kmap*,
> so having a helper that does the right thing can be very convenient.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/highmem-internal.h | 7 +++
>  include/linux/highmem.h  | 4 
>  2 files changed, 11 insertions(+)
> 
> > diff --git a/include/linux/highmem-internal.h b/include/linux/highmem-internal.h
> index 7902c7d8b55f..bd37706db147 100644
> --- a/include/linux/highmem-internal.h
> +++ b/include/linux/highmem-internal.h
> @@ -224,4 +224,11 @@ do { \
>   __kunmap_local(__addr); \
>  } while (0)
>  
> +#define kunmap_local_dirty(__page, __addr)   \

I think having to carry both the page and the addr around in order to call
kunmap_local_dirty() is going to be a pain in some code paths.  Not a show
stopper but see below...

> +do { \
> + if (!PageSlab(__page))  \

Was there some clarification why the page can't be a Slab page?  Or is this
just an optimization?

> + flush_kernel_dcache_page(__page);   \

Is this required on 32bit systems?  Why is kunmap_flush_on_unmap() not
sufficient on 64bit systems?  The normal kunmap_local() path does that.

I'm sorry but I did not see a conclusion to my query on V1.  Herbert implied
that he just copied from the crypto code.[1]  I'm concerned that this _dirty()
call is just going to confuse the users of kmap even more.  So why can't we
get to the bottom of why flush_kernel_dcache_page() needs so much logic around
it before complicating the general kernel users?

I would like to see it go away if possible.

Ira

[1] https://lore.kernel.org/lkml/20210615050258.ga5...@gondor.apana.org.au/

> + kunmap_local(__addr);   \
> +} while (0)
> +
>  #endif
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 832b49b50c7b..65f548db4f2d 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -93,6 +93,10 @@ static inline void kmap_flush_unused(void);
>   * On HIGHMEM enabled systems mapping a highmem page has the side effect of
>   * disabling migration in order to keep the virtual address stable across
>   * preemption. No caller of kmap_local_page() can rely on this side effect.
> + *
> + * If data is written to the returned kernel mapping, the caller needs to
> + * unmap the mapping using kunmap_local_dirty(), else kunmap_local() should
> + * be used.
>   */
>  static inline void *kmap_local_page(struct page *page);
>  
> -- 
> 2.30.2
> 


Re: [PATCH 09/16] ps3disk: use memcpy_{from,to}_bvec

2021-06-11 Thread Ira Weiny
On Fri, Jun 11, 2021 at 08:53:38AM +0200, Christoph Hellwig wrote:
> On Tue, Jun 08, 2021 at 06:48:22PM -0700, Ira Weiny wrote:
> > I'm still not 100% sure that these flushes are needed but they are not
> > no-ops on every arch.  Would it be best to preserve them after the
> > memcpy_to/from_bvec()?
> > 
> > Same thing in patch 11 and 14.
> 
> To me it seems kunmap_local should basically always call the equivalent
> of flush_kernel_dcache_page.  parisc does this through
> kunmap_flush_on_unmap, but none of the other architectures with VIVT
> caches or other coherency issues does.
> 
> Does anyone have a history or other insights here?

I went digging into the current callers of flush_kernel_dcache_page() other
than this one, to see whether adding kunmap_flush_on_unmap() to the other
arches would cause any problems.

In particular this call site stood out because the flush is not always called:

void sg_miter_stop(struct sg_mapping_iter *miter)
{
...
if ((miter->__flags & SG_MITER_TO_SG) &&
!PageSlab(miter->page))
flush_kernel_dcache_page(miter->page);
...
}

Looking at 

3d77b50c5874 lib/scatterlist.c: don't flush_kernel_dcache_page on slab page[1]

It seems the restrictions they are quoting for the page are completely out of
date.  I don't see any current way for a VM_BUG_ON() to be triggered.  So is
this code really necessary?

More recently this was added:

7e34e0bbc644 crypto: omap-crypto - fix userspace copied buffer access

I'm CC'ing Tero and Herbert to see why they added the SLAB check.


Then we have interesting comments like this...

...
/* This can go away once MIPS implements
 * flush_kernel_dcache_page */
flush_dcache_page(miter->page);
...


And some users optimizing.

...
/* discard mappings */
if (direction == DMA_FROM_DEVICE)
flush_kernel_dcache_page(sg_page(sg));  
...

The uses in fs/exec.c are the most straight forward and can simply rely on the
kunmap() code to replace the call.

In conclusion I don't see a lot of reason to not define kunmap_flush_on_unmap()
on arm, csky, mips, nds32, and sh...  Then remove all the
flush_kernel_dcache_page() call sites and the documentation...

Something like [2] below...  Completely untested of course...

Ira


[1] commit 3d77b50c5874b7e923be946ba793644f82336b75
Author: Ming Lei 
Date:   Thu Oct 31 16:34:17 2013 -0700

lib/scatterlist.c: don't flush_kernel_dcache_page on slab page

Commit b1adaf65ba03 ("[SCSI] block: add sg buffer copy helper
functions") introduces two sg buffer copy helpers, and calls
flush_kernel_dcache_page() on pages in SG list after these pages are
written to.

Unfortunately, the commit may introduce a potential bug:

 - Before sending some SCSI commands, kmalloc() buffer may be passed to
   block layper, so flush_kernel_dcache_page() can see a slab page
   finally

 - According to cachetlb.txt, flush_kernel_dcache_page() is only called
   on "a user page", which surely can't be a slab page.

 - ARCH's implementation of flush_kernel_dcache_page() may use page
   mapping information to do optimization so page_mapping() will see the
   slab page, then VM_BUG_ON() is triggered.

Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
and this patch fixes the bug by adding test of '!PageSlab(miter->page)'
before calling flush_kernel_dcache_page().


[2]


From 70b537c31d16c2a5e4e92c35895e8c59303bcbef Mon Sep 17 00:00:00 2001
From: Ira Weiny 
Date: Fri, 11 Jun 2021 18:24:27 -0700
Subject: [PATCH] COMPLETELY UNTESTED: highmem: Remove direct calls to flush_kernel_dcache_page

When to call flush_kernel_dcache_page() is confusing and inconsistent.  For
architectures which may need to do something, the core kmap code should be
leveraged to handle this when direct kernel access is needed.

Like parisc, define kunmap_flush_on_unmap() to be called when pages are
unmapped on arm, csky, mips, nds32, and sh.

Remove all direct calls to flush_kernel_dcache_page() and let the
kunmap() code do this for the users.


Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux-cry...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: Ira Weiny 
---
 Documentation/core-api/cachetlb.rst  | 13 -
 arch/arm/include/asm/cacheflush.h|  6 ++
 arch/csky/abiv1/inc/abi/cacheflush.h |  6 ++
 arch/mips/include/asm/cacheflush.h   |  6 ++
 arch/nds32/include/asm/cacheflush.h  |  6 ++
 arch/sh/include/asm/cacheflush.h |  6 ++
 drivers/cr

Re: switch the block layer to use kmap_local_page

2021-06-08 Thread Ira Weiny
On Tue, Jun 08, 2021 at 06:05:47PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series switches the core block layer code and all users of the
> existing bvec kmap helpers to use kmap_local_page.  Drivers that
> currently use open coded kmap_atomic calls will converted in a follow
> on series.

Other than the missing flush_dcache's.

For the series.

Reviewed-by: Ira Weiny 

> 
> Diffstat:
>  arch/mips/include/asm/mach-rc32434/rb.h |2 -
>  block/bio-integrity.c   |   14 --
>  block/bio.c |   37 +++-
>  block/blk-map.c |2 -
>  block/bounce.c  |   35 ++
>  block/t10-pi.c  |   16 
>  drivers/block/ps3disk.c |   19 ++
>  drivers/block/rbd.c |   15 +--
>  drivers/md/dm-writecache.c  |5 +--
>  include/linux/bio.h |   42 
> 
>  include/linux/bvec.h|   27 ++--
>  include/linux/highmem.h |4 +--
>  12 files changed, 64 insertions(+), 154 deletions(-)


Re: [PATCH 14/16] block: use memcpy_from_bvec in __blk_queue_bounce

2021-06-08 Thread Ira Weiny
On Tue, Jun 08, 2021 at 06:06:01PM +0200, Christoph Hellwig wrote:
> Rewrite the actual bounce buffering loop in __blk_queue_bounce to that
> the memcpy_to_bvec helper can be used to perform the data copies.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/bounce.c | 21 +++--
>  1 file changed, 7 insertions(+), 14 deletions(-)
> 
> diff --git a/block/bounce.c b/block/bounce.c
> index a2fc6326b6c9..b5ad09e07bcf 100644
> --- a/block/bounce.c
> +++ b/block/bounce.c
> @@ -243,24 +243,17 @@ void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
>* because the 'bio' is single-page bvec.
>*/
>   for (i = 0, to = bio->bi_io_vec; i < bio->bi_vcnt; to++, i++) {
> - struct page *page = to->bv_page;
> + struct page *bounce_page;
>  
> - if (!PageHighMem(page))
> + if (!PageHighMem(to->bv_page))
>   continue;
>  
> - to->bv_page = mempool_alloc(&page_pool, GFP_NOIO);
> - inc_zone_page_state(to->bv_page, NR_BOUNCE);
> + bounce_page = mempool_alloc(&page_pool, GFP_NOIO);
> + inc_zone_page_state(bounce_page, NR_BOUNCE);
>  
> - if (rw == WRITE) {
> - char *vto, *vfrom;
> -
> - flush_dcache_page(page);
> -
> - vto = page_address(to->bv_page) + to->bv_offset;
> - vfrom = kmap_atomic(page) + to->bv_offset;
> - memcpy(vto, vfrom, to->bv_len);
> - kunmap_atomic(vfrom);
> - }
> + if (rw == WRITE)
> + memcpy_from_bvec(page_address(bounce_page), to);

NIT: the fact that the copy is from 'to' makes my head hurt...  But I don't
see a good way to change that without declaring unnecessary variables...  :-(

The logic seems right.

Ira

> + to->bv_page = bounce_page;
>   }
>  
>   trace_block_bio_bounce(*bio_orig);
> -- 
> 2.30.2
> 


Re: [PATCH 09/16] ps3disk: use memcpy_{from,to}_bvec

2021-06-08 Thread Ira Weiny
On Tue, Jun 08, 2021 at 06:05:56PM +0200, Christoph Hellwig wrote:
>  
>   rq_for_each_segment(bvec, req, iter) {
> - unsigned long flags;
> - dev_dbg(&dev->sbd.core, "%s:%u: bio %u: %u sectors from %llu\n",
> - __func__, __LINE__, i, bio_sectors(iter.bio),
> - iter.bio->bi_iter.bi_sector);
> -
> - size = bvec.bv_len;
> - buf = bvec_kmap_irq(&bvec, &flags);
>   if (gather)
> - memcpy(dev->bounce_buf+offset, buf, size);
> + memcpy_from_bvec(dev->bounce_buf + offset, &bvec);
>   else
> - memcpy(buf, dev->bounce_buf+offset, size);
> - offset += size;
> - flush_kernel_dcache_page(bvec.bv_page);

I'm still not 100% sure that these flushes are needed but they are not no-ops
on every arch.  Would it be best to preserve them after the memcpy_to/from_bvec()?

Same thing in patch 11 and 14.
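
For concreteness, a sketch of what preserving the flush could look like here
(my assumption: memcpy_to_bvec() itself does no flushing, and the flush
arguably only matters on the path that writes into the page, though the old
code flushed unconditionally):

	rq_for_each_segment(bvec, req, iter) {
		if (gather) {
			memcpy_from_bvec(dev->bounce_buf + offset, &bvec);
		} else {
			memcpy_to_bvec(&bvec, dev->bounce_buf + offset);
			flush_kernel_dcache_page(bvec.bv_page);
		}
		offset += bvec.bv_len;
	}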

Ira


Re: [PATCH v3] powerpc/papr_scm: Reduce error severity if nvdimm stats inaccessible

2021-05-08 Thread Ira Weiny
On Sat, May 08, 2021 at 10:06:42AM +0530, Vaibhav Jain wrote:
> Currently drc_pmem_query_stats() generates a dev_err in case
> "Enable Performance Information Collection" feature is disabled from
> HMC or performance stats are not available for an nvdimm. The error is
> of the form below:
> 
> papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Failed to query
>performance stats, Err:-10
> 
> This error message confuses users as it implies a possible problem
> with the nvdimm even though it's due to a disabled/unavailable
> feature. We fix this by explicitly handling the H_AUTHORITY and
> H_UNSUPPORTED errors from the H_SCM_PERFORMANCE_STATS hcall.
> 
> In case of H_AUTHORITY error an info message is logged instead of an
> error, saying that "Permission denied while accessing performance
> stats" and an EPERM error is returned back.
> 
> In case of H_UNSUPPORTED error we return a EOPNOTSUPP error back from
> drc_pmem_query_stats() indicating that performance stats-query
> operation is not supported on this nvdimm.
> 
> Fixes: 2d02bf835e57('powerpc/papr_scm: Fetch nvdimm performance stats from PHYP')
> Signed-off-by: Vaibhav Jain 

Reviewed-by: Ira Weiny 

> ---
> Changelog
> 
> v3:
> * Return EOPNOTSUPP error in case of H_UNSUPPORTED [ Ira ]
> * Return EPERM in case of H_AUTHORITY [ Ira ]
> * Updated patch description
> 
> v2:
> * Updated the message logged in case of H_AUTHORITY error [ Ira ]
> * Switched from dev_warn to dev_info in case of H_AUTHORITY error.
> * Instead of -EPERM return -EACCESS for H_AUTHORITY error.
> * Added explicit handling of H_UNSUPPORTED error.
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index ef26fe40efb0..e2b69cc3beaf 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -310,6 +310,13 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
>   dev_err(&p->pdev->dev,
>   "Unknown performance stats, Err:0x%016lX\n", ret[0]);
>   return -ENOENT;
> + } else if (rc == H_AUTHORITY) {
> + dev_info(&p->pdev->dev,
> +  "Permission denied while accessing performance stats");
> + return -EPERM;
> + } else if (rc == H_UNSUPPORTED) {
> + dev_dbg(&p->pdev->dev, "Performance stats unsupported\n");
> + return -EOPNOTSUPP;
>   } else if (rc != H_SUCCESS) {
>   dev_err(&p->pdev->dev,
>   "Failed to query performance stats, Err:%lld\n", rc);
> -- 
> 2.31.1
> 


Re: [PATCH v2] powerpc/papr_scm: Reduce error severity if nvdimm stats inaccessible

2021-05-05 Thread Ira Weiny
On Thu, May 06, 2021 at 12:46:06AM +0530, Vaibhav Jain wrote:
> Currently drc_pmem_query_stats() generates a dev_err in case
> "Enable Performance Information Collection" feature is disabled from
> HMC or performance stats are not available for an nvdimm. The error is
> of the form below:
> 
> papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Failed to query
>performance stats, Err:-10
> 
> This error message confuses users as it implies a possible problem
> with the nvdimm even though it's due to a disabled/unavailable
> feature. We fix this by explicitly handling the H_AUTHORITY and
> H_UNSUPPORTED errors from the H_SCM_PERFORMANCE_STATS hcall.
> 
> In case of H_AUTHORITY error an info message is logged instead of an
> error, saying that "Permission denied while accessing performance
> stats". Also '-EACCES' error is return instead of -EPERM.

I thought you clarified before that this was a permission issue.  So why change
the error to EACCES?

> 
> In case of H_UNSUPPORTED error we return a -EPERM error back from
> drc_pmem_query_stats() indicating that performance stats-query
> operation is not supported on this nvdimm.

EPERM seems wrong here too...  ENOTSUP?

Ira


Re: [PATCH] powerpc/papr_scm: Reduce error severity if nvdimm stats inaccessible

2021-04-14 Thread Ira Weiny
On Wed, Apr 14, 2021 at 09:51:40PM +0530, Vaibhav Jain wrote:
> Thanks for looking into this patch Ira,
> 
> Ira Weiny  writes:
> 
> > On Wed, Apr 14, 2021 at 06:10:26PM +0530, Vaibhav Jain wrote:
> >> Currently drc_pmem_query_stats() generates a dev_err in case
> >> "Enable Performance Information Collection" feature is disabled from
> >> HMC. The error is of the form below:
> >> 
> >> papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Failed to query
> >> performance stats, Err:-10
> >> 
> >> This error message confuses users as it implies a possible problem
> >> with the nvdimm even though it's due to a disabled feature.
> >> 
> >> So we fix this by explicitly handling the H_AUTHORITY error from the
> >> H_SCM_PERFORMANCE_STATS hcall and generating a warning instead of an
> >> error, saying that "Performance stats in-accessible".
> >> 
> >> Fixes: 2d02bf835e57('powerpc/papr_scm: Fetch nvdimm performance stats from PHYP')
> >> Signed-off-by: Vaibhav Jain 
> >> ---
> >>  arch/powerpc/platforms/pseries/papr_scm.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >> 
> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> >> index 835163f54244..9216424f8be3 100644
> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> >> @@ -277,6 +277,9 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
> >>dev_err(&p->pdev->dev,
> >>"Unknown performance stats, Err:0x%016lX\n", ret[0]);
> >>return -ENOENT;
> >> +  } else if (rc == H_AUTHORITY) {
> >> +  dev_warn(&p->pdev->dev, "Performance stats in-accessible");
> >> +  return -EPERM;
> >
> > Is this because of a disabled feature or because of permissions?
> 
> It's because of a disabled feature that revokes permission for a guest to
> retrieve performance statistics.
> 
> The feature is called "Enable Performance Information Collection" and
> once disabled the hcall H_SCM_PERFORMANCE_STATS returns an error
> H_AUTHORITY indicating that the guest doesn't have permission to retrieve
> performance statistics.

In that case would it be appropriate to have the error message indicate a
permission issue?

Something like 'permission denied'?

Ira



Re: [PATCH] powerpc/papr_scm: Reduce error severity if nvdimm stats inaccessible

2021-04-14 Thread Ira Weiny
On Wed, Apr 14, 2021 at 06:10:26PM +0530, Vaibhav Jain wrote:
> Currently drc_pmem_query_stats() generates a dev_err in case
> "Enable Performance Information Collection" feature is disabled from
> HMC. The error is of the form below:
> 
> papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Failed to query
>performance stats, Err:-10
> 
> This error message confuses users as it implies a possible problem
> with the nvdimm even though it's due to a disabled feature.
> 
> So we fix this by explicitly handling the H_AUTHORITY error from the
> H_SCM_PERFORMANCE_STATS hcall and generating a warning instead of an
> error, saying that "Performance stats in-accessible".
> 
> Fixes: 2d02bf835e57('powerpc/papr_scm: Fetch nvdimm performance stats from PHYP')
> Signed-off-by: Vaibhav Jain 
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 835163f54244..9216424f8be3 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -277,6 +277,9 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
>   dev_err(&p->pdev->dev,
>   "Unknown performance stats, Err:0x%016lX\n", ret[0]);
>   return -ENOENT;
> + } else if (rc == H_AUTHORITY) {
> + dev_warn(&p->pdev->dev, "Performance stats in-accessible");
> + return -EPERM;

Is this because of a disabled feature or because of permissions?

Ira

>   } else if (rc != H_SUCCESS) {
>   dev_err(&p->pdev->dev,
>   "Failed to query performance stats, Err:%lld\n", rc);
> -- 
> 2.30.2
> 


Re: [PATCH RFC PKS/PMEM 05/58] kmap: Introduce k[un]map_thread

2020-11-09 Thread Ira Weiny
On Tue, Nov 10, 2020 at 02:13:56AM +0100, Thomas Gleixner wrote:
> Ira,
> 
> On Fri, Oct 09 2020 at 12:49, ira weiny wrote:
> > From: Ira Weiny 
> >
> > To correctly support the semantics of kmap() with Kernel protection keys
> > (PKS), kmap() may be required to set the protections on multiple
> > processors (globally).  Enabling PKS globally can be very expensive
> > depending on the requested operation.  Furthermore, enabling a domain
> > globally reduces the protection afforded by PKS.
> >
> > Most kmap() (Approx 209 of 229) callers use the map within a single thread and
> > have no need for the protection domain to be enabled globally.  However, the
> > remaining callers do not follow this pattern and, as best I can tell, expect
> > the mapping to be 'global' and available to any thread that may access the
> > mapping.[1]
> >
> > We don't anticipate global mappings to pmem, however in general there is a
> > danger in changing the semantics of kmap().  Effectively, this would cause 
> > an
> > unresolved page fault with little to no information about why the failure
> > occurred.
> >
> > To resolve this a number of options were considered.
> >
> > 1) Attempt to change all the thread local kmap() calls to kmap_atomic()[2]
> > 2) Introduce a flags parameter to kmap() to indicate if the mapping should 
> > be
> >global or not
> > 3) Change ~20 call sites to 'kmap_global()' to indicate that they require a
> >global enablement of the pages.
> > 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping 
> > is to
> >be used within that thread of execution only
> >
> > Option 1 is simply not feasible.  Option 2 would require all of the call 
> > sites
> > of kmap() to change.  Option 3 seems like a good minimal change but there 
> > is a
> > danger that new code may miss the semantic change of kmap() and not get the
> > behavior the developer intended.  Therefore, #4 was chosen.
> 
> There is Option #5:

There is now, yes.  :-D

> 
> Convert the thread local kmap() invocations to the proposed kmap_local()
> interface which is coming along [1].

I've been trying to follow that thread.

> 
> That solves a couple of issues:
> 
>  1) It relieves the current kmap_atomic() usage sites from the implict
> pagefault/preempt disable semantics which apply even when
> CONFIG_HIGHMEM is disabled. kmap_local() still can be invoked from
> atomic context.
> 
>  2) Due to #1 it allows to replace the conditional usage of kmap() and
> kmap_atomic() for purely thread local mappings.
> 
>  3) It puts the burden on the HIGHMEM inflicted systems
> 
>  4) It is actually more efficient for most of the pure thread local use
> cases on HIGHMEM inflicted systems because it avoids the overhead of
> the global lock and the potential kmap slot exhaustion. A potential
> preemption will be more expensive, but that's not really the case we
> want to optimize for.
> 
>  5) It solves the RT issue vs. kmap_atomic()
> 
> So instead of creating yet another variety of kmap() which is just
> scratching the particular PKRS itch, can we please consolidate all of
> that on the wider reaching kmap_local() approach?

Yes I agree.  We absolutely don't want more kmap*() calls and I was hoping to
dovetail into your kmap_local() work.[2]

I've pivoted away from this work a bit to clean up all the
kmap()/memcpy*()/kunmaps() as discussed elsewhere in the thread first.[3]  I
was hoping your work would land and then I could s/kmap_thread()/kmap_local()/
on all of these patches.

Also, we can convert the new memcpy_*_page() calls to kmap_local() as well.
[For now my patch just uses kmap_atomic().]
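
For instance, a minimal sketch of one such conversion, assuming
kmap_local_page()/kunmap_local() land with the semantics proposed in that
series:

static inline void memcpy_from_page(char *to, struct page *page,
				    size_t offset, size_t len)
{
	char *from = kmap_local_page(page);	/* thread-local, preemptible */

	memcpy(to, from + offset, len);
	kunmap_local(from);
}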

I've not looked at all of the patches in your latest version.  Have you
included converting any of the kmap() call sites?  I thought you were more
focused on converting the kmap_atomic() to kmap_local()?

Ira

> 
> Thanks,
> 
> tglx
>  
> [1] https://lore.kernel.org/lkml/20201103092712.714480...@linutronix.de/

[2] 
https://lore.kernel.org/lkml/20201012195354.gc2046...@iweiny-desk2.sc.intel.com/
[3] https://lore.kernel.org/lkml/20201009213434.GA839@sol.localdomain/
https://lore.kernel.org/lkml/20201013200149.gi3576...@zeniv.linux.org.uk/



Re: [PATCH RFC PKS/PMEM 24/58] fs/freevxfs: Utilize new kmap_thread()

2020-10-13 Thread Ira Weiny
On Tue, Oct 13, 2020 at 12:25:44PM +0100, Christoph Hellwig wrote:
> > -   kaddr = kmap(pp);
> > +   kaddr = kmap_thread(pp);
> > memcpy(kaddr, vip->vii_immed.vi_immed + offset, PAGE_SIZE);
> > -   kunmap(pp);
> > +   kunmap_thread(pp);
> 
> You only Cced me on this particular patch, which means I have absolutely
> no idea what kmap_thread and kunmap_thread actually do, and thus can't
> provide an informed review.

Sorry the list was so big I struggled with who to CC and on which patches.

> 
> That being said I think your life would be a lot easier if you add
> helpers for the above code sequence and its counterpart that copies
> to a potential hughmem page first, as that hides the implementation
> details from most users.

Matthew Wilcox and Al Viro have suggested similar ideas.

https://lore.kernel.org/lkml/20201013205012.gi2046...@iweiny-desk2.sc.intel.com/

Ira


Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()

2020-10-13 Thread Ira Weiny
On Tue, Oct 13, 2020 at 09:01:49PM +0100, Al Viro wrote:
> On Tue, Oct 13, 2020 at 08:36:43PM +0100, Matthew Wilcox wrote:
> 
> > static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
> > {
> > char *vto = kmap_atomic(to);
> > 
> > memcpy(vto, vfrom, size);
> > kunmap_atomic(vto);
> > }
> > 
> > in linux/highmem.h ?
> 
> You mean, like
> static void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
> {
> char *from = kmap_atomic(page);
> memcpy(to, from + offset, len);
> kunmap_atomic(from);
> }
> 
> static void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
> {
> char *to = kmap_atomic(page);
> memcpy(to + offset, from, len);
> kunmap_atomic(to);
> }
> 
> static void memzero_page(struct page *page, size_t offset, size_t len)
> {
> char *addr = kmap_atomic(page);
> memset(addr + offset, 0, len);
> kunmap_atomic(addr);
> }
> 
> in lib/iov_iter.c?  FWIW, I don't like that "highpage" in the name and
> highmem.h as location - these make perfect sense regardless of highmem;
> they are normal memory operations with page + offset used instead of
> a pointer...

I was thinking along those lines as well especially because of the direction
this patch set takes kmap().

Thanks for pointing these out to me.  How about I lift them to a common header?
But if not highmem.h, where?

Ira


Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

2020-10-12 Thread Ira Weiny
On Mon, Oct 12, 2020 at 09:02:54PM +0100, Matthew Wilcox wrote:
> On Mon, Oct 12, 2020 at 12:53:54PM -0700, Ira Weiny wrote:
> > On Mon, Oct 12, 2020 at 05:44:38PM +0100, Matthew Wilcox wrote:
> > > On Mon, Oct 12, 2020 at 09:28:29AM -0700, Dave Hansen wrote:
> > > > kmap_atomic() is always preferred over kmap()/kmap_thread().
> > > > kmap_atomic() is _much_ more lightweight since its TLB invalidation is
> > > > always CPU-local and never broadcast.
> > > > 
> > > > So, basically, unless you *must* sleep while the mapping is in place,
> > > > kmap_atomic() is preferred.
> > > 
> > > But kmap_atomic() disables preemption, so the _ideal_ interface would map
> > > it only locally, then on preemption make it global.  I don't even know
> > > if that _can_ be done.  But this email makes it seem like kmap_atomic()
> > > has no downsides.
> > 
> > And that is IIUC what Thomas was trying to solve.
> > 
> > Also, Linus brought up that kmap_atomic() has quirks in nesting.[1]
> > 
> > From what I can see all of these discussions support the need to have something
> > between kmap() and kmap_atomic().
> > 
> > However, the reason behind converting call sites to kmap_thread() are 
> > different
> > between Thomas' patch set and mine.  Both require more kmap granularity.
> > However, they do so with different reasons and underlying implementations 
> > but
> > with the _same_ resulting semantics; a thread local mapping which is
> > preemptable.[2]  Therefore they each focus on changing different call sites.
> > 
> > While this patch set is huge I think it serves a valuable purpose to 
> > identify a
> > large number of call sites which are candidates for this new semantic.
> 
> Yes, I agree.  My problem with this patch-set is that it ties it to
> some Intel feature that almost nobody cares about.

I humbly disagree.  At this level the only thing this is tied to is the idea
that there are additional memory protections available which can be enabled
quickly on a per-thread basis.  PKS on Intel is but one implementation of that.

Even the kmap code only has knowledge that there is something which needs to
be done specially on a devm page.

>
> Maybe we should
> care about it, but you didn't try very hard to make anyone care about
> it in the cover letter.

Ok my bad.  We have customers who care very much about restricting access to
the PMEM pages to prevent bugs in the kernel from causing permanent damage to
their data/file systems.  I'll reword the cover letter.

> 
> For a future patch-set, I'd like to see you just introduce the new
> API.  Then you can optimise the Intel implementation of it afterwards.
> Those patch-sets have entirely different reviewers.

I considered doing this.  But this seemed more logical because the feature is
being driven by PMEM, which is behind the kmap interface, not by the users of
the API.

I can introduce a patch set with a kmap_thread() call which does nothing, if
that is more palatable, but it seems wrong to me to do so.

Ira


Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

2020-10-12 Thread Ira Weiny
On Mon, Oct 12, 2020 at 05:44:38PM +0100, Matthew Wilcox wrote:
> On Mon, Oct 12, 2020 at 09:28:29AM -0700, Dave Hansen wrote:
> > kmap_atomic() is always preferred over kmap()/kmap_thread().
> > kmap_atomic() is _much_ more lightweight since its TLB invalidation is
> > always CPU-local and never broadcast.
> > 
> > So, basically, unless you *must* sleep while the mapping is in place,
> > kmap_atomic() is preferred.
> 
> But kmap_atomic() disables preemption, so the _ideal_ interface would map
> it only locally, then on preemption make it global.  I don't even know
> if that _can_ be done.  But this email makes it seem like kmap_atomic()
> has no downsides.

And that is IIUC what Thomas was trying to solve.

Also, Linus brought up that kmap_atomic() has quirks in nesting.[1]

From what I can see all of these discussions support the need to have something
between kmap() and kmap_atomic().

However, the reason behind converting call sites to kmap_thread() are different
between Thomas' patch set and mine.  Both require more kmap granularity.
However, they do so with different reasons and underlying implementations but
with the _same_ resulting semantics; a thread local mapping which is
preemptable.[2]  Therefore they each focus on changing different call sites.

While this patch set is huge I think it serves a valuable purpose to identify a
large number of call sites which are candidates for this new semantic.

Ira

[1] 
https://lore.kernel.org/lkml/CAHk-=wgbmwsTOKs23Z=71ebtruloeah2u3tnqt2athewvkb...@mail.gmail.com/
[2] It is important to note these implementations are not incompatible with
each other.  So I don't see yet another 'kmap_something()' being required.


Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

2020-10-11 Thread Ira Weiny
On Fri, Oct 09, 2020 at 06:30:36PM -0700, Eric Biggers wrote:
> On Sat, Oct 10, 2020 at 01:39:54AM +0100, Matthew Wilcox wrote:
> > On Fri, Oct 09, 2020 at 02:34:34PM -0700, Eric Biggers wrote:
> > > On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.we...@intel.com wrote:
> > > > The kmap() calls in this FS are localized to a single thread.  To avoid
> > > > the overhead of global PKRS updates use the new kmap_thread() call.
> > > >
> > > > @@ -2410,12 +2410,12 @@ static inline struct page 
> > > > *f2fs_pagecache_get_page(
> > > >  
> > > >  static inline void f2fs_copy_page(struct page *src, struct page *dst)
> > > >  {
> > > > -   char *src_kaddr = kmap(src);
> > > > -   char *dst_kaddr = kmap(dst);
> > > > +   char *src_kaddr = kmap_thread(src);
> > > > +   char *dst_kaddr = kmap_thread(dst);
> > > >  
> > > > memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > > > -   kunmap(dst);
> > > > -   kunmap(src);
> > > > +   kunmap_thread(dst);
> > > > +   kunmap_thread(src);
> > > >  }
> > > 
> > > Wouldn't it make more sense to switch cases like this to kmap_atomic()?
> > > The pages are only mapped to do a memcpy(), then they're immediately 
> > > unmapped.
> > 
> > Maybe you missed the earlier thread from Thomas trying to do something
> > similar for rather different reasons ...
> > 
> > https://lore.kernel.org/lkml/20200919091751.06...@linutronix.de/
> 
> I did miss it.  I'm not subscribed to any of the mailing lists it was sent to.
> 
> Anyway, it shouldn't matter.  Patchsets should be standalone, and not require
> reading random prior threads on linux-kernel to understand.

Sorry, but I did not think that the discussion above was directly related.  If
I'm not mistaken, Thomas' work was directed at relaxing kmap_atomic() into
kmap_thread() calls.  While interesting, it is not the point of this series.  I
want to restrict kmap() callers into kmap_thread().

For this series it was considered to change the kmap_thread() call sites to
kmap_atomic().  But like I said in the cover letter, kmap_atomic() does not
have the same semantics.  It is too strict.  Perhaps I should have expanded
that explanation.

> 
> And I still don't really understand.  After this patchset, there is still code
> nearly identical to the above (doing a temporary mapping just for a memcpy) 
> that
> would still be using kmap_atomic().

I don't understand.  You mean there would be other call sites calling:

kmap_atomic()
memcpy()
kunmap_atomic()

?

> Is the idea that later, such code will be
> converted to use kmap_thread() instead?  If not, why use one over the other?
 

The reason for the new call is that with PKS added behind kmap there are
three levels of mapping we want (the middle one is sketched below):

global kmap (can span threads and sleep)
'thread' kmap (can sleep but not span threads)
'atomic' kmap (can't sleep nor span threads [by definition])
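
As a sketch of the middle level, using the kmap_thread() call proposed in
this series (illustrative only, mirroring the f2fs_copy_page() conversion
quoted elsewhere in this thread):

static void copy_page_contents(struct page *dst, struct page *src)
{
	char *d = kmap_thread(dst);	/* thread-local, preemptible */
	char *s = kmap_thread(src);

	memcpy(d, s, PAGE_SIZE);	/* sleeping here would be allowed */

	kunmap_thread(src);
	kunmap_thread(dst);
}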

As Matthew said perhaps 'global kmaps' may be best changed to vmaps?  I just
don't know the details of every call site.

And since I don't know the call site details, if there are kmap_thread() calls
which are better off as kmap_atomic() calls, I think it is worth converting
them.  But I made the assumption that kmap users would already be calling
kmap_atomic() if they could (because it is more efficient).

Ira


Re: [PATCH RFC PKS/PMEM 57/58] nvdimm/pmem: Stray access protection for pmem->virt_addr

2020-10-11 Thread Ira Weiny
On Fri, Oct 09, 2020 at 07:53:07PM -0700, John Hubbard wrote:
> On 10/9/20 12:50 PM, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > The pmem driver uses a cached virtual address to access its memory
> > directly.  Because the nvdimm driver is well aware of the special
> > protections it has mapped memory with, we call dev_access_[en|dis]able()
> > around the direct pmem->virt_addr (pmem_addr) usage instead of the
> > unnecessary overhead of trying to get a page to kmap.
> > 
> > Signed-off-by: Ira Weiny 
> > ---
> >   drivers/nvdimm/pmem.c | 4 
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > index fab29b514372..e4dc1ae990fc 100644
> > --- a/drivers/nvdimm/pmem.c
> > +++ b/drivers/nvdimm/pmem.c
> > @@ -148,7 +148,9 @@ static blk_status_t pmem_do_read(struct pmem_device 
> > *pmem,
> > if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
> > return BLK_STS_IOERR;
> > +   dev_access_enable(false);
> > rc = read_pmem(page, page_off, pmem_addr, len);
> > +   dev_access_disable(false);
> 
> Hi Ira!
> 
> The APIs should be tweaked to use a symbol (GLOBAL, PER_THREAD), instead of
> true/false. Try reading the above and you'll see that it sounds like it's
> doing the opposite of what it is ("enable_this(false)" sounds like a clumsy
> API design to *disable*, right?). And there is no hint about the scope.

Sounds reasonable.

> 
> And it *could* be so much more readable like this:
> 
> dev_access_enable(DEV_ACCESS_THIS_THREAD);

I'll think about the flag name.  I'm not liking 'this thread'.

Maybe DEV_ACCESS_[GLOBAL|THREAD]
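
Something like this, as a sketch (all names hypothetical pending the rename):

enum dev_access_scope {
	DEV_ACCESS_THREAD,	/* enable for the current thread only */
	DEV_ACCESS_GLOBAL,	/* enable across all threads */
};

void dev_access_enable(enum dev_access_scope scope);
void dev_access_disable(enum dev_access_scope scope);

pmem_do_read() would then read naturally:

	dev_access_enable(DEV_ACCESS_THREAD);
	rc = read_pmem(page, page_off, pmem_addr, len);
	dev_access_disable(DEV_ACCESS_THREAD);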

Ira



Re: [PATCH RFC PKS/PMEM 48/58] drivers/md: Utilize new kmap_thread()

2020-10-11 Thread Ira Weiny
On Sat, Oct 10, 2020 at 10:20:34AM +0800, Coly Li wrote:
> On 2020/10/10 03:50, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > These kmap() calls are localized to a single thread.  To avoid the
> > overhead of global PKRS updates use the new kmap_thread() call.
> > 
> 
> Hi Ira,
> 
> There were a number of options considered.
> 
> 1) Attempt to change all the thread local kmap() calls to kmap_atomic()
> 2) Introduce a flags parameter to kmap() to indicate if the mapping
> should be global or not
> 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they
> require a global mapping of the pages
> 4) Change ~209 call sites to 'kmap_thread()' to indicate that the
> mapping is to be used within that thread of execution only
> 
> 
> I copied the above information from patch 00/58 to this message. The
> idea behind kmap_thread() is fine to me, but as you said the new api is
> very easy to be missed in new code (even for me). I would like to be
> supportive to option 2) introduce a flag to kmap(), then we won't forget
> the new thread-localized kmap method, and people won't ask why a
> _thread() function is called but no kthread created.

Thanks for the feedback.

I'm going to hold off making any changes until others weigh in.  FWIW, I kind
of like option 2 as well.  But there is already kmap_atomic() so it seemed like
kmap_() was more in line with the current API.
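
For reference, option 2 would look something like this (purely illustrative
names; nothing here is a settled interface):

enum kmap_scope { KMAP_GLOBAL, KMAP_THREAD };

void *kmap(struct page *page, enum kmap_scope scope);
void kunmap(struct page *page, enum kmap_scope scope);

versus the separate kmap_thread()/kunmap_thread() pair this series currently
proposes.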

Thanks,
Ira

> 
> Thanks.
> 
> 
> Coly Li
> 


Re: [PATCH RFC PKS/PMEM 10/58] drivers/rdma: Utilize new kmap_thread()

2020-10-11 Thread Ira Weiny
On Sat, Oct 10, 2020 at 11:36:49AM +, Bernard Metzler wrote:
> -ira.we...@intel.com wrote: -
> 

[snip]

> >@@ -505,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> > page_array[seg] = p;
> > 
> > if (!c_tx->use_sendpage) {
> >-iov[seg].iov_base = kmap(p) + fp_off;
> >+iov[seg].iov_base = kmap_thread(p) + fp_off;
> 
> This misses a corresponding kunmap_thread() in siw_unmap_pages()
> (pls change line 403 in siw_qp_tx.c as well)

Thanks I missed that.

Done.

Ira

> 
> Thanks,
> Bernard.
> 


Re: [PATCH RFC PKS/PMEM 09/58] drivers/gpu: Utilize new kmap_thread()

2020-10-10 Thread Ira Weiny
On Sat, Oct 10, 2020 at 12:03:49AM +0200, Daniel Vetter wrote:
> On Fri, Oct 09, 2020 at 12:49:44PM -0700, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > These kmap() calls in the gpu stack are localized to a single thread.
> > To avoid the overhead of global PKRS updates use the new kmap_thread()
> > call.
> > 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Patrik Jakobsson 
> > Signed-off-by: Ira Weiny 
> 
> I'm guessing the entire pile goes in through some other tree.
>

Apologies for not realizing there were multiple maintainers here.

But, I was thinking it would land together through the mm tree once the core
support lands.  I've tried to split these out in a way they can be easily
reviewed/acked by the correct developers.

> If so:
> 
> Acked-by: Daniel Vetter 
> 
> If you want this to land through maintainer trees, then we need a
> per-driver split (since aside from amdgpu and radeon they're all different
> subtrees).

It is just RFC for the moment.  I need to get the core support accepted first
then this can land.

> 
> btw the two kmap calls in drm you highlight in the cover letter should
> also be convertible to kmap_thread. We only hold vmalloc mappings for a
> longer time (or it'd be quite a driver bug). So if you want maybe throw
> those two as two additional patches on top, and we can do some careful
> review & testing for them.

Cool.  I'll add them in.

Ira

> -Daniel
> 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  | 12 ++--
> >  drivers/gpu/drm/gma500/gma_display.c |  4 ++--
> >  drivers/gpu/drm/gma500/mmu.c | 10 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_shmem.c|  4 ++--
> >  .../gpu/drm/i915/gem/selftests/i915_gem_context.c|  4 ++--
> >  drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c   |  8 
> >  drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c |  4 ++--
> >  drivers/gpu/drm/i915/gt/intel_gtt.c  |  4 ++--
> >  drivers/gpu/drm/i915/gt/shmem_utils.c|  4 ++--
> >  drivers/gpu/drm/i915/i915_gem.c  |  8 
> >  drivers/gpu/drm/i915/i915_gpu_error.c|  4 ++--
> >  drivers/gpu/drm/i915/selftests/i915_perf.c   |  4 ++--
> >  drivers/gpu/drm/radeon/radeon_ttm.c  |  4 ++--
> >  13 files changed, 37 insertions(+), 37 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > index 978bae731398..bd564bccb7a3 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > @@ -2437,11 +2437,11 @@ static ssize_t amdgpu_ttm_gtt_read(struct file *f, char __user *buf,
> >  
> > page = adev->gart.pages[p];
> > if (page) {
> > -   ptr = kmap(page);
> > +   ptr = kmap_thread(page);
> > ptr += off;
> >  
> > r = copy_to_user(buf, ptr, cur_size);
> > -   kunmap(adev->gart.pages[p]);
> > +   kunmap_thread(adev->gart.pages[p]);
> > } else
> > r = clear_user(buf, cur_size);
> >  
> > @@ -2507,9 +2507,9 @@ static ssize_t amdgpu_iomem_read(struct file *f, char __user *buf,
> > if (p->mapping != adev->mman.bdev.dev_mapping)
> > return -EPERM;
> >  
> > -   ptr = kmap(p);
> > +   ptr = kmap_thread(p);
> > r = copy_to_user(buf, ptr + off, bytes);
> > -   kunmap(p);
> > +   kunmap_thread(p);
> > if (r)
> > return -EFAULT;
> >  
> > @@ -2558,9 +2558,9 @@ static ssize_t amdgpu_iomem_write(struct file *f, const char __user *buf,
> > if (p->mapping != adev->mman.bdev.dev_mapping)
> > return -EPERM;
> >  
> > -   ptr = kmap(p);
> > +   ptr = kmap_thread(p);
> > r = copy_from_user(ptr + off, buf, bytes);
> > -   kunmap(p);
> > +   kunmap_thread(p);
> > if (r)
> > return -EFAULT;
> >  
> > diff --git a/drivers/gpu/drm/gma500/gma_display.c b/drivers/gpu/drm/gma500/gma_display.c
> > index 3df6d6e850f5..35f4e55c941f 100644
> > --- a/drivers/gpu/drm/gma500/gma_display.c
> > +++ b/drivers/gpu/drm/gma500/gma_display.c
> > @@ -400,9 +400,9 @@ int g

[PATCH RFC PKS/PMEM 53/58] lib: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Alexander Viro 
Cc: "Jérôme Glisse" 
Cc: Martin KaFai Lau 
Cc: Song Liu 
Cc: Yonghong Song 
Cc: Andrii Nakryiko 
Cc: John Fastabend 
Cc: KP Singh 
Signed-off-by: Ira Weiny 
---
 lib/iov_iter.c | 12 ++--
 lib/test_bpf.c |  4 ++--
 lib/test_hmm.c |  8 
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5e40786c8f12..1d47f957cf95 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -208,7 +208,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
}
/* Too bad - revert to non-atomic kmap */
 
-   kaddr = kmap(page);
+   kaddr = kmap_thread(page);
from = kaddr + offset;
left = copyout(buf, from, copy);
copy -= left;
@@ -225,7 +225,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
from += copy;
bytes -= copy;
}
-   kunmap(page);
+   kunmap_thread(page);
 
 done:
if (skip == iov->iov_len) {
@@ -292,7 +292,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
}
/* Too bad - revert to non-atomic kmap */
 
-   kaddr = kmap(page);
+   kaddr = kmap_thread(page);
to = kaddr + offset;
left = copyin(to, buf, copy);
copy -= left;
@@ -309,7 +309,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
to += copy;
bytes -= copy;
}
-   kunmap(page);
+   kunmap_thread(page);
 
 done:
if (skip == iov->iov_len) {
@@ -1742,10 +1742,10 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
return 0;
 
iterate_all_kinds(i, bytes, v, -EINVAL, ({
-   w.iov_base = kmap(v.bv_page) + v.bv_offset;
+   w.iov_base = kmap_thread(v.bv_page) + v.bv_offset;
w.iov_len = v.bv_len;
err = f(&w, context);
-   kunmap(v.bv_page);
+   kunmap_thread(v.bv_page);
err;}), ({
w = v;
err = f(&w, context);})
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index ca7d635bccd9..441f822f56ba 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -6506,11 +6506,11 @@ static void *generate_test_data(struct bpf_test *test, int sub)
if (!page)
goto err_kfree_skb;
 
-   ptr = kmap(page);
+   ptr = kmap_thread(page);
if (!ptr)
goto err_free_page;
memcpy(ptr, test->frag_data, MAX_DATA);
-   kunmap(page);
+   kunmap_thread(page);
skb_add_rx_frag(skb, 0, page, 0, MAX_DATA, MAX_DATA);
}
 
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index e7dc3de355b7..e40d26f97f45 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -329,9 +329,9 @@ static int dmirror_do_read(struct dmirror *dmirror, unsigned long start,
if (!page)
return -ENOENT;
 
-   tmp = kmap(page);
+   tmp = kmap_thread(page);
memcpy(ptr, tmp, PAGE_SIZE);
-   kunmap(page);
+   kunmap_thread(page);
 
ptr += PAGE_SIZE;
bounce->cpages++;
@@ -398,9 +398,9 @@ static int dmirror_do_write(struct dmirror *dmirror, 
unsigned long start,
if (!page || xa_pointer_tag(entry) != DPT_XA_TAG_WRITE)
return -ENOENT;
 
-   tmp = kmap(page);
+   tmp = kmap_thread(page);
memcpy(tmp, ptr, PAGE_SIZE);
-   kunmap(page);
+   kunmap_thread(page);
 
ptr += PAGE_SIZE;
bounce->cpages++;
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 40/58] net: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls in these drivers are localized to a single thread.
To avoid the overhead of global PKRS updates, use the new kmap_thread()
call.

Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Alexey Kuznetsov 
Cc: Hideaki YOSHIFUJI 
Cc: Trond Myklebust 
Cc: Anna Schumaker 
Cc: Boris Pismenny 
Cc: Aviad Yehezkel 
Cc: John Fastabend 
Cc: Daniel Borkmann 
Signed-off-by: Ira Weiny 
---
 net/ceph/messenger.c | 4 ++--
 net/core/datagram.c  | 4 ++--
 net/core/sock.c  | 8 
 net/ipv4/ip_output.c | 4 ++--
 net/sunrpc/cache.c   | 4 ++--
 net/sunrpc/xdr.c | 8 
 net/tls/tls_device.c | 4 ++--
 7 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index d4d7a0e52491..0c49b8e333da 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1535,10 +1535,10 @@ static u32 ceph_crc32c_page(u32 crc, struct page *page,
 {
char *kaddr;
 
-   kaddr = kmap(page);
+   kaddr = kmap_thread(page);
BUG_ON(kaddr == NULL);
crc = crc32c(crc, kaddr + page_offset, length);
-   kunmap(page);
+   kunmap_thread(page);
 
return crc;
 }
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 639745d4f3b9..cbd0a343074a 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -441,14 +441,14 @@ static int __skb_datagram_iter(const struct sk_buff *skb, 
int offset,
end = start + skb_frag_size(frag);
if ((copy = end - offset) > 0) {
struct page *page = skb_frag_page(frag);
-   u8 *vaddr = kmap(page);
+   u8 *vaddr = kmap_thread(page);
 
if (copy > len)
copy = len;
n = INDIRECT_CALL_1(cb, simple_copy_to_iter,
vaddr + skb_frag_off(frag) + offset - 
start,
copy, data, to);
-   kunmap(page);
+   kunmap_thread(page);
offset += n;
if (n != copy)
goto short_copy;
diff --git a/net/core/sock.c b/net/core/sock.c
index 6c5c6b18eff4..9b46a75cd8c1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2846,11 +2846,11 @@ ssize_t sock_no_sendpage(struct socket *sock, struct 
page *page, int offset, siz
ssize_t res;
struct msghdr msg = {.msg_flags = flags};
struct kvec iov;
-   char *kaddr = kmap(page);
+   char *kaddr = kmap_thread(page);
iov.iov_base = kaddr + offset;
iov.iov_len = size;
res = kernel_sendmsg(sock, &msg, &iov, 1, size);
-   kunmap(page);
+   kunmap_thread(page);
return res;
 }
 EXPORT_SYMBOL(sock_no_sendpage);
@@ -2861,12 +2861,12 @@ ssize_t sock_no_sendpage_locked(struct sock *sk, struct 
page *page,
ssize_t res;
struct msghdr msg = {.msg_flags = flags};
struct kvec iov;
-   char *kaddr = kmap(page);
+   char *kaddr = kmap_thread(page);
 
iov.iov_base = kaddr + offset;
iov.iov_len = size;
res = kernel_sendmsg_locked(sk, &msg, &iov, 1, size);
-   kunmap(page);
+   kunmap_thread(page);
return res;
 }
 EXPORT_SYMBOL(sock_no_sendpage_locked);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e6f2ada9e7d5..05304fb251a4 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -949,9 +949,9 @@ csum_page(struct page *page, int offset, int copy)
 {
char *kaddr;
__wsum csum;
-   kaddr = kmap(page);
+   kaddr = kmap_thread(page);
csum = csum_partial(kaddr + offset, copy, 0);
-   kunmap(page);
+   kunmap_thread(page);
return csum;
 }
 
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index baef5ee43dbb..88193f2a8e6f 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -935,9 +935,9 @@ static ssize_t cache_downcall(struct address_space *mapping,
if (!page)
goto out_slow;
 
-   kaddr = kmap(page);
+   kaddr = kmap_thread(page);
ret = cache_do_downcall(kaddr, buf, count, cd);
-   kunmap(page);
+   kunmap_thread(page);
unlock_page(page);
put_page(page);
return ret;
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index be11d672b5b9..00afbb48fb0a 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1353,7 +1353,7 @@ xdr_xcode_array2(struct xdr_buf *buf, unsigned int base,
base &= ~PAGE_MASK;
avail_page = min_t(unsigned int, PAGE_SIZE - base,
avail_here);
-   c = kmap(*ppages) + base;
+   c = kmap_thread(*ppages) + base;
 
while (avail_here) {
avail_here -= avail_page;
@@ -1429,9 +1429,9 @@ xdr_xcode_array2(st

[PATCH RFC PKS/PMEM 10/58] drivers/rdma: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in these drivers are localized to a single thread.  To
avoid the overhead of global PKRS updates, use the new kmap_thread()
call.

Cc: Mike Marciniszyn 
Cc: Dennis Dalessandro 
Cc: Doug Ledford 
Cc: Jason Gunthorpe 
Cc: Faisal Latif 
Cc: Shiraz Saleem 
Cc: Bernard Metzler 
Signed-off-by: Ira Weiny 
---
 drivers/infiniband/hw/hfi1/sdma.c  |  4 ++--
 drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +-
 drivers/infiniband/sw/siw/siw_qp_tx.c  | 14 +++---
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/sdma.c 
b/drivers/infiniband/hw/hfi1/sdma.c
index 04575c9afd61..09d206e3229a 100644
--- a/drivers/infiniband/hw/hfi1/sdma.c
+++ b/drivers/infiniband/hw/hfi1/sdma.c
@@ -3130,7 +3130,7 @@ int ext_coal_sdma_tx_descs(struct hfi1_devdata *dd, 
struct sdma_txreq *tx,
}
 
if (type == SDMA_MAP_PAGE) {
-   kvaddr = kmap(page);
+   kvaddr = kmap_thread(page);
kvaddr += offset;
} else if (WARN_ON(!kvaddr)) {
__sdma_txclean(dd, tx);
@@ -3140,7 +3140,7 @@ int ext_coal_sdma_tx_descs(struct hfi1_devdata *dd, 
struct sdma_txreq *tx,
memcpy(tx->coalesce_buf + tx->coalesce_idx, kvaddr, len);
tx->coalesce_idx += len;
if (type == SDMA_MAP_PAGE)
-   kunmap(page);
+   kunmap_thread(page);
 
/* If there is more data, return */
if (tx->tlen - tx->coalesce_idx)
diff --git a/drivers/infiniband/hw/i40iw/i40iw_cm.c 
b/drivers/infiniband/hw/i40iw/i40iw_cm.c
index a3b95805c154..122d7a5642a1 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_cm.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_cm.c
@@ -3721,7 +3721,7 @@ int i40iw_accept(struct iw_cm_id *cm_id, struct 
iw_cm_conn_param *conn_param)
ibmr->device = iwpd->ibpd.device;
iwqp->lsmm_mr = ibmr;
if (iwqp->page)
-   iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page);
+   iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page);
dev->iw_priv_qp_ops->qp_send_lsmm(&iwqp->sc_qp,
iwqp->ietf_mem.va,
(accept.size + 
conn_param->private_data_len),
@@ -3729,12 +3729,12 @@ int i40iw_accept(struct iw_cm_id *cm_id, struct 
iw_cm_conn_param *conn_param)
 
} else {
if (iwqp->page)
-   iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page);
+   iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page);
dev->iw_priv_qp_ops->qp_send_lsmm(&iwqp->sc_qp, NULL, 0, 0);
}
 
if (iwqp->page)
-   kunmap(iwqp->page);
+   kunmap_thread(iwqp->page);
 
iwqp->cm_id = cm_id;
cm_node->cm_id = cm_id;
@@ -4102,10 +4102,10 @@ static void i40iw_cm_event_connected(struct 
i40iw_cm_event *event)
i40iw_cm_init_tsa_conn(iwqp, cm_node);
read0 = (cm_node->send_rdma0_op == SEND_RDMA_READ_ZERO);
if (iwqp->page)
-   iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page);
+   iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page);
dev->iw_priv_qp_ops->qp_send_rtt(&iwqp->sc_qp, read0);
if (iwqp->page)
-   kunmap(iwqp->page);
+   kunmap_thread(iwqp->page);
 
memset(&attr, 0, sizeof(attr));
attr.qp_state = IB_QPS_RTS;
diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c 
b/drivers/infiniband/sw/siw/siw_qp_tx.c
index d19d8325588b..4ed37c328d02 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -76,7 +76,7 @@ static int siw_try_1seg(struct siw_iwarp_tx *c_tx, void 
*paddr)
if (unlikely(!p))
return -EFAULT;
 
-   buffer = kmap(p);
+   buffer = kmap_thread(p);
 
if (likely(PAGE_SIZE - off >= bytes)) {
memcpy(paddr, buffer + off, bytes);
@@ -84,7 +84,7 @@ static int siw_try_1seg(struct siw_iwarp_tx *c_tx, void 
*paddr)
unsigned long part = bytes - (PAGE_SIZE - off);
 
memcpy(paddr, buffer + off, part);
-   kunmap(p);
+   kunmap_thread(p);
 
if (!mem->is_pbl)
p = siw_get_upage(mem->umem,
@@ -96,10 +96,10 @@ static int siw_try_1seg(struct siw_iwarp_tx *c_tx, void 
*paddr)
if (unlikely(!p))
return -E

[PATCH RFC PKS/PMEM 06/58] kmap: Introduce k[un]map_thread debugging

2020-10-09 Thread ira . weiny
From: Ira Weiny 

Most kmap() callers use the mapping within a single thread and have no
need for the protection domain to be enabled globally.

To differentiate these kmap users, new k[un]map_thread() calls were
introduced which are thread-local.

To aid in debugging the new use of kmap_thread(), add a reference count,
a check on that count, and tracing to identify where mapping errors
occur.
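
A plausible shape for the out-of-line debug variants selected by the new
config option (a sketch only; the real bodies are added elsewhere in
this patch and may differ):

        #ifdef CONFIG_DEBUG_KMAP_THREAD
        void *kmap_thread(struct page *page)
        {
                /* Count outstanding thread-local mappings per task. */
                trace_kmap_thread(current, page,
                                  __builtin_return_address(0),
                                  ++current->kmap_thread_cnt);
                return __kmap(page, false);
        }

        void kunmap_thread(struct page *page)
        {
                __kunmap(page, false);
                trace_kunmap_thread(current, page,
                                    __builtin_return_address(0),
                                    --current->kmap_thread_cnt);
        }
        #endif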

Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Signed-off-by: Ira Weiny 
---
 include/linux/highmem.h|  5 +++
 include/linux/sched.h  |  5 +++
 include/trace/events/kmap_thread.h | 56 ++
 init/init_task.c   |  3 ++
 kernel/fork.c  | 15 
 lib/Kconfig.debug  |  8 +
 mm/debug.c | 23 
 7 files changed, 115 insertions(+)
 create mode 100644 include/trace/events/kmap_thread.h

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index ef7813544719..22d1c000802e 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -247,6 +247,10 @@ static inline void kunmap(struct page *page)
__kunmap(page, true);
 }
 
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+void *kmap_thread(struct page *page);
+void kunmap_thread(struct page *page);
+#else
 static inline void *kmap_thread(struct page *page)
 {
return __kmap(page, false);
@@ -255,6 +259,7 @@ static inline void kunmap_thread(struct page *page)
 {
__kunmap(page, false);
 }
+#endif
 
 /*
  * Prevent people trying to call kunmap_atomic() as if it were kunmap()
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 25d97ab6c757..4627ea4a49e6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1318,6 +1318,11 @@ struct task_struct {
 #ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
unsigned intdev_page_access_ref;
 #endif
+
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+   unsigned intkmap_thread_cnt;
+#endif
+
/*
 * New fields for task_struct should be added above here, so that
 * they are included in the randomized portion of task_struct.
diff --git a/include/trace/events/kmap_thread.h 
b/include/trace/events/kmap_thread.h
new file mode 100644
index ..e7143cfe0daf
--- /dev/null
+++ b/include/trace/events/kmap_thread.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+
+/*
+ * Copyright (c) 2020 Intel Corporation.  All rights reserved.
+ *
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kmap_thread
+
+#if !defined(_TRACE_KMAP_THREAD_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KMAP_THREAD_H
+
+#include 
+
+DECLARE_EVENT_CLASS(kmap_thread_template,
+   TP_PROTO(struct task_struct *tsk, struct page *page,
+void *caller_addr, int cnt),
+   TP_ARGS(tsk, page, caller_addr, cnt),
+
+   TP_STRUCT__entry(
+   __field(int, pid)
+   __field(struct page *, page)
+   __field(void *, caller_addr)
+   __field(int, cnt)
+   ),
+
+   TP_fast_assign(
+   __entry->pid = tsk->pid;
+   __entry->page = page;
+   __entry->caller_addr = caller_addr;
+   __entry->cnt = cnt;
+   ),
+
+   TP_printk("PID %d; (%d) %pS %p",
+   __entry->pid,
+   __entry->cnt,
+   __entry->caller_addr,
+   __entry->page
+   )
+);
+
+DEFINE_EVENT(kmap_thread_template, kmap_thread,
+   TP_PROTO(struct task_struct *tsk, struct page *page,
+void *caller_addr, int cnt),
+   TP_ARGS(tsk, page, caller_addr, cnt));
+
+DEFINE_EVENT(kmap_thread_template, kunmap_thread,
+   TP_PROTO(struct task_struct *tsk, struct page *page,
+void *caller_addr, int cnt),
+   TP_ARGS(tsk, page, caller_addr, cnt));
+
+
+#endif /* _TRACE_KMAP_THREAD_H */
+
+#include 
diff --git a/init/init_task.c b/init/init_task.c
index 9b39f25de59b..19f09965eb34 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -212,6 +212,9 @@ struct task_struct init_task
 #ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
.dev_page_access_ref = 0,
 #endif
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+   .kmap_thread_cnt = 0,
+#endif
 };
 EXPORT_SYMBOL(init_task);
 
diff --git a/kernel/fork.c b/kernel/fork.c
index b6a3ee328a89..2c66e49b7614 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -722,6 +722,17 @@ static inline void put_signal_struct(struct signal_struct 
*sig)
free_signal_struct(sig);
 }
 
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+static void check_outstanding_kmap_thread(struct task_struct *tsk)
+{
+   if (tsk->kmap_thread_cnt)
   pr_warn("WARNING: PID %d; Failed to kunmap_thread() [cnt %d]\n",
   tsk->pid, tsk->kmap_thread_cnt);
+}
+#else
+static void check_outs

[PATCH RFC PKS/PMEM 03/58] memremap: Add zone device access protection

2020-10-09 Thread ira . weiny
From: Ira Weiny 

Device managed memory exposes itself to the kernel direct map which
allows stray pointers to access these device memories.

Stray pointers to normal memory may result in a crash or other
undesirable behavior which, while unfortunate, is usually recoverable
with a reboot.  Stray access, specifically stray writes, to areas such
as non-volatile memory is permanent in nature and thus more likely to
result in permanent user data loss than stray access to other memory
areas.

Furthermore, protection is provided against reads, which can help catch
speculative reads of poisoned areas as well.  But this is a secondary
reason.

Set up an infrastructure for extra device access protection. Then
implement the new protection using the new Protection Keys Supervisor
(PKS) on architectures which support it.

To enable this extra protection, devices specify a flag in the pgmap to
indicate that these areas wish to use additional protection.

Kernel code which intends to access this memory can do so automatically
through the use of the kmap infrastructure calling into
dev_access_[enable|disable]() described here.  The kmap infrastructure
is implemented in a follow on patch.

In addition, users can directly enable/disable the access through
dev_access_[enable|disable]() if they have a priori knowledge of the
type of pages they are accessing.

All calls to enable/disable protection flow through
dev_access_[enable|disable]() and are nestable through the use of a
per-task reference count.  This reference count does two things.

1) Allows a thread to nest calls to disable protection such that the
   first call to re-enable protection does not 'break' the last access of
   the pmem device memory.

2) Provides faster performance by avoiding lots of MSR writes.  For
   example, looping over a sequence of pmem pages.

In addition, the reference count must be preserved through an exception,
so the count is added to irqentry_state_t and saved/restored across the
exception, with each exception getting its own count should it use a
kmap call.

The following shows how this works through an exception:

        ...
        // ref == 0
        dev_access_enable()              // ref += 1 ==> disable protection
        irq()
                // enable protection
                // ref = 0
                _handler()
                        dev_access_enable()   // ref += 1 ==> disable protection
                        dev_access_disable()  // ref -= 1 ==> enable protection
                // WARN_ON(ref != 0)
                // disable protection
        do_pmem_thing()                  // all good here
        dev_access_disable()             // ref -= 1 ==> 0 ==> enable protection
        ...

Nested exceptions operate the same way with each exception storing the
interrupted exception state all the way down.

The pkey value is never freed, as this optimizes the implementation to
be either on or off using a static branch conditional in the fast paths.
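
A sketch of the nestable enable/disable core described above
(simplified; the helpers that actually write PKRS are named
hypothetically here and elided):

        /* Hypothetical stand-ins for the real PKRS update. */
        static void pks_allow_dev_pages(void);
        static void pks_deny_dev_pages(void);

        void dev_access_enable(bool global)
        {
                /* Only the outermost enable drops the protection. */
                if (current->dev_page_access_ref++ == 0)
                        pks_allow_dev_pages();
        }

        void dev_access_disable(bool global)
        {
                /* Only the matching outermost disable restores it. */
                if (--current->dev_page_access_ref == 0)
                        pks_deny_dev_pages();
        }

The bool argument, passed as false by the callers later in this series,
is ignored in this sketch.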

Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Signed-off-by: Ira Weiny 
---
 arch/x86/entry/common.c  | 21 +
 include/linux/entry-common.h |  3 ++
 include/linux/memremap.h |  1 +
 include/linux/mm.h   | 43 +
 include/linux/sched.h|  3 ++
 init/init_task.c |  3 ++
 kernel/fork.c|  3 ++
 mm/Kconfig   | 13 ++
 mm/memremap.c| 90 
 9 files changed, 180 insertions(+)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 86ad32e0095e..3680724c1a4d 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -264,12 +264,27 @@ noinstr void idtentry_exit_nmi(struct pt_regs *regs, 
irqentry_state_t *irq_state
  *
  * NOTE That the thread saved PKRS must be preserved separately to ensure
  * global overrides do not 'stick' on a thread.
+ *
+ * Furthermore, Zone Device Access Protection maintains access in a re-entrant
+ * manner through a reference count which also needs to be maintained should
+ * exception handlers use those interfaces for memory access.  Here we start
+ * the exception handler's ref count at 0 and ensure it is 0 when the
+ * exception is done.  Then restore it for the interrupted task.
  */
 noinstr void irq_save_pkrs(irqentry_state_t *state)
 {
if (!cpu_feature_enabled(X86_FEATURE_PKS))
return;
 
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
+   /*
+* Save the ref count of the current running process and set it to 0
+* for any irq users to properly track re-entrance
+*/
+   state->pkrs_ref = current->dev_page_access_ref;
+   current->dev_page_access_ref = 0;
+#endif
+
/*
 * The thread_pkrs must be maintained separately to prevent global
 * overrid

[PATCH RFC PKS/PMEM 58/58] [dax|pmem]: Enable stray access protection

2020-10-09 Thread ira . weiny
From: Ira Weiny 

Protecting against stray writes is particularly important for PMEM
because, unlike writes to anonymous memory, writes to PMEM persist
across a reboot.  Thus data corruption could result in permanent loss of
data.

While stray writes are more serious than reads, protection is also
enabled for reads.  This helps to detect bugs in code which would
incorrectly access device memory, and prevents more serious machine
checks should those buggy reads hit a poisoned page.

Enable stray access protection by setting the flag in pgmap which
requests it.  There is no option presented to the user.  If Zone Device
Access Protection is not supported, this flag has no effect.

Signed-off-by: Ira Weiny 
---
 drivers/dax/device.c  | 2 ++
 drivers/nvdimm/pmem.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 1e89513f3c59..e6fb35b4f0fb 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -430,6 +430,8 @@ int dev_dax_probe(struct device *dev)
}
 
dev_dax->pgmap.type = MEMORY_DEVICE_GENERIC;
+   dev_dax->pgmap.flags |= PGMAP_PROT_ENABLED;
+
addr = devm_memremap_pages(dev, &dev_dax->pgmap);
if (IS_ERR(addr))
return PTR_ERR(addr);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index e4dc1ae990fc..9fcd8338e23f 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -426,6 +426,8 @@ static int pmem_attach_disk(struct device *dev,
return -EBUSY;
}
 
+   pmem->pgmap.flags |= PGMAP_PROT_ENABLED;
+
q = blk_alloc_queue(dev_to_node(dev));
if (!q)
return -ENOMEM;
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 57/58] nvdimm/pmem: Stray access protection for pmem->virt_addr

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The pmem driver uses a cached virtual address to access its memory
directly.  Because the nvdimm driver is well aware of the special
protections it has mapped memory with, we call dev_access_[en|dis]able()
around the direct pmem->virt_addr (pmem_addr) usage instead of paying
the unnecessary overhead of looking up a page to kmap.

Signed-off-by: Ira Weiny 
---
 drivers/nvdimm/pmem.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index fab29b514372..e4dc1ae990fc 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -148,7 +148,9 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
return BLK_STS_IOERR;
 
+   dev_access_enable(false);
rc = read_pmem(page, page_off, pmem_addr, len);
+   dev_access_disable(false);
flush_dcache_page(page);
return rc;
 }
@@ -180,11 +182,13 @@ static blk_status_t pmem_do_write(struct pmem_device 
*pmem,
 * after clear poison.
 */
flush_dcache_page(page);
+   dev_access_enable(false);
write_pmem(pmem_addr, page, page_off, len);
if (unlikely(bad_pmem)) {
rc = pmem_clear_poison(pmem, pmem_off, len);
write_pmem(pmem_addr, page, page_off, len);
}
+   dev_access_disable(false);
 
return rc;
 }
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 56/58] dax: Stray access protection for dax_direct_access()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

dax_direct_access() is a special case of accessing pmem via a page
offset and without a struct page.

Because the dax driver is well aware of the special protections it has
mapped memory with, call dev_access_[en|dis]able() directly instead of
paying the unnecessary overhead of looking up a page to kmap.

Similar to kmap, we leverage existing functions, dax_read_[un]lock(),
because they are already required to surround the use of the memory
returned from dax_direct_access().
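
Nothing changes for callers; the existing locking pattern now implies
that stray-access protection is lifted for the critical section.  A
sketch of a hypothetical caller (using the dax_direct_access() signature
as of this series):

        /* Hypothetical example: read one page of pmem via DAX. */
        static int read_one_page(struct dax_device *dax_dev, pgoff_t pgoff,
                                 void *dst)
        {
                void *kaddr;
                pfn_t pfn;
                long avail;
                int id;

                id = dax_read_lock();   /* now also dev_access_enable() */
                avail = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
                if (avail < 1) {
                        dax_read_unlock(id);    /* re-enables protection */
                        return avail < 0 ? avail : -ERANGE;
                }
                memcpy(dst, kaddr, PAGE_SIZE);
                dax_read_unlock(id);            /* re-enables protection */
                return 0;
        }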

Signed-off-by: Ira Weiny 
---
 drivers/dax/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e84070b55463..0ddb3ee73e36 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -30,6 +30,7 @@ static DEFINE_SPINLOCK(dax_host_lock);
 
 int dax_read_lock(void)
 {
+   dev_access_enable(false);
return srcu_read_lock(&dax_srcu);
 }
 EXPORT_SYMBOL_GPL(dax_read_lock);
@@ -37,6 +38,7 @@ EXPORT_SYMBOL_GPL(dax_read_lock);
 void dax_read_unlock(int id)
 {
srcu_read_unlock(&dax_srcu, id);
+   dev_access_disable(false);
 }
 EXPORT_SYMBOL_GPL(dax_read_unlock);
 
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 55/58] samples: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Kirti Wankhede 
Signed-off-by: Ira Weiny 
---
 samples/vfio-mdev/mbochs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
index 3cc5e5921682..6d95422c0b46 100644
--- a/samples/vfio-mdev/mbochs.c
+++ b/samples/vfio-mdev/mbochs.c
@@ -479,12 +479,12 @@ static ssize_t mdev_access(struct mdev_device *mdev, char 
*buf, size_t count,
pos -= MBOCHS_MMIO_BAR_OFFSET;
poff = pos & ~PAGE_MASK;
pg = __mbochs_get_page(mdev_state, pos >> PAGE_SHIFT);
-   map = kmap(pg);
+   map = kmap_thread(pg);
if (is_write)
memcpy(map + poff, buf, count);
else
memcpy(buf, map + poff, count);
-   kunmap(pg);
+   kunmap_thread(pg);
put_page(pg);
 
} else {
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 54/58] powerpc: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Signed-off-by: Ira Weiny 
---
 arch/powerpc/mm/mem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 42e25874f5a8..6ef557b8dda6 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -573,9 +573,9 @@ void flush_icache_user_page(struct vm_area_struct *vma, 
struct page *page,
 {
unsigned long maddr;
 
-   maddr = (unsigned long) kmap(page) + (addr & ~PAGE_MASK);
+   maddr = (unsigned long) kmap_thread(page) + (addr & ~PAGE_MASK);
flush_icache_range(maddr, maddr + len);
-   kunmap(page);
+   kunmap_thread(page);
 }
 
 /*
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 52/58] mm: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Signed-off-by: Ira Weiny 
---
 mm/memory.c  | 8 
 mm/swapfile.c| 4 ++--
 mm/userfaultfd.c | 4 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..75a054882d7a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4945,7 +4945,7 @@ int __access_remote_vm(struct task_struct *tsk, struct 
mm_struct *mm,
if (bytes > PAGE_SIZE-offset)
bytes = PAGE_SIZE-offset;
 
-   maddr = kmap(page);
+   maddr = kmap_thread(page);
if (write) {
copy_to_user_page(vma, page, addr,
  maddr + offset, buf, bytes);
@@ -4954,7 +4954,7 @@ int __access_remote_vm(struct task_struct *tsk, struct 
mm_struct *mm,
copy_from_user_page(vma, page, addr,
buf, maddr + offset, bytes);
}
-   kunmap(page);
+   kunmap_thread(page);
put_page(page);
}
len -= bytes;
@@ -5216,14 +5216,14 @@ long copy_huge_page_from_user(struct page *dst_page,
 
for (i = 0; i < pages_per_huge_page; i++) {
if (allow_pagefault)
-   page_kaddr = kmap(dst_page + i);
+   page_kaddr = kmap_thread(dst_page + i);
else
page_kaddr = kmap_atomic(dst_page + i);
rc = copy_from_user(page_kaddr,
(const void __user *)(src + i * PAGE_SIZE),
PAGE_SIZE);
if (allow_pagefault)
-   kunmap(dst_page + i);
+   kunmap_thread(dst_page + i);
else
kunmap_atomic(page_kaddr);
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index debc94155f74..e3296ff95648 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3219,7 +3219,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, 
int, swap_flags)
error = PTR_ERR(page);
goto bad_swap_unlock_inode;
}
-   swap_header = kmap(page);
+   swap_header = kmap_thread(page);
 
maxpages = read_swap_header(p, swap_header, inode);
if (unlikely(!maxpages)) {
@@ -3395,7 +3395,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, 
int, swap_flags)
filp_close(swap_file, NULL);
 out:
if (page && !IS_ERR(page)) {
-   kunmap(page);
+   kunmap_thread(page);
put_page(page);
}
if (name)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9a3d451402d7..4d38c881bb2d 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -586,11 +586,11 @@ static __always_inline ssize_t __mcopy_atomic(struct 
mm_struct *dst_mm,
mmap_read_unlock(dst_mm);
BUG_ON(!page);
 
-   page_kaddr = kmap(page);
+   page_kaddr = kmap_thread(page);
err = copy_from_user(page_kaddr,
 (const void __user *) src_addr,
 PAGE_SIZE);
-   kunmap(page);
+   kunmap_thread(page);
if (unlikely(err)) {
err = -EFAULT;
goto out;
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 51/58] kernel: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

This kmap() call is localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Eric Biederman 
Signed-off-by: Ira Weiny 
---
 kernel/kexec_core.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c19c0dad1ebe..272a9920c0d6 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -815,7 +815,7 @@ static int kimage_load_normal_segment(struct kimage *image,
if (result < 0)
goto out;
 
-   ptr = kmap(page);
+   ptr = kmap_thread(page);
/* Start with a clear page */
clear_page(ptr);
ptr += maddr & ~PAGE_MASK;
@@ -828,7 +828,7 @@ static int kimage_load_normal_segment(struct kimage *image,
memcpy(ptr, kbuf, uchunk);
else
result = copy_from_user(ptr, buf, uchunk);
-   kunmap(page);
+   kunmap_thread(page);
if (result) {
result = -EFAULT;
goto out;
@@ -879,7 +879,7 @@ static int kimage_load_crash_segment(struct kimage *image,
goto out;
}
arch_kexec_post_alloc_pages(page_address(page), 1, 0);
-   ptr = kmap(page);
+   ptr = kmap_thread(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
PAGE_SIZE - (maddr & ~PAGE_MASK));
@@ -895,7 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
else
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
-   kunmap(page);
+   kunmap_thread(page);
arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 50/58] drivers/android: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Greg Kroah-Hartman 
Signed-off-by: Ira Weiny 
---
 drivers/android/binder_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 69609696a843..5f50856caad7 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1118,9 +1118,9 @@ binder_alloc_copy_user_to_buffer(struct binder_alloc 
*alloc,
page = binder_alloc_get_page(alloc, buffer,
 buffer_offset, &pgoff);
size = min_t(size_t, bytes, PAGE_SIZE - pgoff);
-   kptr = kmap(page) + pgoff;
+   kptr = kmap_thread(page) + pgoff;
ret = copy_from_user(kptr, from, size);
-   kunmap(page);
+   kunmap_thread(page);
if (ret)
return bytes - size + ret;
bytes -= size;
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 49/58] drivers/misc: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Greg Kroah-Hartman 
Signed-off-by: Ira Weiny 
---
 drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c 
b/drivers/misc/vmw_vmci/vmci_queue_pair.c
index 8531ae781195..f308abb8ad03 100644
--- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -343,7 +343,7 @@ static int qp_memcpy_to_queue_iter(struct vmci_queue *queue,
size_t to_copy;
 
if (kernel_if->host)
-   va = kmap(kernel_if->u.h.page[page_index]);
+   va = kmap_thread(kernel_if->u.h.page[page_index]);
else
va = kernel_if->u.g.vas[page_index + 1];
/* Skip header. */
@@ -357,12 +357,12 @@ static int qp_memcpy_to_queue_iter(struct vmci_queue 
*queue,
if (!copy_from_iter_full((u8 *)va + page_offset, to_copy,
 from)) {
if (kernel_if->host)
-   kunmap(kernel_if->u.h.page[page_index]);
+   kunmap_thread(kernel_if->u.h.page[page_index]);
return VMCI_ERROR_INVALID_ARGS;
}
bytes_copied += to_copy;
if (kernel_if->host)
-   kunmap(kernel_if->u.h.page[page_index]);
+   kunmap_thread(kernel_if->u.h.page[page_index]);
}
 
return VMCI_SUCCESS;
@@ -391,7 +391,7 @@ static int qp_memcpy_from_queue_iter(struct iov_iter *to,
int err;
 
if (kernel_if->host)
-   va = kmap(kernel_if->u.h.page[page_index]);
+   va = kmap_thread(kernel_if->u.h.page[page_index]);
else
va = kernel_if->u.g.vas[page_index + 1];
/* Skip header. */
@@ -405,12 +405,12 @@ static int qp_memcpy_from_queue_iter(struct iov_iter *to,
err = copy_to_iter((u8 *)va + page_offset, to_copy, to);
if (err != to_copy) {
if (kernel_if->host)
-   kunmap(kernel_if->u.h.page[page_index]);
+   kunmap_thread(kernel_if->u.h.page[page_index]);
return VMCI_ERROR_INVALID_ARGS;
}
bytes_copied += to_copy;
if (kernel_if->host)
-   kunmap(kernel_if->u.h.page[page_index]);
+   kunmap_thread(kernel_if->u.h.page[page_index]);
}
 
return VMCI_SUCCESS;
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 48/58] drivers/md: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Coly Li  (maintainer:BCACHE (BLOCK LAYER CACHE))
Cc: Kent Overstreet  (maintainer:BCACHE (BLOCK LAYER CACHE))
Signed-off-by: Ira Weiny 
---
 drivers/md/bcache/request.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index c7cadaafa947..a4571f6d09dd 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -44,10 +44,10 @@ static void bio_csum(struct bio *bio, struct bkey *k)
uint64_t csum = 0;
 
bio_for_each_segment(bv, bio, iter) {
-   void *d = kmap(bv.bv_page) + bv.bv_offset;
+   void *d = kmap_thread(bv.bv_page) + bv.bv_offset;
 
csum = bch_crc64_update(csum, d, bv.bv_len);
-   kunmap(bv.bv_page);
+   kunmap_thread(bv.bv_page);
}
 
k->ptr[KEY_PTRS(k)] = csum & (~0ULL >> 1);
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 47/58] drivers/mtd: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Miquel Raynal 
Cc: Richard Weinberger 
Cc: Vignesh Raghavendra 
Signed-off-by: Ira Weiny 
---
 drivers/mtd/mtd_blkdevs.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 0c05f77f9b21..4b18998273fa 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -88,14 +88,14 @@ static blk_status_t do_blktrans_request(struct 
mtd_blktrans_ops *tr,
return BLK_STS_IOERR;
return BLK_STS_OK;
case REQ_OP_READ:
-   buf = kmap(bio_page(req->bio)) + bio_offset(req->bio);
+   buf = kmap_thread(bio_page(req->bio)) + bio_offset(req->bio);
for (; nsect > 0; nsect--, block++, buf += tr->blksize) {
if (tr->readsect(dev, block, buf)) {
-   kunmap(bio_page(req->bio));
+   kunmap_thread(bio_page(req->bio));
return BLK_STS_IOERR;
}
}
-   kunmap(bio_page(req->bio));
+   kunmap_thread(bio_page(req->bio));
rq_flush_dcache_pages(req);
return BLK_STS_OK;
case REQ_OP_WRITE:
@@ -103,14 +103,14 @@ static blk_status_t do_blktrans_request(struct 
mtd_blktrans_ops *tr,
return BLK_STS_IOERR;
 
rq_flush_dcache_pages(req);
-   buf = kmap(bio_page(req->bio)) + bio_offset(req->bio);
+   buf = kmap_thread(bio_page(req->bio)) + bio_offset(req->bio);
for (; nsect > 0; nsect--, block++, buf += tr->blksize) {
if (tr->writesect(dev, block, buf)) {
-   kunmap(bio_page(req->bio));
+   kunmap_thread(bio_page(req->bio));
return BLK_STS_IOERR;
}
}
-   kunmap(bio_page(req->bio));
+   kunmap_thread(bio_page(req->bio));
return BLK_STS_OK;
default:
return BLK_STS_IOERR;
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 46/58] drivers/staging: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Greg Kroah-Hartman 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rts5208/rtsx_transport.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rts5208/rtsx_transport.c 
b/drivers/staging/rts5208/rtsx_transport.c
index 0027bcf638ad..f747cc23951b 100644
--- a/drivers/staging/rts5208/rtsx_transport.c
+++ b/drivers/staging/rts5208/rtsx_transport.c
@@ -92,13 +92,13 @@ unsigned int rtsx_stor_access_xfer_buf(unsigned char 
*buffer,
while (sglen > 0) {
unsigned int plen = min(sglen, (unsigned int)
PAGE_SIZE - poff);
-   unsigned char *ptr = kmap(page);
+   unsigned char *ptr = kmap_thread(page);
 
if (dir == TO_XFER_BUF)
memcpy(ptr + poff, buffer + cnt, plen);
else
memcpy(buffer + cnt, ptr + poff, plen);
-   kunmap(page);
+   kunmap_thread(page);
 
/* Start at the beginning of the next page */
poff = 0;
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 45/58] drivers/firmware: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Ard Biesheuvel 
Signed-off-by: Ira Weiny 
---
 drivers/firmware/efi/capsule-loader.c | 6 +++---
 drivers/firmware/efi/capsule.c| 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/firmware/efi/capsule-loader.c 
b/drivers/firmware/efi/capsule-loader.c
index 4dde8edd53b6..aa2e0b5940fd 100644
--- a/drivers/firmware/efi/capsule-loader.c
+++ b/drivers/firmware/efi/capsule-loader.c
@@ -197,7 +197,7 @@ static ssize_t efi_capsule_write(struct file *file, const 
char __user *buff,
page = cap_info->pages[cap_info->index - 1];
}
 
-   kbuff = kmap(page);
+   kbuff = kmap_thread(page);
kbuff += PAGE_SIZE - cap_info->page_bytes_remain;
 
/* Copy capsule binary data from user space to kernel space buffer */
@@ -217,7 +217,7 @@ static ssize_t efi_capsule_write(struct file *file, const 
char __user *buff,
}
 
cap_info->count += write_byte;
-   kunmap(page);
+   kunmap_thread(page);
 
/* Submit the full binary to efi_capsule_update() API */
if (cap_info->header.headersize > 0 &&
@@ -236,7 +236,7 @@ static ssize_t efi_capsule_write(struct file *file, const 
char __user *buff,
return write_byte;
 
 fail_unmap:
-   kunmap(page);
+   kunmap_thread(page);
 failed:
efi_free_all_buff_pages(cap_info);
return ret;
diff --git a/drivers/firmware/efi/capsule.c b/drivers/firmware/efi/capsule.c
index 598b7800d14e..edb7797b0e4f 100644
--- a/drivers/firmware/efi/capsule.c
+++ b/drivers/firmware/efi/capsule.c
@@ -244,7 +244,7 @@ int efi_capsule_update(efi_capsule_header_t *capsule, 
phys_addr_t *pages)
for (i = 0; i < sg_count; i++) {
efi_capsule_block_desc_t *sglist;
 
-   sglist = kmap(sg_pages[i]);
+   sglist = kmap_thread(sg_pages[i]);
 
for (j = 0; j < SGLIST_PER_PAGE && count > 0; j++) {
u64 sz = min_t(u64, imagesize,
@@ -265,7 +265,7 @@ int efi_capsule_update(efi_capsule_header_t *capsule, 
phys_addr_t *pages)
else
sglist[j].data = page_to_phys(sg_pages[i + 1]);
 
-   kunmap(sg_pages[i]);
+   kunmap_thread(sg_pages[i]);
}
 
mutex_lock(&capsule_mutex);
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 44/58] drivers/xen: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Stefano Stabellini 
Signed-off-by: Ira Weiny 
---
 drivers/xen/gntalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index 3fa40c723e8e..3b78e055feff 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -184,9 +184,9 @@ static int add_grefs(struct ioctl_gntalloc_alloc_gref *op,
 static void __del_gref(struct gntalloc_gref *gref)
 {
if (gref->notify.flags & UNMAP_NOTIFY_CLEAR_BYTE) {
-   uint8_t *tmp = kmap(gref->page);
+   uint8_t *tmp = kmap_thread(gref->page);
tmp[gref->notify.pgoff] = 0;
-   kunmap(gref->page);
+   kunmap_thread(gref->page);
}
if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) {
notify_remote_via_evtchn(gref->notify.event);
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 42/58] drivers/scsi: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: "James E.J. Bottomley" 
Cc: "Martin K. Petersen" 
Signed-off-by: Ira Weiny 
---
 drivers/scsi/ipr.c | 8 
 drivers/scsi/pmcraid.c | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index b0aa58d117cc..a5a0b8feb661 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -3923,9 +3923,9 @@ static int ipr_copy_ucode_buffer(struct ipr_sglist 
*sglist,
buffer += bsize_elem) {
struct page *page = sg_page(sg);
 
-   kaddr = kmap(page);
+   kaddr = kmap_thread(page);
memcpy(kaddr, buffer, bsize_elem);
-   kunmap(page);
+   kunmap_thread(page);
 
sg->length = bsize_elem;
 
@@ -3938,9 +3938,9 @@ static int ipr_copy_ucode_buffer(struct ipr_sglist 
*sglist,
if (len % bsize_elem) {
struct page *page = sg_page(sg);
 
-   kaddr = kmap(page);
+   kaddr = kmap_thread(page);
memcpy(kaddr, buffer, len % bsize_elem);
-   kunmap(page);
+   kunmap_thread(page);
 
sg->length = len % bsize_elem;
}
diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c
index aa9ae2ae8579..4b05ba4b8a11 100644
--- a/drivers/scsi/pmcraid.c
+++ b/drivers/scsi/pmcraid.c
@@ -3269,13 +3269,13 @@ static int pmcraid_copy_sglist(
for (i = 0; i < (len / bsize_elem); i++, sg = sg_next(sg), buffer += 
bsize_elem) {
struct page *page = sg_page(sg);
 
-   kaddr = kmap(page);
+   kaddr = kmap_thread(page);
if (direction == DMA_TO_DEVICE)
rc = copy_from_user(kaddr, buffer, bsize_elem);
else
rc = copy_to_user(buffer, kaddr, bsize_elem);
 
-   kunmap(page);
+   kunmap_thread(page);
 
if (rc) {
pmcraid_err("failed to copy user data into sg list\n");
@@ -3288,14 +3288,14 @@ static int pmcraid_copy_sglist(
if (len % bsize_elem) {
struct page *page = sg_page(sg);
 
-   kaddr = kmap(page);
+   kaddr = kmap_thread(page);
 
if (direction == DMA_TO_DEVICE)
rc = copy_from_user(kaddr, buffer, len % bsize_elem);
else
rc = copy_to_user(buffer, kaddr, len % bsize_elem);
 
-   kunmap(page);
+   kunmap_thread(page);
 
sg->length = len % bsize_elem;
}
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 43/58] drivers/mmc: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Ulf Hansson 
Cc: Sascha Sommer 
Signed-off-by: Ira Weiny 
---
 drivers/mmc/host/mmc_spi.c| 4 ++--
 drivers/mmc/host/sdricoh_cs.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/host/mmc_spi.c b/drivers/mmc/host/mmc_spi.c
index 18a850f37ddc..ab28e7103b8d 100644
--- a/drivers/mmc/host/mmc_spi.c
+++ b/drivers/mmc/host/mmc_spi.c
@@ -918,7 +918,7 @@ mmc_spi_data_do(struct mmc_spi_host *host, struct 
mmc_command *cmd,
}
 
/* allow pio too; we don't allow highmem */
-   kmap_addr = kmap(sg_page(sg));
+   kmap_addr = kmap_thread(sg_page(sg));
if (direction == DMA_TO_DEVICE)
t->tx_buf = kmap_addr + sg->offset;
else
@@ -950,7 +950,7 @@ mmc_spi_data_do(struct mmc_spi_host *host, struct 
mmc_command *cmd,
/* discard mappings */
if (direction == DMA_FROM_DEVICE)
flush_kernel_dcache_page(sg_page(sg));
-   kunmap(sg_page(sg));
+   kunmap_thread(sg_page(sg));
if (dma_dev)
dma_unmap_page(dma_dev, dma_addr, PAGE_SIZE, dir);
 
diff --git a/drivers/mmc/host/sdricoh_cs.c b/drivers/mmc/host/sdricoh_cs.c
index 76a8cd3a186f..7806bc69c4f1 100644
--- a/drivers/mmc/host/sdricoh_cs.c
+++ b/drivers/mmc/host/sdricoh_cs.c
@@ -312,11 +312,11 @@ static void sdricoh_request(struct mmc_host *mmc, struct 
mmc_request *mrq)
int result;
page = sg_page(data->sg);
 
-   buf = kmap(page) + data->sg->offset + (len * i);
+   buf = kmap_thread(page) + data->sg->offset + (len * i);
result =
sdricoh_blockio(host,
data->flags & MMC_DATA_READ, buf, len);
-   kunmap(page);
+   kunmap_thread(page);
flush_dcache_page(page);
if (result) {
dev_err(dev, "sdricoh_request: cmd %i "
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 41/58] drivers/target: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls in this driver are localized to a single thread.  To
avoid the overhead of global PKRS updates, use the new kmap_thread()
call.

Signed-off-by: Ira Weiny 
---
 drivers/target/target_core_iblock.c| 4 ++--
 drivers/target/target_core_rd.c| 4 ++--
 drivers/target/target_core_transport.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/target/target_core_iblock.c 
b/drivers/target/target_core_iblock.c
index 1c181d31f4c8..df7b1568edb3 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -415,7 +415,7 @@ iblock_execute_zero_out(struct block_device *bdev, struct 
se_cmd *cmd)
unsigned char *buf, *not_zero;
int ret;
 
-   buf = kmap(sg_page(sg)) + sg->offset;
+   buf = kmap_thread(sg_page(sg)) + sg->offset;
if (!buf)
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
/*
@@ -423,7 +423,7 @@ iblock_execute_zero_out(struct block_device *bdev, struct 
se_cmd *cmd)
 * incoming WRITE_SAME payload does not contain zeros.
 */
not_zero = memchr_inv(buf, 0x00, cmd->data_length);
-   kunmap(sg_page(sg));
+   kunmap_thread(sg_page(sg));
 
if (not_zero)
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c
index 408bd975170b..dbbdd39c5bf9 100644
--- a/drivers/target/target_core_rd.c
+++ b/drivers/target/target_core_rd.c
@@ -159,9 +159,9 @@ static int rd_allocate_sgl_table(struct rd_dev *rd_dev, 
struct rd_dev_sg_table *
sg_assign_page(&sg[j], pg);
sg[j].length = PAGE_SIZE;
 
-   p = kmap(pg);
+   p = kmap_thread(pg);
memset(p, init_payload, PAGE_SIZE);
-   kunmap(pg);
+   kunmap_thread(pg);
}
 
page_offset += sg_per_table;
diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index ff26ab0a5f60..8d0bae5a92e5 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1692,11 +1692,11 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, 
struct se_session *se_sess
unsigned char *buf = NULL;
 
if (sgl)
-   buf = kmap(sg_page(sgl)) + sgl->offset;
+   buf = kmap_thread(sg_page(sgl)) + sgl->offset;
 
if (buf) {
memset(buf, 0, sgl->length);
-   kunmap(sg_page(sgl));
+   kunmap_thread(sg_page(sgl));
}
}
 
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 39/58] fs/jffs2: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Signed-off-by: Ira Weiny 
---
 fs/jffs2/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c
index 3e6d54f9b011..14dd2b18cc16 100644
--- a/fs/jffs2/file.c
+++ b/fs/jffs2/file.c
@@ -287,13 +287,13 @@ static int jffs2_write_end(struct file *filp, struct 
address_space *mapping,
 
/* In 2.4, it was already kmapped by generic_file_write(). Doesn't
   hurt to do it again. The alternative is ifdefs, which are ugly. */
-   kmap(pg);
+   kmap_thread(pg);
 
ret = jffs2_write_inode_range(c, f, ri, page_address(pg) + 
aligned_start,
  (pg->index << PAGE_SHIFT) + aligned_start,
  end - aligned_start, &writtenlen);
 
-   kunmap(pg);
+   kunmap_thread(pg);
 
if (ret) {
/* There was an error writing. */
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 38/58] fs/isofs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Signed-off-by: Ira Weiny 
---
 fs/isofs/compress.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/isofs/compress.c b/fs/isofs/compress.c
index bc12ac7e2312..ddd3fd99d2e1 100644
--- a/fs/isofs/compress.c
+++ b/fs/isofs/compress.c
@@ -344,7 +344,7 @@ static int zisofs_readpage(struct file *file, struct page 
*page)
pages[i] = grab_cache_page_nowait(mapping, index);
if (pages[i]) {
ClearPageError(pages[i]);
-   kmap(pages[i]);
+   kmap_thread(pages[i]);
}
}
 
@@ -356,7 +356,7 @@ static int zisofs_readpage(struct file *file, struct page 
*page)
flush_dcache_page(pages[i]);
if (i == full_page && err)
SetPageError(pages[i]);
-   kunmap(pages[i]);
+   kunmap_thread(pages[i]);
unlock_page(pages[i]);
if (i != full_page)
put_page(pages[i]);
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 37/58] fs/ext2: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call instead.

Cc: Jan Kara 
Signed-off-by: Ira Weiny 
---
 fs/ext2/dir.c  | 2 +-
 fs/ext2/ext2.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index f3194bf20733..abe97ba458c8 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -196,7 +196,7 @@ static struct page * ext2_get_page(struct inode *dir, 
unsigned long n,
struct address_space *mapping = dir->i_mapping;
struct page *page = read_mapping_page(mapping, n, NULL);
if (!IS_ERR(page)) {
-   kmap(page);
+   kmap_thread(page);
if (unlikely(!PageChecked(page))) {
if (PageError(page) || !ext2_check_page(page, quiet))
goto fail;
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 021ec8b42ac3..9bcb6714c255 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -749,7 +749,7 @@ extern struct ext2_dir_entry_2 * ext2_dotdot (struct inode 
*, struct page **);
 extern void ext2_set_link(struct inode *, struct ext2_dir_entry_2 *, struct 
page *, struct inode *, int);
 static inline void ext2_put_page(struct page *page)
 {
-   kunmap(page);
+   kunmap_thread(page);
put_page(page);
 }
 
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 36/58] fs/ext2: Use ext2_put_page

2020-10-09 Thread ira . weiny
From: Ira Weiny 

There are 3 places in namei.c where the equivalent of ext2_put_page() is
open-coded.  We want to use k[un]map_thread() instead of k[un]map() in
ext2_[get|put]_page().

Move ext2_put_page() to ext2.h and use it in namei.c in prep for
converting the k[un]map() code.

Cc: Jan Kara 
Signed-off-by: Ira Weiny 
---
 fs/ext2/dir.c   |  6 --
 fs/ext2/ext2.h  |  8 
 fs/ext2/namei.c | 15 +--
 3 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 70355ab6740e..f3194bf20733 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -66,12 +66,6 @@ static inline unsigned ext2_chunk_size(struct inode *inode)
return inode->i_sb->s_blocksize;
 }
 
-static inline void ext2_put_page(struct page *page)
-{
-   kunmap(page);
-   put_page(page);
-}
-
 /*
  * Return the offset into page `page_nr' of the last valid
  * byte in that page, plus one.
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 5136b7289e8d..021ec8b42ac3 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -16,6 +16,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /* XXX Here for now... not interested in restructing headers JUST now */
 
@@ -745,6 +747,12 @@ extern int ext2_delete_entry (struct ext2_dir_entry_2 *, 
struct page *);
 extern int ext2_empty_dir (struct inode *);
 extern struct ext2_dir_entry_2 * ext2_dotdot (struct inode *, struct page **);
 extern void ext2_set_link(struct inode *, struct ext2_dir_entry_2 *, struct 
page *, struct inode *, int);
+static inline void ext2_put_page(struct page *page)
+{
+   kunmap(page);
+   put_page(page);
+}
+
 
 /* ialloc.c */
 extern struct inode * ext2_new_inode (struct inode *, umode_t, const struct 
qstr *);
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index 5bf2c145643b..ea980f1e2e99 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -389,23 +389,18 @@ static int ext2_rename (struct inode * old_dir, struct 
dentry * old_dentry,
if (dir_de) {
if (old_dir != new_dir)
ext2_set_link(old_inode, dir_de, dir_page, new_dir, 0);
-   else {
-   kunmap(dir_page);
-   put_page(dir_page);
-   }
+   else
+   ext2_put_page(dir_page);
inode_dec_link_count(old_dir);
}
return 0;
 
 
 out_dir:
-   if (dir_de) {
-   kunmap(dir_page);
-   put_page(dir_page);
-   }
+   if (dir_de)
+   ext2_put_page(dir_page);
 out_old:
-   kunmap(old_page);
-   put_page(old_page);
+   ext2_put_page(old_page);
 out:
return err;
 }
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 35/58] fs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

These kmap() calls are localized to a single thread.  To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Alexander Viro 
Cc: Jens Axboe 
Signed-off-by: Ira Weiny 
---
 fs/aio.c  |  4 ++--
 fs/binfmt_elf.c   |  4 ++--
 fs/binfmt_elf_fdpic.c |  4 ++--
 fs/exec.c | 10 +-
 fs/io_uring.c |  4 ++--
 fs/splice.c   |  4 ++--
 6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index d5ec30385566..27f95996d25f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1223,10 +1223,10 @@ static long aio_read_events_ring(struct kioctx *ctx,
avail = min(avail, nr - ret);
avail = min_t(long, avail, AIO_EVENTS_PER_PAGE - pos);
 
-   ev = kmap(page);
+   ev = kmap_thread(page);
copy_ret = copy_to_user(event + ret, ev + pos,
sizeof(*ev) * avail);
-   kunmap(page);
+   kunmap_thread(page);
 
if (unlikely(copy_ret)) {
ret = -EFAULT;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 13d053982dd7..1a332ef1ae03 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2430,9 +2430,9 @@ static int elf_core_dump(struct coredump_params *cprm)
 
page = get_dump_page(addr);
if (page) {
-   void *kaddr = kmap(page);
+   void *kaddr = kmap_thread(page);
stop = !dump_emit(cprm, kaddr, PAGE_SIZE);
-   kunmap(page);
+   kunmap_thread(page);
put_page(page);
} else
stop = !dump_skip(cprm, PAGE_SIZE);
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 50f845702b92..8fbe188e0fdd 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1542,9 +1542,9 @@ static bool elf_fdpic_dump_segments(struct 
coredump_params *cprm)
bool res;
struct page *page = get_dump_page(addr);
if (page) {
-   void *kaddr = kmap(page);
+   void *kaddr = kmap_thread(page);
res = dump_emit(cprm, kaddr, PAGE_SIZE);
-   kunmap(page);
+   kunmap_thread(page);
put_page(page);
} else {
res = dump_skip(cprm, PAGE_SIZE);
diff --git a/fs/exec.c b/fs/exec.c
index a91003e28eaa..3948b8511e3a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -575,11 +575,11 @@ static int copy_strings(int argc, struct user_arg_ptr 
argv,
 
if (kmapped_page) {
flush_kernel_dcache_page(kmapped_page);
-   kunmap(kmapped_page);
+   kunmap_thread(kmapped_page);
put_arg_page(kmapped_page);
}
kmapped_page = page;
-   kaddr = kmap(kmapped_page);
+   kaddr = kmap_thread(kmapped_page);
kpos = pos & PAGE_MASK;
flush_arg_page(bprm, kpos, kmapped_page);
}
@@ -593,7 +593,7 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
 out:
if (kmapped_page) {
flush_kernel_dcache_page(kmapped_page);
-   kunmap(kmapped_page);
+   kunmap_thread(kmapped_page);
put_arg_page(kmapped_page);
}
return ret;
@@ -871,11 +871,11 @@ int transfer_args_to_stack(struct linux_binprm *bprm,
 
for (index = MAX_ARG_PAGES - 1; index >= stop; index--) {
unsigned int offset = index == stop ? bprm->p & ~PAGE_MASK : 0;
-   char *src = kmap(bprm->page[index]) + offset;
+   char *src = kmap_thread(bprm->page[index]) + offset;
sp -= PAGE_SIZE - offset;
if (copy_to_user((void *) sp, src, PAGE_SIZE - offset) != 0)
ret = -EFAULT;
-   kunmap(bprm->page[index]);
+   kunmap_thread(bprm->page[index]);
if (ret)
goto out;
}
diff --git a/fs/io_uring.c b/fs/io_uring.c
index aae0ef2ec34d..f59bb079822d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2903,7 +2903,7 @@ static ssize_t loop_rw_iter(int rw, struct file *file, 
struct kiocb *kiocb,
iovec = iov_iter_iovec(iter);
} else {
/* fixed buffers import bvec */
-   iovec.

[PATCH RFC PKS/PMEM 34/58] fs/erofs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Gao Xiang 
Cc: Chao Yu 
Signed-off-by: Ira Weiny 
---
 fs/erofs/super.c | 4 ++--
 fs/erofs/xattr.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index ddaa516c008a..41696b60f1b3 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -139,7 +139,7 @@ static int erofs_read_superblock(struct super_block *sb)
 
sbi = EROFS_SB(sb);
 
-   data = kmap(page);
+   data = kmap_thread(page);
dsb = (struct erofs_super_block *)(data + EROFS_SUPER_OFFSET);
 
ret = -EINVAL;
@@ -189,7 +189,7 @@ static int erofs_read_superblock(struct super_block *sb)
}
ret = 0;
 out:
-   kunmap(page);
+   kunmap_thread(page);
put_page(page);
return ret;
 }
diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c
index c8c381eadcd6..1771baa99d77 100644
--- a/fs/erofs/xattr.c
+++ b/fs/erofs/xattr.c
@@ -20,7 +20,7 @@ static inline void xattr_iter_end(struct xattr_iter *it, bool 
atomic)
 {
/* the only user of kunmap() is 'init_inode_xattrs' */
if (!atomic)
-   kunmap(it->page);
+   kunmap_thread(it->page);
else
kunmap_atomic(it->kaddr);
 
@@ -96,7 +96,7 @@ static int init_inode_xattrs(struct inode *inode)
}
 
/* read in shared xattr array (non-atomic, see kmalloc below) */
-   it.kaddr = kmap(it.page);
+   it.kaddr = kmap_thread(it.page);
atomic_map = false;
 
ih = (struct erofs_xattr_ibody_header *)(it.kaddr + it.ofs);
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Nicolas Pitre 
Signed-off-by: Ira Weiny 
---
 fs/cramfs/inode.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 912308600d39..003c014a42ed 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, 
unsigned int offset,
struct page *page = pages[i];
 
if (page) {
-   memcpy(data, kmap(page), PAGE_SIZE);
-   kunmap(page);
+   memcpy(data, kmap_thread(page), PAGE_SIZE);
+   kunmap_thread(page);
put_page(page);
} else
memset(data, 0, PAGE_SIZE);
@@ -826,7 +826,7 @@ static int cramfs_readpage(struct file *file, struct page 
*page)
 
maxblock = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
bytes_filled = 0;
-   pgdata = kmap(page);
+   pgdata = kmap_thread(page);
 
if (page->index < maxblock) {
struct super_block *sb = inode->i_sb;
@@ -914,13 +914,13 @@ static int cramfs_readpage(struct file *file, struct page 
*page)
 
memset(pgdata + bytes_filled, 0, PAGE_SIZE - bytes_filled);
flush_dcache_page(page);
-   kunmap(page);
+   kunmap_thread(page);
SetPageUptodate(page);
unlock_page(page);
return 0;
 
 err:
-   kunmap(page);
+   kunmap_thread(page);
ClearPageUptodate(page);
SetPageError(page);
unlock_page(page);
-- 
2.28.0.rc0.12.gb6a658bd00c9
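
A detail the cramfs conversion above illustrates: every exit path that used to call kunmap() must now call kunmap_thread(), error paths included. A sketch of that shape, where example_fill() is a hypothetical stand-in for the filesystem's real fill logic:

/* Hypothetical readpage skeleton: one kmap_thread() at the top, a
 * matching kunmap_thread() on both the success and the error path. */
static int example_readpage(struct page *page)
{
	void *pgdata = kmap_thread(page);
	int err = example_fill(page, pgdata);	/* hypothetical helper */

	if (err) {
		kunmap_thread(page);		/* error path unmaps too */
		ClearPageUptodate(page);
		SetPageError(page);
		unlock_page(page);
		return err;
	}

	flush_dcache_page(page);
	kunmap_thread(page);
	SetPageUptodate(page);
	unlock_page(page);
	return 0;
}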



[PATCH RFC PKS/PMEM 32/58] fs/hostfs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Jeff Dike 
Cc: Richard Weinberger 
Cc: Anton Ivanov 
Signed-off-by: Ira Weiny 
---
 fs/hostfs/hostfs_kern.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index c070c0d8e3e9..608efd0f83cb 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -409,7 +409,7 @@ static int hostfs_writepage(struct page *page, struct 
writeback_control *wbc)
if (page->index >= end_index)
count = inode->i_size & (PAGE_SIZE-1);
 
-   buffer = kmap(page);
+   buffer = kmap_thread(page);
 
err = write_file(HOSTFS_I(inode)->fd, &base, buffer, count);
if (err != count) {
@@ -425,7 +425,7 @@ static int hostfs_writepage(struct page *page, struct 
writeback_control *wbc)
err = 0;
 
  out:
-   kunmap(page);
+   kunmap_thread(page);
 
unlock_page(page);
return err;
@@ -437,7 +437,7 @@ static int hostfs_readpage(struct file *file, struct page 
*page)
loff_t start = page_offset(page);
int bytes_read, ret = 0;
 
-   buffer = kmap(page);
+   buffer = kmap_thread(page);
bytes_read = read_file(FILE_HOSTFS_I(file)->fd, &start, buffer,
PAGE_SIZE);
if (bytes_read < 0) {
@@ -454,7 +454,7 @@ static int hostfs_readpage(struct file *file, struct page 
*page)
 
  out:
flush_dcache_page(page);
-   kunmap(page);
+   kunmap_thread(page);
unlock_page(page);
return ret;
 }
@@ -480,9 +480,9 @@ static int hostfs_write_end(struct file *file, struct 
address_space *mapping,
unsigned from = pos & (PAGE_SIZE - 1);
int err;
 
-   buffer = kmap(page);
+   buffer = kmap_thread(page);
err = write_file(FILE_HOSTFS_I(file)->fd, &pos, buffer + from, copied);
-   kunmap(page);
+   kunmap_thread(page);
 
if (!PageUptodate(page) && err == PAGE_SIZE)
SetPageUptodate(page);
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 31/58] fs/vboxsf: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Hans de Goede 
Signed-off-by: Ira Weiny 
---
 fs/vboxsf/file.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
index c4ab5996d97a..d9c7e6b7b4cc 100644
--- a/fs/vboxsf/file.c
+++ b/fs/vboxsf/file.c
@@ -216,7 +216,7 @@ static int vboxsf_readpage(struct file *file, struct page 
*page)
u8 *buf;
int err;
 
-   buf = kmap(page);
+   buf = kmap_thread(page);
 
err = vboxsf_read(sf_handle->root, sf_handle->handle, off, &nread, buf);
if (err == 0) {
@@ -227,7 +227,7 @@ static int vboxsf_readpage(struct file *file, struct page 
*page)
SetPageError(page);
}
 
-   kunmap(page);
+   kunmap_thread(page);
unlock_page(page);
return err;
 }
@@ -268,10 +268,10 @@ static int vboxsf_writepage(struct page *page, struct 
writeback_control *wbc)
if (!sf_handle)
return -EBADF;
 
-   buf = kmap(page);
+   buf = kmap_thread(page);
err = vboxsf_write(sf_handle->root, sf_handle->handle,
   off, &nwrite, buf);
-   kunmap(page);
+   kunmap_thread(page);
 
kref_put(&sf_handle->refcount, vboxsf_handle_release);
 
@@ -302,10 +302,10 @@ static int vboxsf_write_end(struct file *file, struct 
address_space *mapping,
if (!PageUptodate(page) && copied < len)
zero_user(page, from + copied, len - copied);
 
-   buf = kmap(page);
+   buf = kmap_thread(page);
err = vboxsf_write(sf_handle->root, sf_handle->handle,
   pos, &nwritten, buf + from);
-   kunmap(page);
+   kunmap_thread(page);
 
if (err) {
nwritten = 0;
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 30/58] fs/romfs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Signed-off-by: Ira Weiny 
---
 fs/romfs/super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/romfs/super.c b/fs/romfs/super.c
index e582d001f792..9050074c6755 100644
--- a/fs/romfs/super.c
+++ b/fs/romfs/super.c
@@ -107,7 +107,7 @@ static int romfs_readpage(struct file *file, struct page 
*page)
void *buf;
int ret;
 
-   buf = kmap(page);
+   buf = kmap_thread(page);
if (!buf)
return -ENOMEM;
 
@@ -136,7 +136,7 @@ static int romfs_readpage(struct file *file, struct page 
*page)
SetPageUptodate(page);
 
flush_dcache_page(page);
-   kunmap(page);
+   kunmap_thread(page);
unlock_page(page);
return ret;
 }
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 29/58] fs/ntfs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Anton Altaparmakov 
Signed-off-by: Ira Weiny 
---
 fs/ntfs/aops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index bb0a43860ad2..11633d732809 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -1099,7 +1099,7 @@ static int ntfs_write_mst_block(struct page *page,
if (!nr_bhs)
goto done;
/* Map the page so we can access its contents. */
-   kaddr = kmap(page);
+   kaddr = kmap_thread(page);
/* Clear the page uptodate flag whilst the mst fixups are applied. */
BUG_ON(!PageUptodate(page));
ClearPageUptodate(page);
@@ -1276,7 +1276,7 @@ static int ntfs_write_mst_block(struct page *page,
iput(VFS_I(base_tni));
}
SetPageUptodate(page);
-   kunmap(page);
+   kunmap_thread(page);
 done:
if (unlikely(err && err != -ENOMEM)) {
/*
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 28/58] fs/cachefiles: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: David Howells 
Signed-off-by: Ira Weiny 
---
 fs/cachefiles/rdwr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 3080cda9e824..2468e5c067ba 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -936,9 +936,9 @@ int cachefiles_write_page(struct fscache_storage *op, 
struct page *page)
}
}
 
-   data = kmap(page);
+   data = kmap_thread(page);
ret = kernel_write(file, data, len, &pos);
-   kunmap(page);
+   kunmap_thread(page);
fput(file);
if (ret != len)
goto error_eio;
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 27/58] fs/ubifs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Richard Weinberger 
Signed-off-by: Ira Weiny 
---
 fs/ubifs/file.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index b77d1637bbbc..a3537447a885 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -111,7 +111,7 @@ static int do_readpage(struct page *page)
ubifs_assert(c, !PageChecked(page));
ubifs_assert(c, !PagePrivate(page));
 
-   addr = kmap(page);
+   addr = kmap_thread(page);
 
block = page->index << UBIFS_BLOCKS_PER_PAGE_SHIFT;
beyond = (i_size + UBIFS_BLOCK_SIZE - 1) >> UBIFS_BLOCK_SHIFT;
@@ -174,7 +174,7 @@ static int do_readpage(struct page *page)
SetPageUptodate(page);
ClearPageError(page);
flush_dcache_page(page);
-   kunmap(page);
+   kunmap_thread(page);
return 0;
 
 error:
@@ -182,7 +182,7 @@ static int do_readpage(struct page *page)
ClearPageUptodate(page);
SetPageError(page);
flush_dcache_page(page);
-   kunmap(page);
+   kunmap_thread(page);
return err;
 }
 
@@ -616,7 +616,7 @@ static int populate_page(struct ubifs_info *c, struct page 
*page,
dbg_gen("ino %lu, pg %lu, i_size %lld, flags %#lx",
inode->i_ino, page->index, i_size, page->flags);
 
-   addr = zaddr = kmap(page);
+   addr = zaddr = kmap_thread(page);
 
end_index = (i_size - 1) >> PAGE_SHIFT;
if (!i_size || page->index > end_index) {
@@ -692,7 +692,7 @@ static int populate_page(struct ubifs_info *c, struct page 
*page,
SetPageUptodate(page);
ClearPageError(page);
flush_dcache_page(page);
-   kunmap(page);
+   kunmap_thread(page);
*n = nn;
return 0;
 
@@ -700,7 +700,7 @@ static int populate_page(struct ubifs_info *c, struct page 
*page,
ClearPageUptodate(page);
SetPageError(page);
flush_dcache_page(page);
-   kunmap(page);
+   kunmap_thread(page);
ubifs_err(c, "bad data node (block %u, inode %lu)",
  page_block, inode->i_ino);
return -EINVAL;
@@ -918,7 +918,7 @@ static int do_writepage(struct page *page, int len)
/* Update radix tree tags */
set_page_writeback(page);
 
-   addr = kmap(page);
+   addr = kmap_thread(page);
block = page->index << UBIFS_BLOCKS_PER_PAGE_SHIFT;
i = 0;
while (len) {
@@ -950,7 +950,7 @@ static int do_writepage(struct page *page, int len)
ClearPagePrivate(page);
ClearPageChecked(page);
 
-   kunmap(page);
+   kunmap_thread(page);
unlock_page(page);
end_page_writeback(page);
return err;
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 26/58] fs/zonefs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Damien Le Moal 
Cc: Naohiro Aota 
Signed-off-by: Ira Weiny 
---
 fs/zonefs/super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 8ec7c8f109d7..2fd6c86beee1 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -1297,7 +1297,7 @@ static int zonefs_read_super(struct super_block *sb)
if (ret)
goto free_page;
 
-   super = kmap(page);
+   super = kmap_thread(page);
 
ret = -EINVAL;
if (le32_to_cpu(super->s_magic) != ZONEFS_MAGIC)
@@ -1349,7 +1349,7 @@ static int zonefs_read_super(struct super_block *sb)
ret = 0;
 
 unmap:
-   kunmap(page);
+   kunmap_thread(page);
 free_page:
__free_page(page);
 
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 25/58] fs/reiserfs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Jan Kara 
Cc: "Theodore Ts'o" 
Cc: Randy Dunlap 
Cc: Alex Shi 
Signed-off-by: Ira Weiny 
---
 fs/reiserfs/journal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index e98f99338f8f..be8f56261e8c 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -4194,11 +4194,11 @@ static int do_journal_end(struct 
reiserfs_transaction_handle *th, int flags)
SB_ONDISK_JOURNAL_SIZE(sb)));
set_buffer_uptodate(tmp_bh);
page = cn->bh->b_page;
-   addr = kmap(page);
+   addr = kmap_thread(page);
memcpy(tmp_bh->b_data,
   addr + offset_in_page(cn->bh->b_data),
   cn->bh->b_size);
-   kunmap(page);
+   kunmap_thread(page);
mark_buffer_dirty(tmp_bh);
jindex++;
set_buffer_journal_dirty(cn->bh);
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 24/58] fs/freevxfs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Christoph Hellwig 
Signed-off-by: Ira Weiny 
---
 fs/freevxfs/vxfs_immed.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/freevxfs/vxfs_immed.c b/fs/freevxfs/vxfs_immed.c
index bfc780c682fb..9c42fec4cd85 100644
--- a/fs/freevxfs/vxfs_immed.c
+++ b/fs/freevxfs/vxfs_immed.c
@@ -69,9 +69,9 @@ vxfs_immed_readpage(struct file *fp, struct page *pp)
u_int64_t   offset = (u_int64_t)pp->index << PAGE_SHIFT;
caddr_t kaddr;
 
-   kaddr = kmap(pp);
+   kaddr = kmap_thread(pp);
memcpy(kaddr, vip->vii_immed.vi_immed + offset, PAGE_SIZE);
-   kunmap(pp);
+   kunmap_thread(pp);

flush_dcache_page(pp);
SetPageUptodate(pp);
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 23/58] fs/fuse: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Miklos Szeredi 
Signed-off-by: Ira Weiny 
---
 fs/fuse/readdir.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index 90e3f01bd796..953ffe6f56e3 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -536,9 +536,9 @@ static int fuse_readdir_cached(struct file *file, struct 
dir_context *ctx)
 * Contents of the page are now protected against changing by holding
 * the page lock.
 */
-   addr = kmap(page);
+   addr = kmap_thread(page);
res = fuse_parse_cache(ff, addr, size, ctx);
-   kunmap(page);
+   kunmap_thread(page);
unlock_page(page);
put_page(page);
 
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Jaegeuk Kim 
Cc: Chao Yu 
Signed-off-by: Ira Weiny 
---
 fs/f2fs/f2fs.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index d9e52a7f3702..ff72a45a577e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
 
 static inline void f2fs_copy_page(struct page *src, struct page *dst)
 {
-   char *src_kaddr = kmap(src);
-   char *dst_kaddr = kmap(dst);
+   char *src_kaddr = kmap_thread(src);
+   char *dst_kaddr = kmap_thread(dst);
 
memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
-   kunmap(dst);
-   kunmap(src);
+   kunmap_thread(dst);
+   kunmap_thread(src);
 }
 
 static inline void f2fs_put_page(struct page *page, int unlock)
-- 
2.28.0.rc0.12.gb6a658bd00c9
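
f2fs_copy_page() above is one of the few call sites in the series that holds two mappings at once. A sketch of that pattern, assuming kmap_thread() mappings are independent per page, as kmap() mappings are, and may therefore be held concurrently; the patch releases them in the reverse order of acquisition:

/* Two concurrent thread-local mappings, mirroring the converted
 * f2fs_copy_page() above. */
static inline void example_copy_page(struct page *src, struct page *dst)
{
	char *src_kaddr = kmap_thread(src);
	char *dst_kaddr = kmap_thread(dst);

	memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);

	kunmap_thread(dst);	/* reverse order of mapping */
	kunmap_thread(src);
}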



[PATCH RFC PKS/PMEM 21/58] fs/nfs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Trond Myklebust 
Cc: Anna Schumaker 
Signed-off-by: Ira Weiny 
---
 fs/nfs/dir.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index cb52db9a0cfb..fee321acccb4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -213,7 +213,7 @@ int nfs_readdir_make_qstr(struct qstr *string, const char 
*name, unsigned int le
 static
 int nfs_readdir_add_to_array(struct nfs_entry *entry, struct page *page)
 {
-   struct nfs_cache_array *array = kmap(page);
+   struct nfs_cache_array *array = kmap_thread(page);
struct nfs_cache_array_entry *cache_entry;
int ret;
 
@@ -235,7 +235,7 @@ int nfs_readdir_add_to_array(struct nfs_entry *entry, 
struct page *page)
if (entry->eof != 0)
array->eof_index = array->size;
 out:
-   kunmap(page);
+   kunmap_thread(page);
return ret;
 }
 
@@ -347,7 +347,7 @@ int nfs_readdir_search_array(nfs_readdir_descriptor_t *desc)
struct nfs_cache_array *array;
int status;
 
-   array = kmap(desc->page);
+   array = kmap_thread(desc->page);
 
if (*desc->dir_cookie == 0)
status = nfs_readdir_search_for_pos(array, desc);
@@ -359,7 +359,7 @@ int nfs_readdir_search_array(nfs_readdir_descriptor_t *desc)
desc->current_index += array->size;
desc->page_index++;
}
-   kunmap(desc->page);
+   kunmap_thread(desc->page);
return status;
 }
 
@@ -602,10 +602,10 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t 
*desc, struct nfs_entry *en
 
 out_nopages:
if (count == 0 || (status == -EBADCOOKIE && entry->eof != 0)) {
-   array = kmap(page);
+   array = kmap_thread(page);
array->eof_index = array->size;
status = 0;
-   kunmap(page);
+   kunmap_thread(page);
}
 
put_page(scratch);
@@ -669,7 +669,7 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t 
*desc, struct page *page,
goto out;
}
 
-   array = kmap(page);
+   array = kmap_thread(page);
 
status = nfs_readdir_alloc_pages(pages, array_size);
if (status < 0)
@@ -691,7 +691,7 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t 
*desc, struct page *page,
 
nfs_readdir_free_pages(pages, array_size);
 out_release_array:
-   kunmap(page);
+   kunmap_thread(page);
nfs4_label_free(entry.label);
 out:
nfs_free_fattr(entry.fattr);
@@ -803,7 +803,7 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc)
struct nfs_cache_array *array = NULL;
struct nfs_open_dir_context *ctx = file->private_data;
 
-   array = kmap(desc->page);
+   array = kmap_thread(desc->page);
for (i = desc->cache_entry_index; i < array->size; i++) {
struct nfs_cache_array_entry *ent;
 
@@ -827,7 +827,7 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc)
if (array->eof_index >= 0)
desc->eof = true;
 
-   kunmap(desc->page);
+   kunmap_thread(desc->page);
dfprintk(DIRCACHE, "NFS: nfs_do_filldir() filling ended @ cookie %Lu; 
returning = %d\n",
(unsigned long long)*desc->dir_cookie, res);
return res;
-- 
2.28.0.rc0.12.gb6a658bd00c9
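
The NFS readdir cache shows another recurring shape: the mapped address is used as a typed pointer rather than a raw byte buffer. A sketch, with struct example_array as a hypothetical stand-in for struct nfs_cache_array:

/* Hypothetical on-page structure, for illustration only. */
struct example_array {
	unsigned int size;
	int eof_index;
};

static void example_mark_eof(struct page *page)
{
	struct example_array *array = kmap_thread(page);

	array->eof_index = array->size;	/* write through the mapping */
	kunmap_thread(page);
}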



[PATCH RFC PKS/PMEM 20/58] fs/jffs2: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: David Woodhouse 
Cc: Richard Weinberger 
Signed-off-by: Ira Weiny 
---
 fs/jffs2/file.c | 4 ++--
 fs/jffs2/gc.c   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c
index f8fb89b10227..3e6d54f9b011 100644
--- a/fs/jffs2/file.c
+++ b/fs/jffs2/file.c
@@ -88,7 +88,7 @@ static int jffs2_do_readpage_nolock (struct inode *inode, 
struct page *pg)
 
BUG_ON(!PageLocked(pg));
 
-   pg_buf = kmap(pg);
+   pg_buf = kmap_thread(pg);
/* FIXME: Can kmap fail? */
 
ret = jffs2_read_inode_range(c, f, pg_buf, pg->index << PAGE_SHIFT,
@@ -103,7 +103,7 @@ static int jffs2_do_readpage_nolock (struct inode *inode, 
struct page *pg)
}
 
flush_dcache_page(pg);
-   kunmap(pg);
+   kunmap_thread(pg);
 
jffs2_dbg(2, "readpage finished\n");
return ret;
diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c
index 373b3b7c9f44..a7259783ab84 100644
--- a/fs/jffs2/gc.c
+++ b/fs/jffs2/gc.c
@@ -1335,7 +1335,7 @@ static int jffs2_garbage_collect_dnode(struct 
jffs2_sb_info *c, struct jffs2_era
return PTR_ERR(page);
}
 
-   pg_ptr = kmap(page);
+   pg_ptr = kmap_thread(page);
mutex_lock(&f->sem);
 
offset = start;
@@ -1400,7 +1400,7 @@ static int jffs2_garbage_collect_dnode(struct 
jffs2_sb_info *c, struct jffs2_era
}
}
 
-   kunmap(page);
+   kunmap_thread(page);
put_page(page);
return ret;
 }
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 19/58] fs/hfsplus: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Signed-off-by: Ira Weiny 
---
 fs/hfsplus/bitmap.c |  20 -
 fs/hfsplus/bnode.c  | 102 ++--
 fs/hfsplus/btree.c  |  18 
 3 files changed, 70 insertions(+), 70 deletions(-)

diff --git a/fs/hfsplus/bitmap.c b/fs/hfsplus/bitmap.c
index cebce0cfe340..9ec7c1559a0c 100644
--- a/fs/hfsplus/bitmap.c
+++ b/fs/hfsplus/bitmap.c
@@ -39,7 +39,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
start = size;
goto out;
}
-   pptr = kmap(page);
+   pptr = kmap_thread(page);
curr = pptr + (offset & (PAGE_CACHE_BITS - 1)) / 32;
i = offset % 32;
offset &= ~(PAGE_CACHE_BITS - 1);
@@ -74,7 +74,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
}
curr++;
}
-   kunmap(page);
+   kunmap_thread(page);
offset += PAGE_CACHE_BITS;
if (offset >= size)
break;
@@ -84,7 +84,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
start = size;
goto out;
}
-   curr = pptr = kmap(page);
+   curr = pptr = kmap_thread(page);
if ((size ^ offset) / PAGE_CACHE_BITS)
end = pptr + PAGE_CACHE_BITS / 32;
else
@@ -127,7 +127,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
len -= 32;
}
set_page_dirty(page);
-   kunmap(page);
+   kunmap_thread(page);
offset += PAGE_CACHE_BITS;
page = read_mapping_page(mapping, offset / PAGE_CACHE_BITS,
 NULL);
@@ -135,7 +135,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
start = size;
goto out;
}
-   pptr = kmap(page);
+   pptr = kmap_thread(page);
curr = pptr;
end = pptr + PAGE_CACHE_BITS / 32;
}
@@ -151,7 +151,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
 done:
*curr = cpu_to_be32(n);
set_page_dirty(page);
-   kunmap(page);
+   kunmap_thread(page);
*max = offset + (curr - pptr) * 32 + i - start;
sbi->free_blocks -= *max;
hfsplus_mark_mdb_dirty(sb);
@@ -185,7 +185,7 @@ int hfsplus_block_free(struct super_block *sb, u32 offset, 
u32 count)
page = read_mapping_page(mapping, pnr, NULL);
if (IS_ERR(page))
goto kaboom;
-   pptr = kmap(page);
+   pptr = kmap_thread(page);
curr = pptr + (offset & (PAGE_CACHE_BITS - 1)) / 32;
end = pptr + PAGE_CACHE_BITS / 32;
len = count;
@@ -215,11 +215,11 @@ int hfsplus_block_free(struct super_block *sb, u32 
offset, u32 count)
if (!count)
break;
set_page_dirty(page);
-   kunmap(page);
+   kunmap_thread(page);
page = read_mapping_page(mapping, ++pnr, NULL);
if (IS_ERR(page))
goto kaboom;
-   pptr = kmap(page);
+   pptr = kmap_thread(page);
curr = pptr;
end = pptr + PAGE_CACHE_BITS / 32;
}
@@ -231,7 +231,7 @@ int hfsplus_block_free(struct super_block *sb, u32 offset, 
u32 count)
}
 out:
set_page_dirty(page);
-   kunmap(page);
+   kunmap_thread(page);
sbi->free_blocks += len;
hfsplus_mark_mdb_dirty(sb);
mutex_unlock(&sbi->alloc_mutex);
diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 177fae4e6581..62757d92fbbd 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -29,14 +29,14 @@ void hfs_bnode_read(struct hfs_bnode *node, void *buf, int 
off, int len)
off &= ~PAGE_MASK;
 
l = min_t(int, len, PAGE_SIZE - off);
-   memcpy(buf, kmap(*pagep) + off, l);
-   kunmap(*pagep);
+   memcpy(buf, kmap_thread(*pagep) + off, l);
+   kunmap_thread(*pagep);
 
while ((len -= l) != 0) {
buf += l;
l = min_t(int, len, PAGE_SIZE);
-   memcpy(buf, kmap(*++pagep), l);
-   kunmap(*pagep);
+   memcpy(buf, kmap_thread(*++pagep), l);
+   kunmap_thread(*pagep);
}
 }
 
@@ -82,16 +82,16 @@ void hfs_bnode_write(struct hfs_bnode *node, void *buf, int 
off, int len)
off &= ~PAGE_MASK;
 
l = min_t(int, len, PAGE_SIZE - off);
-   memcpy(kmap(*pagep) + off, buf, l);
+   memcpy(kmap_thread(*pagep) + off, buf, l);
set_page_
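
The b-node helpers in this patch walk an array of pages, mapping one page at a time. A sketch of that loop, modeled on the hfs_bnode_read() conversion in fs/hfsplus/bnode.c above; at most one thread-local mapping is live at any point:

static void example_bnode_read(struct page **pagep, void *buf,
			       int off, int len)
{
	int l = min_t(int, len, PAGE_SIZE - off);

	memcpy(buf, kmap_thread(*pagep) + off, l);
	kunmap_thread(*pagep);

	while ((len -= l) != 0) {
		buf += l;
		l = min_t(int, len, PAGE_SIZE);
		memcpy(buf, kmap_thread(*++pagep), l);
		kunmap_thread(*pagep);
	}
}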

[PATCH RFC PKS/PMEM 18/58] fs/hfs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Signed-off-by: Ira Weiny 
---
 fs/hfs/bnode.c | 14 +++---
 fs/hfs/btree.c | 20 ++--
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
index b63a4df7327b..8b4d02576405 100644
--- a/fs/hfs/bnode.c
+++ b/fs/hfs/bnode.c
@@ -23,8 +23,8 @@ void hfs_bnode_read(struct hfs_bnode *node, void *buf,
off += node->page_offset;
page = node->page[0];
 
-   memcpy(buf, kmap(page) + off, len);
-   kunmap(page);
+   memcpy(buf, kmap_thread(page) + off, len);
+   kunmap_thread(page);
 }
 
 u16 hfs_bnode_read_u16(struct hfs_bnode *node, int off)
@@ -108,9 +108,9 @@ void hfs_bnode_copy(struct hfs_bnode *dst_node, int dst,
src_page = src_node->page[0];
dst_page = dst_node->page[0];
 
-   memcpy(kmap(dst_page) + dst, kmap(src_page) + src, len);
-   kunmap(src_page);
-   kunmap(dst_page);
+   memcpy(kmap_thread(dst_page) + dst, kmap_thread(src_page) + src, len);
+   kunmap_thread(src_page);
+   kunmap_thread(dst_page);
set_page_dirty(dst_page);
 }
 
@@ -125,9 +125,9 @@ void hfs_bnode_move(struct hfs_bnode *node, int dst, int 
src, int len)
src += node->page_offset;
dst += node->page_offset;
page = node->page[0];
-   ptr = kmap(page);
+   ptr = kmap_thread(page);
memmove(ptr + dst, ptr + src, len);
-   kunmap(page);
+   kunmap_thread(page);
set_page_dirty(page);
 }
 
diff --git a/fs/hfs/btree.c b/fs/hfs/btree.c
index 19017d296173..bd4a6d35e361 100644
--- a/fs/hfs/btree.c
+++ b/fs/hfs/btree.c
@@ -80,7 +80,7 @@ struct hfs_btree *hfs_btree_open(struct super_block *sb, u32 
id, btree_keycmp ke
goto free_inode;
 
/* Load the header */
-   head = (struct hfs_btree_header_rec *)(kmap(page) + sizeof(struct 
hfs_bnode_desc));
+   head = (struct hfs_btree_header_rec *)(kmap_thread(page) + 
sizeof(struct hfs_bnode_desc));
tree->root = be32_to_cpu(head->root);
tree->leaf_count = be32_to_cpu(head->leaf_count);
tree->leaf_head = be32_to_cpu(head->leaf_head);
@@ -119,7 +119,7 @@ struct hfs_btree *hfs_btree_open(struct super_block *sb, 
u32 id, btree_keycmp ke
tree->node_size_shift = ffs(size) - 1;
tree->pages_per_bnode = (tree->node_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
 
-   kunmap(page);
+   kunmap_thread(page);
put_page(page);
return tree;
 
@@ -268,7 +268,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
 
off += node->page_offset;
pagep = node->page + (off >> PAGE_SHIFT);
-   data = kmap(*pagep);
+   data = kmap_thread(*pagep);
off &= ~PAGE_MASK;
idx = 0;
 
@@ -281,7 +281,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
idx += i;
data[off] |= m;
set_page_dirty(*pagep);
-   kunmap(*pagep);
+   kunmap_thread(*pagep);
tree->free_nodes--;
mark_inode_dirty(tree->inode);
hfs_bnode_put(node);
@@ -290,14 +290,14 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
}
}
if (++off >= PAGE_SIZE) {
-   kunmap(*pagep);
-   data = kmap(*++pagep);
+   kunmap_thread(*pagep);
+   data = kmap_thread(*++pagep);
off = 0;
}
idx += 8;
len--;
}
-   kunmap(*pagep);
+   kunmap_thread(*pagep);
nidx = node->next;
if (!nidx) {
printk(KERN_DEBUG "create new bmap node...\n");
@@ -313,7 +313,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
off = off16;
off += node->page_offset;
pagep = node->page + (off >> PAGE_SHIFT);
-   data = kmap(*pagep);
+   data = kmap_thread(*pagep);
off &= ~PAGE_MASK;
}
 }
@@ -360,7 +360,7 @@ void hfs_bmap_free(struct hfs_bnode *node)
}
off += node->page_offset + nidx / 8;
page = node->page[off >> PAGE_SHIFT];
-   data = kmap(page);
+   data = kmap_thread(page);
off &= ~PAGE_MASK;
m = 1 <

[PATCH RFC PKS/PMEM 17/58] fs/nilfs2: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Ryusuke Konishi 
Signed-off-by: Ira Weiny 
---
 fs/nilfs2/alloc.c  | 34 +-
 fs/nilfs2/cpfile.c |  4 ++--
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index adf3bb0a8048..2aa4c34094ef 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -524,7 +524,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
ret = nilfs_palloc_get_desc_block(inode, group, 1, &desc_bh);
if (ret < 0)
return ret;
-   desc_kaddr = kmap(desc_bh->b_page);
+   desc_kaddr = kmap_thread(desc_bh->b_page);
desc = nilfs_palloc_block_get_group_desc(
inode, group, desc_bh, desc_kaddr);
n = nilfs_palloc_rest_groups_in_desc_block(inode, group,
@@ -536,7 +536,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
inode, group, 1, &bitmap_bh);
if (ret < 0)
goto out_desc;
-   bitmap_kaddr = kmap(bitmap_bh->b_page);
+   bitmap_kaddr = kmap_thread(bitmap_bh->b_page);
bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
pos = nilfs_palloc_find_available_slot(
bitmap, group_offset,
@@ -547,21 +547,21 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
desc, lock, -1);
req->pr_entry_nr =
entries_per_group * group + pos;
-   kunmap(desc_bh->b_page);
-   kunmap(bitmap_bh->b_page);
+   kunmap_thread(desc_bh->b_page);
+   kunmap_thread(bitmap_bh->b_page);
 
req->pr_desc_bh = desc_bh;
req->pr_bitmap_bh = bitmap_bh;
return 0;
}
-   kunmap(bitmap_bh->b_page);
+   kunmap_thread(bitmap_bh->b_page);
brelse(bitmap_bh);
}
 
group_offset = 0;
}
 
-   kunmap(desc_bh->b_page);
+   kunmap_thread(desc_bh->b_page);
brelse(desc_bh);
}
 
@@ -569,7 +569,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
return -ENOSPC;
 
  out_desc:
-   kunmap(desc_bh->b_page);
+   kunmap_thread(desc_bh->b_page);
brelse(desc_bh);
return ret;
 }
@@ -605,10 +605,10 @@ void nilfs_palloc_commit_free_entry(struct inode *inode,
spinlock_t *lock;
 
group = nilfs_palloc_group(inode, req->pr_entry_nr, &group_offset);
-   desc_kaddr = kmap(req->pr_desc_bh->b_page);
+   desc_kaddr = kmap_thread(req->pr_desc_bh->b_page);
desc = nilfs_palloc_block_get_group_desc(inode, group,
 req->pr_desc_bh, desc_kaddr);
-   bitmap_kaddr = kmap(req->pr_bitmap_bh->b_page);
+   bitmap_kaddr = kmap_thread(req->pr_bitmap_bh->b_page);
bitmap = bitmap_kaddr + bh_offset(req->pr_bitmap_bh);
lock = nilfs_mdt_bgl_lock(inode, group);
 
@@ -620,8 +620,8 @@ void nilfs_palloc_commit_free_entry(struct inode *inode,
else
nilfs_palloc_group_desc_add_entries(desc, lock, 1);
 
-   kunmap(req->pr_bitmap_bh->b_page);
-   kunmap(req->pr_desc_bh->b_page);
+   kunmap_thread(req->pr_bitmap_bh->b_page);
+   kunmap_thread(req->pr_desc_bh->b_page);
 
mark_buffer_dirty(req->pr_desc_bh);
mark_buffer_dirty(req->pr_bitmap_bh);
@@ -646,10 +646,10 @@ void nilfs_palloc_abort_alloc_entry(struct inode *inode,
spinlock_t *lock;
 
group = nilfs_palloc_group(inode, req->pr_entry_nr, &group_offset);
-   desc_kaddr = kmap(req->pr_desc_bh->b_page);
+   desc_kaddr = kmap_thread(req->pr_desc_bh->b_page);
desc = nilfs_palloc_block_get_group_desc(inode, group,
 req->pr_desc_bh, desc_kaddr);
-   bitmap_kaddr = kmap(req->pr_bitmap_bh->b_page);
+   bitmap_kaddr = kmap_thread(req->pr_bitmap_bh->b_page);
bitmap = bitmap_kaddr + bh_offset(req->pr_bitmap_bh);
lock = nilfs_mdt_bgl_lock(inode, group);
 
@@ -6

[PATCH RFC PKS/PMEM 16/58] fs/gfs2: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Bob Peterson 
Cc: Andreas Gruenbacher 
Signed-off-by: Ira Weiny 
---
 fs/gfs2/bmap.c   | 4 ++--
 fs/gfs2/ops_fstype.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 0f69fbd4af66..375af4528411 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -67,7 +67,7 @@ static int gfs2_unstuffer_page(struct gfs2_inode *ip, struct 
buffer_head *dibh,
}
 
if (!PageUptodate(page)) {
-   void *kaddr = kmap(page);
+   void *kaddr = kmap_thread(page);
u64 dsize = i_size_read(inode);
  
if (dsize > gfs2_max_stuffed_size(ip))
@@ -75,7 +75,7 @@ static int gfs2_unstuffer_page(struct gfs2_inode *ip, struct 
buffer_head *dibh,
 
memcpy(kaddr, dibh->b_data + sizeof(struct gfs2_dinode), dsize);
memset(kaddr + dsize, 0, PAGE_SIZE - dsize);
-   kunmap(page);
+   kunmap_thread(page);
 
SetPageUptodate(page);
}
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 6d18d2c91add..a5d20d9b504a 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -263,9 +263,9 @@ static int gfs2_read_super(struct gfs2_sbd *sdp, sector_t 
sector, int silent)
__free_page(page);
return -EIO;
}
-   p = kmap(page);
+   p = kmap_thread(page);
gfs2_sb_in(sdp, p);
-   kunmap(page);
+   kunmap_thread(page);
__free_page(page);
return gfs2_check_sb(sdp, silent);
 }
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 15/58] fs/ecryptfs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Herbert Xu 
Cc: Eric Biggers 
Cc: Aditya Pakki 
Signed-off-by: Ira Weiny 
---
 fs/ecryptfs/crypto.c | 8 
 fs/ecryptfs/read_write.c | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 0681540c48d9..e73e00994bee 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -469,10 +469,10 @@ int ecryptfs_encrypt_page(struct page *page)
}
 
lower_offset = lower_offset_for_page(crypt_stat, page);
-   enc_extent_virt = kmap(enc_extent_page);
+   enc_extent_virt = kmap_thread(enc_extent_page);
rc = ecryptfs_write_lower(ecryptfs_inode, enc_extent_virt, lower_offset,
  PAGE_SIZE);
-   kunmap(enc_extent_page);
+   kunmap_thread(enc_extent_page);
if (rc < 0) {
ecryptfs_printk(KERN_ERR,
"Error attempting to write lower page; rc = [%d]\n",
@@ -518,10 +518,10 @@ int ecryptfs_decrypt_page(struct page *page)
BUG_ON(!(crypt_stat->flags & ECRYPTFS_ENCRYPTED));
 
lower_offset = lower_offset_for_page(crypt_stat, page);
-   page_virt = kmap(page);
+   page_virt = kmap_thread(page);
rc = ecryptfs_read_lower(page_virt, lower_offset, PAGE_SIZE,
 ecryptfs_inode);
-   kunmap(page);
+   kunmap_thread(page);
if (rc < 0) {
ecryptfs_printk(KERN_ERR,
"Error attempting to read lower page; rc = [%d]\n",
diff --git a/fs/ecryptfs/read_write.c b/fs/ecryptfs/read_write.c
index 0438997ac9d8..5eca4330c0c0 100644
--- a/fs/ecryptfs/read_write.c
+++ b/fs/ecryptfs/read_write.c
@@ -64,11 +64,11 @@ int ecryptfs_write_lower_page_segment(struct inode 
*ecryptfs_inode,
 
offset = ((((loff_t)page_for_lower->index) << PAGE_SHIFT)
  + offset_in_page);
-   virt = kmap(page_for_lower);
+   virt = kmap_thread(page_for_lower);
rc = ecryptfs_write_lower(ecryptfs_inode, virt, offset, size);
if (rc > 0)
rc = 0;
-   kunmap(page_for_lower);
+   kunmap_thread(page_for_lower);
return rc;
 }
 
@@ -251,11 +251,11 @@ int ecryptfs_read_lower_page_segment(struct page 
*page_for_ecryptfs,
int rc;
 
offset = ((((loff_t)page_index) << PAGE_SHIFT) + offset_in_page);
-   virt = kmap(page_for_ecryptfs);
+   virt = kmap_thread(page_for_ecryptfs);
rc = ecryptfs_read_lower(virt, offset, size, ecryptfs_inode);
if (rc > 0)
rc = 0;
-   kunmap(page_for_ecryptfs);
+   kunmap_thread(page_for_ecryptfs);
flush_dcache_page(page_for_ecryptfs);
return rc;
 }
-- 
2.28.0.rc0.12.gb6a658bd00c9



[PATCH RFC PKS/PMEM 14/58] fs/cifs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Steve French 
Signed-off-by: Ira Weiny 
---
 fs/cifs/cifsencrypt.c |  6 +++---
 fs/cifs/file.c| 16 
 fs/cifs/smb2ops.c |  8 
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index 9daa256f69d4..2f8232d01a56 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -82,17 +82,17 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
 
rqst_page_get_length(rqst, i, &len, &offset);
 
-   kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;
+   kaddr = (char *) kmap_thread(rqst->rq_pages[i]) + offset;
 
rc = crypto_shash_update(shash, kaddr, len);
if (rc) {
cifs_dbg(VFS, "%s: Could not update with payload\n",
 __func__);
-   kunmap(rqst->rq_pages[i]);
+   kunmap_thread(rqst->rq_pages[i]);
return rc;
}
 
-   kunmap(rqst->rq_pages[i]);
+   kunmap_thread(rqst->rq_pages[i]);
}
 
rc = crypto_shash_final(shash, signature);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index be46fab4c96d..6db2caab8852 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2145,17 +2145,17 @@ static int cifs_partialpagewrite(struct page *page, 
unsigned from, unsigned to)
inode = page->mapping->host;
 
offset += (loff_t)from;
-   write_data = kmap(page);
+   write_data = kmap_thread(page);
write_data += from;
 
if ((to > PAGE_SIZE) || (from > to)) {
-   kunmap(page);
+   kunmap_thread(page);
return -EIO;
}
 
/* racing with truncate? */
if (offset > mapping->host->i_size) {
-   kunmap(page);
+   kunmap_thread(page);
return 0; /* don't care */
}
 
@@ -2183,7 +2183,7 @@ static int cifs_partialpagewrite(struct page *page, 
unsigned from, unsigned to)
rc = -EIO;
}
 
-   kunmap(page);
+   kunmap_thread(page);
return rc;
 }
 
@@ -2559,10 +2559,10 @@ static int cifs_write_end(struct file *file, struct 
address_space *mapping,
   known which we might as well leverage */
/* BB check if anything else missing out of ppw
   such as updating last write time */
-   page_data = kmap(page);
+   page_data = kmap_thread(page);
rc = cifs_write(cfile, pid, page_data + offset, copied, &pos);
/* if (rc < 0) should we set writebehind rc? */
-   kunmap(page);
+   kunmap_thread(page);
 
free_xid(xid);
} else {
@@ -4511,7 +4511,7 @@ static int cifs_readpage_worker(struct file *file, struct 
page *page,
if (rc == 0)
goto read_complete;
 
-   read_data = kmap(page);
+   read_data = kmap_thread(page);
/* for reads over a certain size could initiate async read ahead */
 
rc = cifs_read(file, read_data, PAGE_SIZE, poffset);
@@ -4540,7 +4540,7 @@ static int cifs_readpage_worker(struct file *file, struct 
page *page,
rc = 0;
 
 io_error:
-   kunmap(page);
+   kunmap_thread(page);
unlock_page(page);
 
 read_complete:
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 32f90dc82c84..a3e7ebab38b6 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -4068,12 +4068,12 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, 
int num_rqst,
 
rqst_page_get_length(&new_rq[i], j, &len, &offset);
 
-   dst = (char *) kmap(new_rq[i].rq_pages[j]) + offset;
-   src = (char *) kmap(old_rq[i - 1].rq_pages[j]) + offset;
+   dst = (char *) kmap_thread(new_rq[i].rq_pages[j]) + 
offset;
+   src = (char *) kmap_thread(old_rq[i - 1].rq_pages[j]) + 
offset;
 
memcpy(dst, src, len);
-   kunmap(new_rq[i].rq_pages[j]);
-   kunmap(old_rq[i - 1].rq_pages[j]);
+   kunmap_thread(new_rq[i].rq_pages[j]);
+   kunmap_thread(old_rq[i - 1].rq_pages[j]);
}
}
 
-- 
2.28.0.rc0.12.gb6a658bd00c9
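
The cifs signing code maps at an offset within the page. A sketch of that shape; note that kunmap_thread() takes the page itself, not the offset pointer (example_consume() is hypothetical):

static int example_hash_segment(struct page *page, unsigned int offset,
				unsigned int len)
{
	char *kaddr = (char *)kmap_thread(page) + offset;
	int rc = example_consume(kaddr, len);	/* hypothetical */

	kunmap_thread(page);	/* unmap by page, not by kaddr */
	return rc;
}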



[PATCH RFC PKS/PMEM 13/58] fs/btrfs: Utilize new kmap_thread()

2020-10-09 Thread ira . weiny
From: Ira Weiny 

The kmap() calls in this FS are localized to a single thread.  To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Signed-off-by: Ira Weiny 
---
 fs/btrfs/check-integrity.c |  4 ++--
 fs/btrfs/compression.c |  4 ++--
 fs/btrfs/inode.c   | 16 
 fs/btrfs/lzo.c | 24 
 fs/btrfs/raid56.c  | 34 +-
 fs/btrfs/reflink.c |  8 
 fs/btrfs/send.c|  4 ++--
 fs/btrfs/zlib.c| 32 
 fs/btrfs/zstd.c| 20 ++--
 9 files changed, 73 insertions(+), 73 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 81a8c87a5afb..9e5a02512ab5 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2706,7 +2706,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
 
bio_for_each_segment(bvec, bio, iter) {
BUG_ON(bvec.bv_len != PAGE_SIZE);
-   mapped_datav[i] = kmap(bvec.bv_page);
+   mapped_datav[i] = kmap_thread(bvec.bv_page);
i++;
 
if (dev_state->state->print_mask &
@@ -2720,7 +2720,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
  bio, &bio_is_patched,
  bio->bi_opf);
bio_for_each_segment(bvec, bio, iter)
-   kunmap(bvec.bv_page);
+   kunmap_thread(bvec.bv_page);
kfree(mapped_datav);
} else if (NULL != dev_state && (bio->bi_opf & REQ_PREFLUSH)) {
if (dev_state->state->print_mask &
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 1ab56a734e70..5944fb36d68a 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1626,7 +1626,7 @@ static void heuristic_collect_sample(struct inode *inode, 
u64 start, u64 end,
curr_sample_pos = 0;
while (index < index_end) {
page = find_get_page(inode->i_mapping, index);
-   in_data = kmap(page);
+   in_data = kmap_thread(page);
/* Handle case where the start is not aligned to PAGE_SIZE */
i = start % PAGE_SIZE;
while (i < PAGE_SIZE - SAMPLING_READ_SIZE) {
@@ -1639,7 +1639,7 @@ static void heuristic_collect_sample(struct inode *inode, 
u64 start, u64 end,
start += SAMPLING_INTERVAL;
curr_sample_pos += SAMPLING_READ_SIZE;
}
-   kunmap(page);
+   kunmap_thread(page);
put_page(page);
 
index++;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9570458aa847..9710a52c6c42 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4603,7 +4603,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t 
from, loff_t len,
if (offset != blocksize) {
if (!len)
len = blocksize - offset;
-   kaddr = kmap(page);
+   kaddr = kmap_thread(page);
if (front)
memset(kaddr + (block_start - page_offset(page)),
0, offset);
@@ -4611,7 +4611,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t 
from, loff_t len,
memset(kaddr + (block_start - page_offset(page)) +  
offset,
0, len);
flush_dcache_page(page);
-   kunmap(page);
+   kunmap_thread(page);
}
ClearPageChecked(page);
set_page_dirty(page);
@@ -6509,9 +6509,9 @@ static noinline int uncompress_inline(struct btrfs_path 
*path,
 */
 
if (max_size + pg_offset < PAGE_SIZE) {
-   char *map = kmap(page);
+   char *map = kmap_thread(page);
memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - 
pg_offset);
-   kunmap(page);
+   kunmap_thread(page);
}
kfree(tmp);
return ret;
@@ -6704,7 +6704,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode 
*inode,
goto out;
}
} else {
-   map = kmap(page);
+   map = kmap_thread(page);
read_extent_buffer(leaf, map + pg_offset, ptr,
   copy_size);
if (pg_offset + copy_size < PAGE_SIZE) {
@@ -6712,7 +6712,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode 
*inode,
   PAGE_SIZE - pg_offset -
