Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-05-02 Thread Balbir Singh
On Wed, May 2, 2018 at 6:57 AM, Dan Williams  wrote:
> On Thu, Apr 5, 2018 at 8:00 AM, Dan Williams  wrote:
>> On Wed, Apr 4, 2018 at 11:45 PM, Nicholas Piggin  wrote:
> [,,]
>>> What's the problem with just counting bytes copied like usercopy --
>>> why is that harder than cacheline accuracy?
>>>
>>>> I'd rather implement the existing interface and port/support the new
>>>> interface as it becomes available
>>>
>>> Fair enough.
>>
>> I have patches already in progress to change the interface. My
>> preference is to hold off on adding a new implementation that will
>> need to be immediately reworked. When I say "immediate" I mean that
>> I should be able to post what I have for review within the next few
>> days.
>>
>> Whether this is all too late for 4.17 is another question...
>
> Here is the x86 version of a 'bytes remaining' memcpy_mcsafe() implementation:
>
> https://lists.01.org/pipermail/linux-nvdimm/2018-May/015548.html

Thanks for the heads up! I'll work on the implementation for powerpc.

Balbir Singh.


Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-05-01 Thread Dan Williams
On Thu, Apr 5, 2018 at 8:00 AM, Dan Williams  wrote:
> On Wed, Apr 4, 2018 at 11:45 PM, Nicholas Piggin  wrote:
[,,]
>> What's the problem with just counting bytes copied like usercopy --
>> why is that harder than cacheline accuracy?
>>
>>> I'd rather implement the existing interface and port/support the new
>>> interface as it becomes available
>>
>> Fair enough.
>
> I have patches already in progress to change the interface. My
> preference is to hold off on adding a new implementation that will
> need to be immediately reworked. When I say "immediate" I mean that
> I should be able to post what I have for review within the next few
> days.
>
> Whether this is all too late for 4.17 is another question...

Here is the x86 version of a 'bytes remaining' memcpy_mcsafe() implementation:

https://lists.01.org/pipermail/linux-nvdimm/2018-May/015548.html


RE: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-04-06 Thread Luck, Tony
> I thought cache-line-aligned accounting might make sense, since usually we'd
> expect the failure to be at a cache-line level, but our copy_tofrom_user does
> accurate accounting.

That's one of the wrinkles in the current x86 memcpy_mcsafe(). It starts by
checking alignment of the source address, and moves a byte at a time until
it is 8-byte aligned. We do this because current x86 implementations do not
gracefully handle an unaligned read that spans from a good cache-line into
a poisoned one.

This is different from copy_tofrom_user which aligns the destination for speed
reasons (unaligned reads have a lower penalty than unaligned writes).
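
As an illustration only (this is not the actual x86 code), the source-alignment step described above could be sketched in user-space C roughly as follows. The helper name copy_until_src_aligned() is hypothetical; the point is simply that single-byte loads are issued until the source is 8-byte aligned, so no individual load can straddle a good line and a poisoned one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy single bytes until the SOURCE pointer is
 * 8-byte aligned (or len is exhausted). Returns the number of bytes
 * copied, so the caller can advance dst/src and shrink len.
 */
static size_t copy_until_src_aligned(unsigned char *dst,
				     const unsigned char *src, size_t len)
{
	/* Distance from src to the next 8-byte boundary (0 if aligned). */
	size_t head = (size_t)(-(uintptr_t)src & 7);
	size_t i;

	assert(dst != NULL && src != NULL);

	if (head > len)
		head = len;

	/* One byte per load: a byte access never spans two cache lines. */
	for (i = 0; i < head; i++)
		dst[i] = src[i];

	return head;
}
```

After this prologue the main loop can use aligned 8-byte loads from the source; the destination may still be unaligned, which is the opposite trade-off from a copy routine that aligns the destination.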

-Tony


Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-04-06 Thread Balbir Singh
On Fri, Apr 6, 2018 at 11:26 AM, Nicholas Piggin  wrote:
> On Thu, 05 Apr 2018 16:40:26 -0400
> Jeff Moyer  wrote:
>
>> Nicholas Piggin  writes:
>>
>> > On Thu, 5 Apr 2018 15:53:07 +1000
>> > Balbir Singh  wrote:
>> >> I'm thinking about it. I wonder what "bytes remaining" means for pmem
>> >> in the context of a machine check exception. Also, do we want to be byte
>> >> accurate or cache-line accurate for the bytes remaining? The former is much
>> >> easier than the latter :)
>> >
>> > The ideal would be a linear measure of how much of your copy reached
>> > (or can reach) non-volatile storage with nothing further copied. You
>> > may have to allow for some relaxing of the semantics depending on
>> > what the architecture can support.
>>
>> I think you've got that backwards.  memcpy_mcsafe is used to copy *from*
>> persistent memory.  The idea is to catch errors when reading pmem, not
>> writing to it.
>>

I know the comment in the x86 code says that writes are posted and that it
cares only about loads, but I don't see why both sides should not be handled.

>> > What's the problem with just counting bytes copied like usercopy --
>> > why is that harder than cacheline accuracy?
>>
>> He said the former (i.e. bytes) is easier.  So, I think you're on the
>> same page.  :)
>
> Oh well that makes a lot more sense in my mind now, thanks :)

I thought cache-line-aligned accounting might make sense, since usually we'd
expect the failure to be at a cache-line level, but our copy_tofrom_user does
accurate accounting.

Balbir Singh.


Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-04-05 Thread Nicholas Piggin
On Thu, 05 Apr 2018 16:40:26 -0400
Jeff Moyer  wrote:

> Nicholas Piggin  writes:
> 
> > On Thu, 5 Apr 2018 15:53:07 +1000
> > Balbir Singh  wrote:  
> >> I'm thinking about it. I wonder what "bytes remaining" means for pmem
> >> in the context of a machine check exception. Also, do we want to be byte
> >> accurate or cache-line accurate for the bytes remaining? The former is much
> >> easier than the latter :)  
> >
> > The ideal would be a linear measure of how much of your copy reached
> > (or can reach) non-volatile storage with nothing further copied. You
> > may have to allow for some relaxing of the semantics depending on
> > what the architecture can support.  
> 
> I think you've got that backwards.  memcpy_mcsafe is used to copy *from*
> persistent memory.  The idea is to catch errors when reading pmem, not
> writing to it.
> 
> > What's the problem with just counting bytes copied like usercopy --
> > why is that harder than cacheline accuracy?  
> 
> He said the former (i.e. bytes) is easier.  So, I think you're on the
> same page.  :)

Oh well that makes a lot more sense in my mind now, thanks :)


Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-04-05 Thread Jeff Moyer
Nicholas Piggin  writes:

> On Thu, 5 Apr 2018 15:53:07 +1000
> Balbir Singh  wrote:
>> I'm thinking about it. I wonder what "bytes remaining" means for pmem
>> in the context of a machine check exception. Also, do we want to be byte
>> accurate or cache-line accurate for the bytes remaining? The former is much
>> easier than the latter :)
>
> The ideal would be a linear measure of how much of your copy reached
> (or can reach) non-volatile storage with nothing further copied. You
> may have to allow for some relaxing of the semantics depending on
> what the architecture can support.

I think you've got that backwards.  memcpy_mcsafe is used to copy *from*
persistent memory.  The idea is to catch errors when reading pmem, not
writing to it.

> What's the problem with just counting bytes copied like usercopy --
> why is that harder than cacheline accuracy?

He said the former (i.e. bytes) is easier.  So, I think you're on the
same page.  :)

Cheers,
Jeff


Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-04-05 Thread Dan Williams
On Wed, Apr 4, 2018 at 11:45 PM, Nicholas Piggin  wrote:
> On Thu, 5 Apr 2018 15:53:07 +1000
> Balbir Singh  wrote:
>
>> On Thu, 5 Apr 2018 15:04:05 +1000
>> Nicholas Piggin  wrote:
>>
>> > On Wed, 4 Apr 2018 20:00:52 -0700
>> > Dan Williams  wrote:
>> >
>> > > [ adding Matthew, Christoph, and Tony  ]
>> > >
>> > > On Wed, Apr 4, 2018 at 4:57 PM, Nicholas Piggin  wrote:
>> > > > On Thu,  5 Apr 2018 09:19:42 +1000
>> > > > Balbir Singh  wrote:
>> > > >
>> > > >> The pmem infrastructure uses memcpy_mcsafe in the pmem
>> > > >> layer so as to convert machine check exceptions into
>> > > >> a return value on failure in case a machine check
>> > > >> exception is encountered during the memcpy.
>> > > >>
>> > > >> This patch largely borrows from the copyuser_power7
>> > > >> logic and does not add the VMX optimizations, largely
>> > > >> to keep the patch simple. If needed those optimizations
>> > > >> can be folded in.
>> > > >
>> > > > So memcpy_mcsafe doesn't return number of bytes copied?
>> > > > Huh, well that makes it simple.
>> > >
>> > > Well, not in current kernels, but we need to add that support or
>> > > remove the direct call to copy_to_iter() in fs/dax.c. I'm looking
>> > > right now to add "bytes remaining" support to the x86 memcpy_mcsafe(),
>> > > but for copy_to_user we also need to handle bytes remaining for write
>> > > faults. That fix is hopefully something that can land in an early
>> > > 4.17-rc, but it won't be ready for -rc1.
>> >
>> > I wonder if the powerpc implementation should just go straight to
>> > counting bytes. Backporting to this interface would be trivial, but
>> > it would just mean there's only one variant of the code to support.
>> > That's up to Balbir though.
>> >
>>
>> I'm thinking about it. I wonder what "bytes remaining" means for pmem
>> in the context of a machine check exception. Also, do we want to be byte
>> accurate or cache-line accurate for the bytes remaining? The former is much
>> easier than the latter :)
>
> The ideal would be a linear measure of how much of your copy reached
> (or can reach) non-volatile storage with nothing further copied. You
> may have to allow for some relaxing of the semantics depending on
> what the architecture can support.
>
> What's the problem with just counting bytes copied like usercopy --
> why is that harder than cacheline accuracy?
>
>> I'd rather implement the existing interface and port/support the new
>> interface as it becomes available
>
> Fair enough.

I have patches already in progress to change the interface. My
preference is to hold off on adding a new implementation that will
need to be immediately reworked. When I say "immediate" I mean that
I should be able to post what I have for review within the next few
days.

Whether this is all too late for 4.17 is another question...


Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-04-05 Thread Nicholas Piggin
On Thu, 5 Apr 2018 15:53:07 +1000
Balbir Singh  wrote:

> On Thu, 5 Apr 2018 15:04:05 +1000
> Nicholas Piggin  wrote:
> 
> > On Wed, 4 Apr 2018 20:00:52 -0700
> > Dan Williams  wrote:
> >   
> > > [ adding Matthew, Christoph, and Tony  ]
> > > 
> > > On Wed, Apr 4, 2018 at 4:57 PM, Nicholas Piggin  wrote:
> > > > On Thu,  5 Apr 2018 09:19:42 +1000
> > > > Balbir Singh  wrote:
> > > >  
> > > >> The pmem infrastructure uses memcpy_mcsafe in the pmem
> > > >> layer so as to convert machine check exceptions into
> > > >> a return value on failure in case a machine check
> > > >> exception is encountered during the memcpy.
> > > >>
> > > >> This patch largely borrows from the copyuser_power7
> > > >> logic and does not add the VMX optimizations, largely
> > > >> to keep the patch simple. If needed those optimizations
> > > >> can be folded in.  
> > > >
> > > > So memcpy_mcsafe doesn't return number of bytes copied?
> > > > Huh, well that makes it simple.  
> > > 
> > > Well, not in current kernels, but we need to add that support or
> > > remove the direct call to copy_to_iter() in fs/dax.c. I'm looking
> > > right now to add "bytes remaining" support to the x86 memcpy_mcsafe(),
> > > but for copy_to_user we also need to handle bytes remaining for write
> > > faults. That fix is hopefully something that can land in an early
> > > 4.17-rc, but it won't be ready for -rc1.
> > 
> > I wonder if the powerpc implementation should just go straight to
> > counting bytes. Backporting to this interface would be trivial, but
> > it would just mean there's only one variant of the code to support.
> > That's up to Balbir though.
> >   
> 
> I'm thinking about it. I wonder what "bytes remaining" means for pmem
> in the context of a machine check exception. Also, do we want to be byte
> accurate or cache-line accurate for the bytes remaining? The former is much
> easier than the latter :)

The ideal would be a linear measure of how much of your copy reached
(or can reach) non-volatile storage with nothing further copied. You
may have to allow for some relaxing of the semantics depending on
what the architecture can support.

What's the problem with just counting bytes copied like usercopy --
why is that harder than cacheline accuracy?

> I'd rather implement the existing interface and port/support the new interface
> as it becomes available

Fair enough.

Thanks,
Nick


Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-04-04 Thread Balbir Singh
On Thu, 5 Apr 2018 15:04:05 +1000
Nicholas Piggin  wrote:

> On Wed, 4 Apr 2018 20:00:52 -0700
> Dan Williams  wrote:
> 
> > [ adding Matthew, Christoph, and Tony  ]
> > 
> > On Wed, Apr 4, 2018 at 4:57 PM, Nicholas Piggin  wrote:  
> > > On Thu,  5 Apr 2018 09:19:42 +1000
> > > Balbir Singh  wrote:
> > >
> > >> The pmem infrastructure uses memcpy_mcsafe in the pmem
> > >> layer so as to convert machine check exceptions into
> > >> a return value on failure in case a machine check
> > >> exception is encountered during the memcpy.
> > >>
> > >> This patch largely borrows from the copyuser_power7
> > >> logic and does not add the VMX optimizations, largely
> > >> to keep the patch simple. If needed those optimizations
> > >> can be folded in.
> > >
> > > So memcpy_mcsafe doesn't return number of bytes copied?
> > > Huh, well that makes it simple.
> > 
> > Well, not in current kernels, but we need to add that support or
> > remove the direct call to copy_to_iter() in fs/dax.c. I'm looking
> > right now to add "bytes remaining" support to the x86 memcpy_mcsafe(),
> > but for copy_to_user we also need to handle bytes remaining for write
> > faults. That fix is hopefully something that can land in an early
> > 4.17-rc, but it won't be ready for -rc1.  
> 
> I wonder if the powerpc implementation should just go straight to
> counting bytes. Backporting to this interface would be trivial, but
> it would just mean there's only one variant of the code to support.
> That's up to Balbir though.
> 

I'm thinking about it. I wonder what "bytes remaining" means for pmem
in the context of a machine check exception. Also, do we want to be byte
accurate or cache-line accurate for the bytes remaining? The former is much
easier than the latter :)
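
To make the two options concrete, here is a rough user-space sketch of how the "bytes remaining" value might differ between them. Everything here is an assumption for illustration: the function names are hypothetical, fault_off is the offset of the first byte that could not be read, the copy is assumed to start on a cache-line boundary, and CACHELINE is set to 128 to match the 128B comment in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 128	/* assumption: 128B line, as in the patch comment */

/* Byte accurate: every byte before the faulting byte counts as copied. */
static size_t remaining_byte_accurate(size_t len, size_t fault_off)
{
	assert(fault_off <= len);
	return len - fault_off;
}

/*
 * Cache-line accurate: only whole cache lines before the fault count as
 * copied; the partially read line is reported as not copied at all.
 * Assumes the copy started on a cache-line boundary.
 */
static size_t remaining_line_accurate(size_t len, size_t fault_off)
{
	size_t copied;

	assert(fault_off <= len);
	copied = fault_off - (fault_off % CACHELINE);
	return len - copied;
}
```

For a 1000-byte copy that faults at offset 300, the first variant reports 700 bytes remaining and the second reports 744, since only the two complete 128-byte lines are counted as copied.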


I'd rather implement the existing interface and port/support the new interface
as it becomes available

Balbir Singh.



Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-04-04 Thread Nicholas Piggin
On Wed, 4 Apr 2018 20:00:52 -0700
Dan Williams  wrote:

> [ adding Matthew, Christoph, and Tony  ]
> 
> On Wed, Apr 4, 2018 at 4:57 PM, Nicholas Piggin  wrote:
> > On Thu,  5 Apr 2018 09:19:42 +1000
> > Balbir Singh  wrote:
> >  
> >> The pmem infrastructure uses memcpy_mcsafe in the pmem
> >> layer so as to convert machine check exceptions into
> >> a return value on failure in case a machine check
> >> exception is encountered during the memcpy.
> >>
> >> This patch largely borrows from the copyuser_power7
> >> logic and does not add the VMX optimizations, largely
> >> to keep the patch simple. If needed those optimizations
> >> can be folded in.  
> >
> > So memcpy_mcsafe doesn't return number of bytes copied?
> > Huh, well that makes it simple.  
> 
> Well, not in current kernels, but we need to add that support or
> remove the direct call to copy_to_iter() in fs/dax.c. I'm looking
> right now to add "bytes remaining" support to the x86 memcpy_mcsafe(),
> but for copy_to_user we also need to handle bytes remaining for write
> faults. That fix is hopefully something that can land in an early
> 4.17-rc, but it won't be ready for -rc1.

I wonder if the powerpc implementation should just go straight to
counting bytes. Backporting to this interface would be trivial, but
it would just mean there's only one variant of the code to support.
That's up to Balbir though.

Thanks,
Nick


Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-04-04 Thread Dan Williams
[ adding Matthew, Christoph, and Tony  ]

On Wed, Apr 4, 2018 at 4:57 PM, Nicholas Piggin  wrote:
> On Thu,  5 Apr 2018 09:19:42 +1000
> Balbir Singh  wrote:
>
>> The pmem infrastructure uses memcpy_mcsafe in the pmem
>> layer so as to convert machine check exceptions into
>> a return value on failure in case a machine check
>> exception is encountered during the memcpy.
>>
>> This patch largely borrows from the copyuser_power7
>> logic and does not add the VMX optimizations, largely
>> to keep the patch simple. If needed those optimizations
>> can be folded in.
>
> So memcpy_mcsafe doesn't return number of bytes copied?
> Huh, well that makes it simple.

Well, not in current kernels, but we need to add that support or
remove the direct call to copy_to_iter() in fs/dax.c. I'm looking
right now to add "bytes remaining" support to the x86 memcpy_mcsafe(),
but for copy_to_user we also need to handle bytes remaining for write
faults. That fix is hopefully something that can land in an early
4.17-rc, but it won't be ready for -rc1.
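
For readers unfamiliar with the convention, a "bytes remaining" return value in the style of copy_to_user() could look roughly like the user-space sketch below. This is not the kernel implementation: copy_bytes_remaining_sim() is a hypothetical stand-in, and the poison_off parameter merely simulates the offset at which a machine check would be taken.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for a "bytes remaining" copy.
 * poison_off simulates the source offset at which a machine check
 * would hit; pass a value >= len to simulate an error-free copy.
 * Returns the number of bytes NOT copied (0 means full success).
 */
static size_t copy_bytes_remaining_sim(void *dst, const void *src,
				       size_t len, size_t poison_off)
{
	size_t good = poison_off < len ? poison_off : len;

	assert(dst != NULL && src != NULL);

	memcpy(dst, src, good);	/* copy only the readable prefix */
	return len - good;
}
```

A caller such as an iterator-based copy can then compute the bytes actually transferred as len minus the return value, rather than seeing only a pass/fail error code.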


Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-04-04 Thread Nicholas Piggin
On Thu,  5 Apr 2018 09:19:42 +1000
Balbir Singh  wrote:

> The pmem infrastructure uses memcpy_mcsafe in the pmem
> layer so as to convert machine check exceptions into
> a return value on failure in case a machine check
> exception is encountered during the memcpy.
> 
> This patch largely borrows from the copyuser_power7
> logic and does not add the VMX optimizations, largely
> to keep the patch simple. If needed those optimizations
> can be folded in.

So memcpy_mcsafe doesn't return number of bytes copied?
Huh, well that makes it simple.

Would be nice if there was an easy way to share this with
the regular memcpy code... that's probably for another day
though, probably better to let this settle down first.

I didn't review exact instructions, but the approach looks
right to me.

Acked-by: Nicholas Piggin 

> 
> Signed-off-by: Balbir Singh 
> ---
>  arch/powerpc/include/asm/string.h   |   2 +
>  arch/powerpc/lib/Makefile   |   2 +-
>  arch/powerpc/lib/memcpy_mcsafe_64.S | 212 
> 
>  3 files changed, 215 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/lib/memcpy_mcsafe_64.S
> 
> diff --git a/arch/powerpc/include/asm/string.h 
> b/arch/powerpc/include/asm/string.h
> index 9b8cedf618f4..b7e872a64726 100644
> --- a/arch/powerpc/include/asm/string.h
> +++ b/arch/powerpc/include/asm/string.h
> @@ -30,7 +30,9 @@ extern void * memcpy_flushcache(void *,const void 
> *,__kernel_size_t);
>  #ifdef CONFIG_PPC64
>  #define __HAVE_ARCH_MEMSET32
>  #define __HAVE_ARCH_MEMSET64
> +#define __HAVE_ARCH_MEMCPY_MCSAFE
>  
> +extern int memcpy_mcsafe(void *dst, const void *src, __kernel_size_t sz);
>  extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t);
>  extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
>  extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t);
> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> index 3c29c9009bbf..048afee9f518 100644
> --- a/arch/powerpc/lib/Makefile
> +++ b/arch/powerpc/lib/Makefile
> @@ -24,7 +24,7 @@ endif
>  
>  obj64-y  += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
>  copyuser_power7.o string_64.o copypage_power7.o memcpy_power7.o \
> -memcpy_64.o memcmp_64.o pmem.o
> +memcpy_64.o memcmp_64.o pmem.o memcpy_mcsafe_64.o
>  
>  obj64-$(CONFIG_SMP)  += locks.o
>  obj64-$(CONFIG_ALTIVEC)  += vmx-helper.o
> diff --git a/arch/powerpc/lib/memcpy_mcsafe_64.S 
> b/arch/powerpc/lib/memcpy_mcsafe_64.S
> new file mode 100644
> index ..e7eaa9b6cded
> --- /dev/null
> +++ b/arch/powerpc/lib/memcpy_mcsafe_64.S
> @@ -0,0 +1,212 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) IBM Corporation, 2011
> + * Derived from copyuser_power7.s by Anton Blanchard 
> + * Author - Balbir Singh 
> + */
> +#include 
> +#include 
> +
> + .macro err1
> +100:
> + EX_TABLE(100b,.Ldo_err1)
> + .endm
> +
> + .macro err2
> +200:
> + EX_TABLE(200b,.Ldo_err2)
> + .endm
> +
> +.Ldo_err2:
> + ld  r22,STK_REG(R22)(r1)
> + ld  r21,STK_REG(R21)(r1)
> + ld  r20,STK_REG(R20)(r1)
> + ld  r19,STK_REG(R19)(r1)
> + ld  r18,STK_REG(R18)(r1)
> + ld  r17,STK_REG(R17)(r1)
> + ld  r16,STK_REG(R16)(r1)
> + ld  r15,STK_REG(R15)(r1)
> + ld  r14,STK_REG(R14)(r1)
> + addi    r1,r1,STACKFRAMESIZE
> +.Ldo_err1:
> + li  r3,-EFAULT
> + blr
> +
> +
> +_GLOBAL(memcpy_mcsafe)
> + cmpldi  r5,16
> + blt .Lshort_copy
> +
> +.Lcopy:
> + /* Get the source 8B aligned */
> + neg r6,r4
> + mtocrf  0x01,r6
> + clrldi  r6,r6,(64-3)
> +
> + bf  cr7*4+3,1f
> +err1;lbz r0,0(r4)
> + addi    r4,r4,1
> +err1;stb r0,0(r3)
> + addi    r3,r3,1
> +
> +1:   bf  cr7*4+2,2f
> +err1;lhz r0,0(r4)
> + addi    r4,r4,2
> +err1;sth r0,0(r3)
> + addi    r3,r3,2
> +
> +2:   bf  cr7*4+1,3f
> +err1;lwz r0,0(r4)
> + addi    r4,r4,4
> +err1;stw r0,0(r3)
> + addi    r3,r3,4
> +
> +3:   sub r5,r5,r6
> + cmpldi  r5,128
> + blt 5f
> +
> + mflr    r0
> + stdu    r1,-STACKFRAMESIZE(r1)
> + std r14,STK_REG(R14)(r1)
> + std r15,STK_REG(R15)(r1)
> + std r16,STK_REG(R16)(r1)
> + std r17,STK_REG(R17)(r1)
> + std r18,STK_REG(R18)(r1)
> + std r19,STK_REG(R19)(r1)
> + std r20,STK_REG(R20)(r1)
> + std r21,STK_REG(R21)(r1)
> + std r22,STK_REG(R22)(r1)
> + std r0,STACKFRAMESIZE+16(r1)
> +
> + srdi    r6,r5,7
> + mtctr   r6
> +
> + /* Now do cacheline (128B) sized loads and stores. */
> + .align  5
> +4:
> +err2;ld  r0,0(r4)
> +err2;ld  r6,8(r4)
> +err2;ld  r7,16(r4)
> +err2;ld  r8,24(r4)
> 

[RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem

2018-04-04 Thread Balbir Singh
The pmem infrastructure uses memcpy_mcsafe in the pmem
layer so as to convert machine check exceptions into
a return value on failure in case a machine check
exception is encountered during the memcpy.

This patch largely borrows from the copyuser_power7
logic and does not add the VMX optimizations, largely
to keep the patch simple. If needed those optimizations
can be folded in.

Signed-off-by: Balbir Singh 
---
 arch/powerpc/include/asm/string.h   |   2 +
 arch/powerpc/lib/Makefile   |   2 +-
 arch/powerpc/lib/memcpy_mcsafe_64.S | 212 
 3 files changed, 215 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/lib/memcpy_mcsafe_64.S

diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index 9b8cedf618f4..b7e872a64726 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -30,7 +30,9 @@ extern void * memcpy_flushcache(void *,const void *,__kernel_size_t);
 #ifdef CONFIG_PPC64
 #define __HAVE_ARCH_MEMSET32
 #define __HAVE_ARCH_MEMSET64
+#define __HAVE_ARCH_MEMCPY_MCSAFE
 
+extern int memcpy_mcsafe(void *dst, const void *src, __kernel_size_t sz);
 extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t);
 extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
 extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t);
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 3c29c9009bbf..048afee9f518 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -24,7 +24,7 @@ endif
 
 obj64-y += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
   copyuser_power7.o string_64.o copypage_power7.o memcpy_power7.o \
-  memcpy_64.o memcmp_64.o pmem.o
+  memcpy_64.o memcmp_64.o pmem.o memcpy_mcsafe_64.o
 
 obj64-$(CONFIG_SMP) += locks.o
 obj64-$(CONFIG_ALTIVEC) += vmx-helper.o
diff --git a/arch/powerpc/lib/memcpy_mcsafe_64.S b/arch/powerpc/lib/memcpy_mcsafe_64.S
new file mode 100644
index ..e7eaa9b6cded
--- /dev/null
+++ b/arch/powerpc/lib/memcpy_mcsafe_64.S
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) IBM Corporation, 2011
+ * Derived from copyuser_power7.s by Anton Blanchard 
+ * Author - Balbir Singh 
+ */
+#include 
+#include 
+
+   .macro err1
+100:
+   EX_TABLE(100b,.Ldo_err1)
+   .endm
+
+   .macro err2
+200:
+   EX_TABLE(200b,.Ldo_err2)
+   .endm
+
+.Ldo_err2:
+   ld  r22,STK_REG(R22)(r1)
+   ld  r21,STK_REG(R21)(r1)
+   ld  r20,STK_REG(R20)(r1)
+   ld  r19,STK_REG(R19)(r1)
+   ld  r18,STK_REG(R18)(r1)
+   ld  r17,STK_REG(R17)(r1)
+   ld  r16,STK_REG(R16)(r1)
+   ld  r15,STK_REG(R15)(r1)
+   ld  r14,STK_REG(R14)(r1)
+   addi    r1,r1,STACKFRAMESIZE
+.Ldo_err1:
+   li  r3,-EFAULT
+   blr
+
+
+_GLOBAL(memcpy_mcsafe)
+   cmpldi  r5,16
+   blt .Lshort_copy
+
+.Lcopy:
+   /* Get the source 8B aligned */
+   neg r6,r4
+   mtocrf  0x01,r6
+   clrldi  r6,r6,(64-3)
+
+   bf  cr7*4+3,1f
+err1;  lbz r0,0(r4)
+   addi    r4,r4,1
+err1;  stb r0,0(r3)
+   addi    r3,r3,1
+
+1: bf  cr7*4+2,2f
+err1;  lhz r0,0(r4)
+   addi    r4,r4,2
+err1;  sth r0,0(r3)
+   addi    r3,r3,2
+
+2: bf  cr7*4+1,3f
+err1;  lwz r0,0(r4)
+   addi    r4,r4,4
+err1;  stw r0,0(r3)
+   addi    r3,r3,4
+
+3: sub r5,r5,r6
+   cmpldi  r5,128
+   blt 5f
+
+   mflr    r0
+   stdu    r1,-STACKFRAMESIZE(r1)
+   std r14,STK_REG(R14)(r1)
+   std r15,STK_REG(R15)(r1)
+   std r16,STK_REG(R16)(r1)
+   std r17,STK_REG(R17)(r1)
+   std r18,STK_REG(R18)(r1)
+   std r19,STK_REG(R19)(r1)
+   std r20,STK_REG(R20)(r1)
+   std r21,STK_REG(R21)(r1)
+   std r22,STK_REG(R22)(r1)
+   std r0,STACKFRAMESIZE+16(r1)
+
+   srdi    r6,r5,7
+   mtctr   r6
+
+   /* Now do cacheline (128B) sized loads and stores. */
+   .align  5
+4:
+err2;  ld  r0,0(r4)
+err2;  ld  r6,8(r4)
+err2;  ld  r7,16(r4)
+err2;  ld  r8,24(r4)
+err2;  ld  r9,32(r4)
+err2;  ld  r10,40(r4)
+err2;  ld  r11,48(r4)
+err2;  ld  r12,56(r4)
+err2;  ld  r14,64(r4)
+err2;  ld  r15,72(r4)
+err2;  ld  r16,80(r4)
+err2;  ld  r17,88(r4)
+err2;  ld  r18,96(r4)
+err2;  ld  r19,104(r4)
+err2;  ld  r20,112(r4)
+err2;  ld  r21,120(r4)
+   addi    r4,r4,128
+err2;  std r0,0(r3)
+err2;  std r6,8(r3)
+err2;  std r7,16(r3)
+err2;  std r8,24(r3)
+err2;  std r9,32(r3)
+err2;  std r10,40(r3)
+err2;  std r11,48(r3)
+err2;  std r12,56(r3)
+err2;  std r14,64(r3)
+err2;  std r15,72(r3)
+err2;  std r16,80(r3)
+err2;  std r17,88(r3)
+err2;  std r18,96(r3)
+err2;