Re: [PATCH] powerpc/pkeys: Remove unused code

2021-02-03 Thread Ram Pai
On Tue, Feb 02, 2021 at 08:30:50PM +0530, Sandipan Das wrote:
> This removes arch_supports_pkeys(), arch_usable_pkeys() and
> thread_pkey_regs_*() which are remnants from the following:
> 
> commit 06bb53b33804 ("powerpc: store and restore the pkey state across 
> context switches")
> commit 2cd4bd192ee9 ("powerpc/pkeys: Fix handling of pkey state across 
> fork()")
> commit cf43d3b26452 ("powerpc: Enable pkey subsystem")
> 
> arch_supports_pkeys() and arch_usable_pkeys() were unused
> since their introduction while thread_pkey_regs_*() became
> unused after the introduction of the following:
> 
> commit d5fa30e6993f ("powerpc/book3s64/pkeys: Reset userspace AMR correctly 
> on exec")
> commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in 
> kernel mode")
> 
> Signed-off-by: Sandipan Das 
> ---
>  arch/powerpc/include/asm/mmu_context.h | 3 ---
>  arch/powerpc/include/asm/pkeys.h   | 6 --
>  2 files changed, 9 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h 
> b/arch/powerpc/include/asm/mmu_context.h
> index d5821834dba9..652ce85f9410 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -282,9 +282,6 @@ static inline bool arch_vma_access_permitted(struct 
> vm_area_struct *vma,
>  }
> 
>  #define pkey_mm_init(mm)
> -#define thread_pkey_regs_save(thread)
> -#define thread_pkey_regs_restore(new_thread, old_thread)
> -#define thread_pkey_regs_init(thread)
>  #define arch_dup_pkeys(oldmm, mm)
> 
>  static inline u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags)
> diff --git a/arch/powerpc/include/asm/pkeys.h 
> b/arch/powerpc/include/asm/pkeys.h
> index a7951049e129..59a2c7dbc78f 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -169,10 +169,4 @@ static inline bool arch_pkeys_enabled(void)
>  }
> 
>  extern void pkey_mm_init(struct mm_struct *mm);
> -extern bool arch_supports_pkeys(int cap);
> -extern unsigned int arch_usable_pkeys(void);
> -extern void thread_pkey_regs_save(struct thread_struct *thread);
> -extern void thread_pkey_regs_restore(struct thread_struct *new_thread,
> -  struct thread_struct *old_thread);
> -extern void thread_pkey_regs_init(struct thread_struct *thread);
>  #endif /*_ASM_POWERPC_KEYS_H */

Looking at this code after a long time. Though I do not entirely understand
the reason for changing the logic for the save and restore of the
AMR/IAMR registers, I do understand that with those changes these three
function prototypes became unnecessary.

Reviewed-by: Ram Pai 


RP


Re: [PATCH] powerpc/mm: Limit allocation of SWIOTLB on server machines

2020-12-23 Thread Ram Pai
On Wed, Dec 23, 2020 at 09:06:01PM -0300, Thiago Jung Bauermann wrote:
> 
> Hi Ram,
> 
> Thanks for reviewing this patch.
> 
> Ram Pai  writes:
> 
> > On Fri, Dec 18, 2020 at 03:21:03AM -0300, Thiago Jung Bauermann wrote:
> >> On server-class POWER machines, we don't need the SWIOTLB unless we're a
> >> secure VM. Nevertheless, if CONFIG_SWIOTLB is enabled we unconditionally
> >> allocate it.
> >> 
> >> In most cases this is harmless, but on a few machine configurations (e.g.,
> >> POWER9 powernv systems with 4 GB area reserved for crashdump kernel) it can
> >> happen that memblock can't find a 64 MB chunk of memory for the SWIOTLB and
> >> fails with a scary-looking WARN_ONCE:
> >> 
> >>  [ cut here ]
> >>  memblock: bottom-up allocation failed, memory hotremove may be affected
> >>  WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 
> >> memblock_find_in_range_node+0x328/0x340
> >>  Modules linked in:
> >>  CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc2-orig+ #6
> >>  NIP:  c0442f38 LR: c0442f34 CTR: c01e0080
> >>  REGS: c1def900 TRAP: 0700   Not tainted  (5.10.0-rc2-orig+)
> >>  MSR:  92021033   CR: 2802  XER: 
> >> 2004
> >>  CFAR: c014b7b4 IRQMASK: 1
> >>  GPR00: c0442f34 c1defba0 c1deff00 0047
> >>  GPR04: 7fff c1def828 c1def820 
> >>  GPR08: 001ffc3e c1b75478 c1b75478 0001
> >>  GPR12: 2000 c203  
> >>  GPR16:    0203
> >>  GPR20:  0001 0001 c1defc10
> >>  GPR24: c1defc08 c1c91868 c1defc18 c1c91890
> >>  GPR28:   0400 
> >>  NIP [c0442f38] memblock_find_in_range_node+0x328/0x340
> >>  LR [c0442f34] memblock_find_in_range_node+0x324/0x340
> >>  Call Trace:
> >>  [c1defba0] [c0442f34] 
> >> memblock_find_in_range_node+0x324/0x340 (unreliable)
> >>  [c1defc90] [c15ac088] memblock_alloc_range_nid+0xec/0x1b0
> >>  [c1defd40] [c15ac1f8] memblock_alloc_internal+0xac/0x110
> >>  [c1defda0] [c15ac4d0] memblock_alloc_try_nid+0x94/0xcc
> >>  [c1defe30] [c159c3c8] swiotlb_init+0x78/0x104
> >>  [c1defea0] [c158378c] mem_init+0x4c/0x98
> >>  [c1defec0] [c157457c] start_kernel+0x714/0xac8
> >>  [c1deff90] [c000d244] start_here_common+0x1c/0x58
> >>  Instruction dump:
> >>  2c23 4182ffd4 ea610088 ea810090 4bfffe84 3921 3d42fff4 3c62ff60
> >>  3863c560 992a8bfc 4bd0881d 6000 <0fe0> ea610088 4bfffd94 6000
> >>  random: get_random_bytes called from __warn+0x128/0x184 with crng_init=0
> >>  ---[ end trace  ]---
> >>  software IO TLB: Cannot allocate buffer
> >> 
> >> Unless this is a secure VM the message can actually be ignored, because the
> >> SWIOTLB isn't needed. Therefore, let's avoid the SWIOTLB in those cases.
> >
> > The above warn_on is conveying a genuine warning. Should it be silenced?
> 
> Not sure I understand your point. This patch doesn't silence the
> warning, it avoids the problem it is warning about.

Sorry, I should have explained it better. My point is...  

If CONFIG_SWIOTLB is enabled, it means that the kernel is
promising the bounce-buffering capability. I know that currently we
do not have any kernel subsystems that use bounce buffers on a
non-secure-pseries kernel or a powernv kernel.  But that does not
mean there won't be any. If there ever is such a third-party
module needing bounce buffering, it won't be able to operate
because of the proposed change in your patch.
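
To make the concern concrete, here is a generic sketch (not code from this
patch; the device, buffer and length names are made up) of how a module ends
up depending on bounce buffers without ever naming the SWIOTLB:

	/* A driver maps a kernel buffer for DMA through the generic DMA API. */
	dma_addr_t handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

	/*
	 * If the device's DMA mask cannot reach the buffer, the DMA core
	 * transparently bounces it through the SWIOTLB.  With no SWIOTLB
	 * allocated, this mapping is where such a driver would start failing.
	 */
	if (dma_mapping_error(dev, handle))
		return -EIO;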

Whether that is a good thing or a bad thing, I do not know. I will let
the experts opine.

RP


Re: [PATCH] powerpc/mm: Limit allocation of SWIOTLB on server machines

2020-12-23 Thread Ram Pai
On Fri, Dec 18, 2020 at 03:21:03AM -0300, Thiago Jung Bauermann wrote:
> On server-class POWER machines, we don't need the SWIOTLB unless we're a
> secure VM. Nevertheless, if CONFIG_SWIOTLB is enabled we unconditionally
> allocate it.
> 
> In most cases this is harmless, but on a few machine configurations (e.g.,
> POWER9 powernv systems with 4 GB area reserved for crashdump kernel) it can
> happen that memblock can't find a 64 MB chunk of memory for the SWIOTLB and
> fails with a scary-looking WARN_ONCE:
> 
>  [ cut here ]
>  memblock: bottom-up allocation failed, memory hotremove may be affected
>  WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 
> memblock_find_in_range_node+0x328/0x340
>  Modules linked in:
>  CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc2-orig+ #6
>  NIP:  c0442f38 LR: c0442f34 CTR: c01e0080
>  REGS: c1def900 TRAP: 0700   Not tainted  (5.10.0-rc2-orig+)
>  MSR:  92021033   CR: 2802  XER: 
> 2004
>  CFAR: c014b7b4 IRQMASK: 1
>  GPR00: c0442f34 c1defba0 c1deff00 0047
>  GPR04: 7fff c1def828 c1def820 
>  GPR08: 001ffc3e c1b75478 c1b75478 0001
>  GPR12: 2000 c203  
>  GPR16:    0203
>  GPR20:  0001 0001 c1defc10
>  GPR24: c1defc08 c1c91868 c1defc18 c1c91890
>  GPR28:   0400 
>  NIP [c0442f38] memblock_find_in_range_node+0x328/0x340
>  LR [c0442f34] memblock_find_in_range_node+0x324/0x340
>  Call Trace:
>  [c1defba0] [c0442f34] 
> memblock_find_in_range_node+0x324/0x340 (unreliable)
>  [c1defc90] [c15ac088] memblock_alloc_range_nid+0xec/0x1b0
>  [c1defd40] [c15ac1f8] memblock_alloc_internal+0xac/0x110
>  [c1defda0] [c15ac4d0] memblock_alloc_try_nid+0x94/0xcc
>  [c1defe30] [c159c3c8] swiotlb_init+0x78/0x104
>  [c1defea0] [c158378c] mem_init+0x4c/0x98
>  [c1defec0] [c157457c] start_kernel+0x714/0xac8
>  [c1deff90] [c000d244] start_here_common+0x1c/0x58
>  Instruction dump:
>  2c23 4182ffd4 ea610088 ea810090 4bfffe84 3921 3d42fff4 3c62ff60
>  3863c560 992a8bfc 4bd0881d 6000 <0fe0> ea610088 4bfffd94 6000
>  random: get_random_bytes called from __warn+0x128/0x184 with crng_init=0
>  ---[ end trace  ]---
>  software IO TLB: Cannot allocate buffer
> 
> Unless this is a secure VM the message can actually be ignored, because the
> SWIOTLB isn't needed. Therefore, let's avoid the SWIOTLB in those cases.

The above warn_on is conveying a genuine warning. Should it be silenced?

> 
> Fixes: eae9eec476d1 ("powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere 
> in memory")
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  arch/powerpc/mm/mem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index afab328d0887..3af991844145 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -300,7 +300,8 @@ void __init mem_init(void)
>   memblock_set_bottom_up(true);
>   if (is_secure_guest())
>   svm_swiotlb_init();
> - else
> + /* Server machines don't need SWIOTLB if they're not secure guests. */
> + else if (!machine_is(pseries) && !machine_is(powernv))

I can see powernv never needing the SWIOTLB. But for pseries guests, I am not
so sure.
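
For reference, the resulting logic in mem_init() would look roughly like the
sketch below (reconstructed from the hunk above; the swiotlb_init() call in
the final branch is inferred from the call trace in the commit message):

	memblock_set_bottom_up(true);
	if (is_secure_guest())
		svm_swiotlb_init();
	/* Server machines don't need SWIOTLB if they're not secure guests. */
	else if (!machine_is(pseries) && !machine_is(powernv))
		swiotlb_init(0);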

RP


Re: [PATCH] powerpc/book3s_hv_uvmem: Check for failed page migration

2020-12-08 Thread Ram Pai
On Thu, Dec 03, 2020 at 04:08:12PM +1100, Alistair Popple wrote:
> migrate_vma_pages() may still clear MIGRATE_PFN_MIGRATE on pages which
> are not able to be migrated. Drivers may safely copy data prior to
> calling migrate_vma_pages() however a remote mapping must not be
> established until after migrate_vma_pages() has returned as the
> migration could still fail.
> 
> UV_PAGE_IN both copies and maps the data page, therefore it should
> only be called after checking the results of migrate_vma_pages().
> 
> Signed-off-by: Alistair Popple 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 84e5a2dc8be5..08aa6a90c525 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -762,7 +762,10 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>   goto out_finalize;
>   }
> 
> - if (pagein) {
> + *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> + migrate_vma_pages(&mig);
> +
> + if ((*mig.src & MIGRATE_PFN_MIGRATE) && pagein) {
>   pfn = *mig.src >> MIGRATE_PFN_SHIFT;
>   spage = migrate_pfn_to_page(*mig.src);
>   if (spage) {
> @@ -773,8 +776,6 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>   }
>   }
> 
> - *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> - migrate_vma_pages(&mig);
>  out_finalize:
>   migrate_vma_finalize(&mig);
>   return ret;

Though this patch did not solve the specific problem I am running into,
my tests did not expose any regression.

Tested-by: Ram Pai 


Reviewed-by: Ram Pai 


Re: [PATCH] powerpc/book3s_hv_uvmem: Check for failed page migration

2020-12-04 Thread Ram Pai
On Fri, Dec 04, 2020 at 03:48:41PM +0530, Bharata B Rao wrote:
> On Thu, Dec 03, 2020 at 04:08:12PM +1100, Alistair Popple wrote:
> > migrate_vma_pages() may still clear MIGRATE_PFN_MIGRATE on pages which
> > are not able to be migrated. Drivers may safely copy data prior to
> > calling migrate_vma_pages() however a remote mapping must not be
> > established until after migrate_vma_pages() has returned as the
> > migration could still fail.
> > 
> > UV_PAGE_IN both copies and maps the data page, therefore it should
> > only be called after checking the results of migrate_vma_pages().
> > 
> > Signed-off-by: Alistair Popple 
> > ---
> >  arch/powerpc/kvm/book3s_hv_uvmem.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> > b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > index 84e5a2dc8be5..08aa6a90c525 100644
> > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > @@ -762,7 +762,10 @@ static int kvmppc_svm_page_in(struct vm_area_struct 
> > *vma,
> > goto out_finalize;
> > }
> >  
> > -   if (pagein) {
> > +   *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> > +   migrate_vma_pages(&mig);
> > +
> > +   if ((*mig.src & MIGRATE_PFN_MIGRATE) && pagein) {
> > pfn = *mig.src >> MIGRATE_PFN_SHIFT;
> > spage = migrate_pfn_to_page(*mig.src);
> > if (spage) {
> > @@ -773,8 +776,6 @@ static int kvmppc_svm_page_in(struct vm_area_struct 
> > *vma,
> > }
> > }
> >  
> > -   *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> > -   migrate_vma_pages(&mig);
> >  out_finalize:
> > migrate_vma_finalize(&mig);
> > return ret;

This patch certainly looks like it addresses the problem that has been hurting
us for a while.  Let me run this patch through my SVM tests.  Looks very
promising.

BTW: the code does a similar thing while paging out.  It pages out from the
UV, and then does the migration. Is there a bug there as well?
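
For reference, the page-out path being asked about looks roughly like the
sketch below (based on kvmppc_svm_page_out() as it appears elsewhere in this
archive; only the comments are mine). The UV_PAGE_OUT call is issued before
migrate_vma_pages(), so the same "is the migration final yet?" question
applies there too:

	/* push the page contents out of the Ultravisor first ... */
	if (!pvt->skip_page_out)
		ret = uv_page_out(kvm->arch.lpid, pfn << page_shift,
				  gpa, 0, page_shift);

	if (ret == U_SUCCESS)
		*mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
	else {
		unlock_page(dpage);
		__free_page(dpage);
		goto out_finalize;
	}

	/* ... and only then attempt the actual migration */
	migrate_vma_pages(&mig);
out_finalize:
	migrate_vma_finalize(&mig);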

RP


RE: [RFC v1 0/2] Plumbing to support multiple secure memory backends.

2020-10-14 Thread Ram Pai
On Wed, Oct 14, 2020 at 07:31:17AM +0100, Christoph Hellwig wrote:
> Please don't add an abstraction without a second implementation.
> Once we have the implementation we can consider the tradeoffs.  E.g.
> if expensive indirect function calls are really needed vs simple
> branches.

Ok. Not planning on upstreaming these patches till we have at least
another backend.

-- 
Ram Pai


Re: [RFC v1 2/2] KVM: PPC: Book3S HV: abstract secure VM related calls.

2020-10-12 Thread Ram Pai
On Mon, Oct 12, 2020 at 08:58:36PM +0530, Bharata B Rao wrote:
> On Mon, Oct 12, 2020 at 12:27:43AM -0700, Ram Pai wrote:
> > Abstract the secure VM related calls into generic calls.
> > 
> > These generic calls will call the corresponding method of the
> > backend that provides the implementation to support secure VM.
> > 
> > Currently there is only the ultravisor based implementation.
> > Modify that implementation to act as a backend to the generic calls.
> > 
> > This plumbing will provide the flexibility to add more backends
> > in the future.
> > 
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/include/asm/kvm_book3s_uvmem.h   | 100 ---
> >  arch/powerpc/include/asm/kvmppc_svm_backend.h | 250 
> > ++
> >  arch/powerpc/kvm/book3s_64_mmu_radix.c|   6 +-
> >  arch/powerpc/kvm/book3s_hv.c  |  28 +--
> >  arch/powerpc/kvm/book3s_hv_uvmem.c|  78 ++--
> >  5 files changed, 327 insertions(+), 135 deletions(-)
> >  delete mode 100644 arch/powerpc/include/asm/kvm_book3s_uvmem.h
> >  create mode 100644 arch/powerpc/include/asm/kvmppc_svm_backend.h
> > 
> > diff --git a/arch/powerpc/include/asm/kvmppc_svm_backend.h 
> > b/arch/powerpc/include/asm/kvmppc_svm_backend.h
> > new file mode 100644
> > index 000..be60d80
> > --- /dev/null
> > +++ b/arch/powerpc/include/asm/kvmppc_svm_backend.h
> > @@ -0,0 +1,250 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + *
> > + * Copyright IBM Corp. 2020
> > + *
> > + * Authors: Ram Pai 
> > + */
> > +
> > +#ifndef __POWERPC_KVMPPC_SVM_BACKEND_H__
> > +#define __POWERPC_KVMPPC_SVM_BACKEND_H__
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#ifdef CONFIG_PPC_BOOK3S
> > +#include 
> > +#else
> > +#include 
> > +#endif
> > +#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
> > +#include 
> > +#include 
> > +#include 
> > +#endif
> > +
> > +struct kvmppc_hmm_backend {
> 
> Though we started with HMM initially, what we ended up with eventually
> has nothing to do with HMM. Please don't introduce hmm again :-)

Hmm.. it's still an extension to HMM to help manage device memory,
right?  What am I missing?


> 
> > +   /* initialize */
> > +   int (*kvmppc_secmem_init)(void);
> > +
> > +   /* cleanup */
> > +   void (*kvmppc_secmem_free)(void);
> > +
> > +   /* is memory available */
> > +   bool (*kvmppc_secmem_available)(void);
> > +
> > +   /* allocate a protected/secure page for the secure VM */
> > +   unsigned long (*kvmppc_svm_page_in)(struct kvm *kvm,
> > +   unsigned long gra,
> > +   unsigned long flags,
> > +   unsigned long page_shift);
> > +
> > +   /* recover the protected/secure page from the secure VM */
> > +   unsigned long (*kvmppc_svm_page_out)(struct kvm *kvm,
> > +   unsigned long gra,
> > +   unsigned long flags,
> > +   unsigned long page_shift);
> > +
> > +   /* initiate the transition of a VM to secure VM */
> > +   unsigned long (*kvmppc_svm_init_start)(struct kvm *kvm);
> > +
> > +   /* finalize the transition of a secure VM */
> > +   unsigned long (*kvmppc_svm_init_done)(struct kvm *kvm);
> > +
> > +   /* share the page on page fault */
> > +   int (*kvmppc_svm_page_share)(struct kvm *kvm, unsigned long gfn);
> > +
> > +   /* abort the transition to a secure VM */
> > +   unsigned long (*kvmppc_svm_init_abort)(struct kvm *kvm);
> > +
> > +   /* add a memory slot */
> > +   int (*kvmppc_svm_memslot_create)(struct kvm *kvm,
> > +   const struct kvm_memory_slot *new);
> > +
> > +   /* free a memory slot */
> > +   void (*kvmppc_svm_memslot_delete)(struct kvm *kvm,
> > +   const struct kvm_memory_slot *old);
> > +
> > +   /* drop pages allocated to the secure VM */
> > +   void (*kvmppc_svm_drop_pages)(const struct kvm_memory_slot *free,
> > +struct kvm *kvm, bool skip_page_out);
> > +};
> 
> Since the structure has kvmppc_ prefix, may be you can drop
> the same from its members to make the fields smaller?

ok.


> 
> > +
> > +extern const struct kvmppc_hmm_backend *kvmppc_svm_backend;
> > +
> > +static inline int kvmppc_svm_page_share(struct kvm *kvm, unsigned long gfn)
> > +{
> > +   if (!kvmppc_svm_backen

[RFC v1 2/2] KVM: PPC: Book3S HV: abstract secure VM related calls.

2020-10-12 Thread Ram Pai
Abstract the secure VM related calls into generic calls.

These generic calls will call the corresponding method of the
backend that provides the implementation to support secure VM.

Currently there is only the ultravisor based implementation.
Modify that implementation to act as a backend to the generic calls.

This plumbing will provide the flexibility to add more backends
in the future.

Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h   | 100 ---
 arch/powerpc/include/asm/kvmppc_svm_backend.h | 250 ++
 arch/powerpc/kvm/book3s_64_mmu_radix.c|   6 +-
 arch/powerpc/kvm/book3s_hv.c  |  28 +--
 arch/powerpc/kvm/book3s_hv_uvmem.c|  78 ++--
 5 files changed, 327 insertions(+), 135 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/kvm_book3s_uvmem.h
 create mode 100644 arch/powerpc/include/asm/kvmppc_svm_backend.h

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
deleted file mode 100644
index 0a63194..000
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ /dev/null
@@ -1,100 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_KVM_BOOK3S_UVMEM_H__
-#define __ASM_KVM_BOOK3S_UVMEM_H__
-
-#ifdef CONFIG_PPC_UV
-int kvmppc_uvmem_init(void);
-void kvmppc_uvmem_free(void);
-bool kvmppc_uvmem_available(void);
-int kvmppc_uvmem_slot_init(struct kvm *kvm, const struct kvm_memory_slot 
*slot);
-void kvmppc_uvmem_slot_free(struct kvm *kvm,
-   const struct kvm_memory_slot *slot);
-unsigned long kvmppc_h_svm_page_in(struct kvm *kvm,
-  unsigned long gra,
-  unsigned long flags,
-  unsigned long page_shift);
-unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
-   unsigned long gra,
-   unsigned long flags,
-   unsigned long page_shift);
-unsigned long kvmppc_h_svm_init_start(struct kvm *kvm);
-unsigned long kvmppc_h_svm_init_done(struct kvm *kvm);
-int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn);
-unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
-void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-struct kvm *kvm, bool skip_page_out);
-int kvmppc_uvmem_memslot_create(struct kvm *kvm,
-   const struct kvm_memory_slot *new);
-void kvmppc_uvmem_memslot_delete(struct kvm *kvm,
-   const struct kvm_memory_slot *old);
-#else
-static inline int kvmppc_uvmem_init(void)
-{
-   return 0;
-}
-
-static inline void kvmppc_uvmem_free(void) { }
-
-static inline bool kvmppc_uvmem_available(void)
-{
-   return false;
-}
-
-static inline int
-kvmppc_uvmem_slot_init(struct kvm *kvm, const struct kvm_memory_slot *slot)
-{
-   return 0;
-}
-
-static inline void
-kvmppc_uvmem_slot_free(struct kvm *kvm, const struct kvm_memory_slot *slot) { }
-
-static inline unsigned long
-kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gra,
-unsigned long flags, unsigned long page_shift)
-{
-   return H_UNSUPPORTED;
-}
-
-static inline unsigned long
-kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gra,
- unsigned long flags, unsigned long page_shift)
-{
-   return H_UNSUPPORTED;
-}
-
-static inline unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
-{
-   return H_UNSUPPORTED;
-}
-
-static inline unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
-{
-   return H_UNSUPPORTED;
-}
-
-static inline unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
-{
-   return H_UNSUPPORTED;
-}
-
-static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
-{
-   return -EFAULT;
-}
-
-static inline void
-kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-   struct kvm *kvm, bool skip_page_out) { }
-
-static inline int  kvmppc_uvmem_memslot_create(struct kvm *kvm,
-   const struct kvm_memory_slot *new)
-{
-   return H_UNSUPPORTED;
-}
-
-static inline void  kvmppc_uvmem_memslot_delete(struct kvm *kvm,
-   const struct kvm_memory_slot *old) { }
-
-#endif /* CONFIG_PPC_UV */
-#endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/include/asm/kvmppc_svm_backend.h 
b/arch/powerpc/include/asm/kvmppc_svm_backend.h
new file mode 100644
index 000..be60d80
--- /dev/null
+++ b/arch/powerpc/include/asm/kvmppc_svm_backend.h
@@ -0,0 +1,250 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ *
+ * Copyright IBM Corp. 2020
+ *
+ * Authors: Ram Pai 
+ */
+
+#ifndef __POWERPC_KVMPPC_SVM_BACKEND_H__
+#define __POWERPC_KVMPPC_SVM_BACKEND_H__
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#ifdef CONFIG_PPC_BOOK3S
+#include 
+#else
+#include 
+#endif
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+#include 
+#include 
+#include

[RFC v1 1/2] KVM: PPC: Book3S HV: rename all variables in book3s_hv_uvmem.c

2020-10-12 Thread Ram Pai
Preparing this file to be one of many backends. Since this file
supports the Ultravisor backend, rename all variables from kvmppc_*
to uvmem_*.  This avoids a clash with the generic top-level
functions to be defined in the next patch.

Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 210 ++---
 1 file changed, 105 insertions(+), 105 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 7705d55..b79affc 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -95,9 +95,9 @@
 #include 
 #include 
 
-static struct dev_pagemap kvmppc_uvmem_pgmap;
-static unsigned long *kvmppc_uvmem_bitmap;
-static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
+static struct dev_pagemap uvmem_pgmap;
+static unsigned long *uvmem_bitmap;
+static DEFINE_SPINLOCK(uvmem_bitmap_lock);
 
 /*
  * States of a GFN
@@ -221,13 +221,13 @@
 #define KVMPPC_GFN_FLAG_MASK   (KVMPPC_GFN_SECURE | KVMPPC_GFN_SHARED)
 #define KVMPPC_GFN_PFN_MASK(~KVMPPC_GFN_FLAG_MASK)
 
-struct kvmppc_uvmem_slot {
+struct uvmem_slot {
struct list_head list;
unsigned long nr_pfns;
unsigned long base_pfn;
unsigned long *pfns;
 };
-struct kvmppc_uvmem_page_pvt {
+struct uvmem_page_pvt {
struct kvm *kvm;
unsigned long gpa;
bool skip_page_out;
@@ -237,15 +237,15 @@ struct kvmppc_uvmem_page_pvt {
 bool kvmppc_uvmem_available(void)
 {
/*
-* If kvmppc_uvmem_bitmap != NULL, then there is an ultravisor
+* If uvmem_bitmap != NULL, then there is an ultravisor
 * and our data structures have been initialized successfully.
 */
-   return !!kvmppc_uvmem_bitmap;
+   return !!uvmem_bitmap;
 }
 
-int kvmppc_uvmem_slot_init(struct kvm *kvm, const struct kvm_memory_slot *slot)
+static int uvmem_slot_init(struct kvm *kvm, const struct kvm_memory_slot *slot)
 {
-   struct kvmppc_uvmem_slot *p;
+   struct uvmem_slot *p;
 
p = kzalloc(sizeof(*p), GFP_KERNEL);
if (!p)
@@ -268,9 +268,9 @@ int kvmppc_uvmem_slot_init(struct kvm *kvm, const struct 
kvm_memory_slot *slot)
 /*
  * All device PFNs are already released by the time we come here.
  */
-void kvmppc_uvmem_slot_free(struct kvm *kvm, const struct kvm_memory_slot 
*slot)
+static void uvmem_slot_free(struct kvm *kvm, const struct kvm_memory_slot 
*slot)
 {
-   struct kvmppc_uvmem_slot *p, *next;
+   struct uvmem_slot *p, *next;
 
mutex_lock(&kvm->arch.uvmem_lock);
list_for_each_entry_safe(p, next, &kvm->arch.uvmem_pfns, list) {
@@ -284,10 +284,10 @@ void kvmppc_uvmem_slot_free(struct kvm *kvm, const struct 
kvm_memory_slot *slot)
mutex_unlock(&kvm->arch.uvmem_lock);
 }
 
-static void kvmppc_mark_gfn(unsigned long gfn, struct kvm *kvm,
+static void uvmem_mark_gfn(unsigned long gfn, struct kvm *kvm,
unsigned long flag, unsigned long uvmem_pfn)
 {
-   struct kvmppc_uvmem_slot *p;
+   struct uvmem_slot *p;
 
list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
@@ -303,35 +303,35 @@ static void kvmppc_mark_gfn(unsigned long gfn, struct kvm 
*kvm,
 }
 
 /* mark the GFN as secure-GFN associated with @uvmem pfn device-PFN. */
-static void kvmppc_gfn_secure_uvmem_pfn(unsigned long gfn,
+static void uvmem_gfn_secure_uvmem_pfn(unsigned long gfn,
unsigned long uvmem_pfn, struct kvm *kvm)
 {
-   kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_UVMEM_PFN, uvmem_pfn);
+   uvmem_mark_gfn(gfn, kvm, KVMPPC_GFN_UVMEM_PFN, uvmem_pfn);
 }
 
 /* mark the GFN as secure-GFN associated with a memory-PFN. */
-static void kvmppc_gfn_secure_mem_pfn(unsigned long gfn, struct kvm *kvm)
+static void uvmem_gfn_secure_mem_pfn(unsigned long gfn, struct kvm *kvm)
 {
-   kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_MEM_PFN, 0);
+   uvmem_mark_gfn(gfn, kvm, KVMPPC_GFN_MEM_PFN, 0);
 }
 
 /* mark the GFN as a shared GFN. */
-static void kvmppc_gfn_shared(unsigned long gfn, struct kvm *kvm)
+static void uvmem_gfn_shared(unsigned long gfn, struct kvm *kvm)
 {
-   kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_SHARED, 0);
+   uvmem_mark_gfn(gfn, kvm, KVMPPC_GFN_SHARED, 0);
 }
 
 /* mark the GFN as a non-existent GFN. */
-static void kvmppc_gfn_remove(unsigned long gfn, struct kvm *kvm)
+static void uvmem_gfn_remove(unsigned long gfn, struct kvm *kvm)
 {
-   kvmppc_mark_gfn(gfn, kvm, 0, 0);
+   uvmem_mark_gfn(gfn, kvm, 0, 0);
 }
 
 /* return true, if the GFN is a secure-GFN backed by a secure-PFN */
-static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
+static bool uvmem_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
unsigned long *uvmem_pfn)
 {
-   struct kvmppc_uvmem_slot *p;
+   struct uvmem_slot *p;
 
list_for_each_entry(p, &kvm->arch.uvmem_pfns

[RFC v1 0/2] Plumbing to support multiple secure memory backends.

2020-10-12 Thread Ram Pai
Secure VMs are currently supported by the Ultravisor backend.

Enhance the plumbing to support multiple backends. Each backend shall
provide the implementation of the calls necessary and sufficient
to support secure VMs.

Also as part of this change, modify the ultravisor implementation to
be a plugin that provides an implementation of the backend.
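
As a rough sketch of what the plumbing looks like, each top-level call simply
dispatches through the registered backend (names follow patch 2/2 of this
series; the H_UNSUPPORTED fallback for a missing backend is an assumption):

	extern const struct kvmppc_hmm_backend *kvmppc_svm_backend;

	static inline unsigned long kvmppc_svm_page_in(struct kvm *kvm,
			unsigned long gra, unsigned long flags,
			unsigned long page_shift)
	{
		if (!kvmppc_svm_backend)
			return H_UNSUPPORTED;

		return kvmppc_svm_backend->kvmppc_svm_page_in(kvm, gra,
							      flags, page_shift);
	}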

Ram Pai (2):
  KVM: PPC: Book3S HV: rename all variables in book3s_hv_uvmem.c
  KVM: PPC: Book3S HV: abstract secure VM related calls.

 arch/powerpc/include/asm/kvm_book3s_uvmem.h   | 100 -
 arch/powerpc/include/asm/kvmppc_svm_backend.h | 250 ++
 arch/powerpc/kvm/book3s_64_mmu_radix.c|   6 +-
 arch/powerpc/kvm/book3s_hv.c  |  28 +--
 arch/powerpc/kvm/book3s_hv_uvmem.c| 288 +++---
 5 files changed, 432 insertions(+), 240 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/kvm_book3s_uvmem.h
 create mode 100644 arch/powerpc/include/asm/kvmppc_svm_backend.h

-- 
1.8.3.1



Re: [PATCH] KVM: PPC: Book3S HV: fix a oops in kvmppc_uvmem_page_free()

2020-07-31 Thread Ram Pai
On Fri, Jul 31, 2020 at 09:59:40AM +0530, Bharata B Rao wrote:
> On Thu, Jul 30, 2020 at 04:25:26PM -0700, Ram Pai wrote:
> > Observed the following oops while stress-testing, using multiple
> > secureVM on a distro kernel. However this issue theoritically exists in
> > 5.5 kernel and later.
> > 
> > This issue occurs when the total number of requested device-PFNs exceed
> > the total-number of available device-PFNs.  PFN migration fails to
> > allocate a device-pfn, which causes migrate_vma_finalize() to trigger
> > kvmppc_uvmem_page_free() on a page, that is not associated with any
> > device-pfn.  kvmppc_uvmem_page_free() blindly tries to access the
> > contents of the private data which can be null, leading to the following
> > kernel fault.
> > 
> >  --
> >  Unable to handle kernel paging request for data at address 0x0011
> >  Faulting instruction address: 0xc0080e36e110
> >  Oops: Kernel access of bad area, sig: 11 [#1]
> >  LE SMP NR_CPUS=2048 NUMA PowerNV
> > 
> >  MSR:  9280b033 
> >  CR: 24424822  XER: 
> >  CFAR: c0e3d764 DAR: 0011 DSISR: 4000 IRQMASK: 0
> >  GPR00: c0080e36e0a4 c01f1d59f610 c0080e38a400 
> >  GPR04: c01fa500 fffe  c000201fffeaf300
> >  GPR08: 01f0  0f80 c0080e373608
> >  GPR12: c0e3d710 c000201fffeaf300 0001 7fef8736
> >  GPR16: 7fff97db4410 c000201c3b66a578  
> >  GPR20: 000119db9ad0 000a fffc 0001
> >  GPR24: c000201c3b66 c01f1d59f7a0 c04cffb0 0001
> >  GPR28:  c00a001ff003e000 c0080e386150 0f80
> >  NIP [c0080e36e110] kvmppc_uvmem_page_free+0xc8/0x210 [kvm_hv]
> >  LR [c0080e36e0a4] kvmppc_uvmem_page_free+0x5c/0x210 [kvm_hv]
> >  Call Trace:
> >  [c0512010] free_devmap_managed_page+0xd0/0x100
> >  [c03f71d0] put_devmap_managed_page+0xa0/0xc0
> >  [c04d24bc] migrate_vma_finalize+0x32c/0x410
> >  [c0080e36e828] kvmppc_svm_page_in.constprop.5+0xa0/0x460 [kvm_hv]
> >  [c0080e36eddc] kvmppc_uv_migrate_mem_slot.isra.2+0x1f4/0x230 [kvm_hv]
> >  [c0080e36fa98] kvmppc_h_svm_init_done+0x90/0x170 [kvm_hv]
> >  [c0080e35bb14] kvmppc_pseries_do_hcall+0x1ac/0x10a0 [kvm_hv]
> >  [c0080e35edf4] kvmppc_vcpu_run_hv+0x83c/0x1060 [kvm_hv]
> >  [c0080e95eb2c] kvmppc_vcpu_run+0x34/0x48 [kvm]
> >  [c0080e95a2dc] kvm_arch_vcpu_ioctl_run+0x374/0x830 [kvm]
> >  [c0080e9433b4] kvm_vcpu_ioctl+0x45c/0x7c0 [kvm]
> >  [c05451d0] do_vfs_ioctl+0xe0/0xaa0
> >  [c0545d64] sys_ioctl+0xc4/0x160
> >  [c000b408] system_call+0x5c/0x70
> >  Instruction dump:
> >  a12d1174 2f89 409e0158 a1271172 3929 b1271172 7c2004ac 39200000
> >  913e0140 3920 e87d0010 f93d0010 <89230011> e8c3 e9030008 2f89
> >  --
> > 
> >  Fix the oops..
> > 
> > fixes: ca9f49 ("KVM: PPC: Book3S HV: Support for running secure guests")
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/kvm/book3s_hv_uvmem.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> > b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > index 2806983..f4002bf 100644
> > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > @@ -1018,13 +1018,15 @@ static void kvmppc_uvmem_page_free(struct page 
> > *page)
> >  {
> > unsigned long pfn = page_to_pfn(page) -
> > (kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT);
> > -   struct kvmppc_uvmem_page_pvt *pvt;
> > +   struct kvmppc_uvmem_page_pvt *pvt = page->zone_device_data;
> > +
> > +   if (!pvt)
> > +   return;
> >  
> > spin_lock(&kvmppc_uvmem_bitmap_lock);
> > bitmap_clear(kvmppc_uvmem_bitmap, pfn, 1);
> > spin_unlock(&kvmppc_uvmem_bitmap_lock);
> >  
> > -   pvt = page->zone_device_data;
> > page->zone_device_data = NULL;
> > if (pvt->remove_gfn)
> > kvmppc_gfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);
> 
> In our case, device pages that are in use are always associated with a valid
> pvt member. See kvmppc_uvmem_get_page() which returns failure if it

Re: [PATCH] KVM: PPC: Book3S HV: Define H_PAGE_IN_NONSHARED for H_SVM_PAGE_IN hcall

2020-07-31 Thread Ram Pai
On Fri, Jul 31, 2020 at 10:03:34AM +0530, Bharata B Rao wrote:
> On Thu, Jul 30, 2020 at 04:21:01PM -0700, Ram Pai wrote:
> > H_SVM_PAGE_IN hcall takes a flag parameter. This parameter specifies the
> > way in which a page will be treated.  H_PAGE_IN_NONSHARED indicates
> > that the page will be shared with the Secure VM, and H_PAGE_IN_SHARED
> > indicates that the page will not be shared but its contents will
> > be copied.
> 
> Looks like you got the definitions of shared and non-shared interchanged.

Yes. Realized it after sending the patch. Will fix it.

> 
> > 
> > However H_PAGE_IN_NONSHARED is not defined in the header file, though
> > it is defined and documented in the API captured in
> > Documentation/powerpc/ultravisor.rst
> > 
> > Define H_PAGE_IN_NONSHARED in the header file.
> 
> What is the use of defining this? Is this used directly in any place?
> Or, are youp planning to introduce such a usage?

I know the Hypervisor code currently has no use for that define.
It assumes H_PAGE_IN_NONSHARED to be !H_PAGE_IN_SHARED.

But H_PAGE_IN_NONSHARED is defined in the Ultravisor API, and hence it has
to be captured in the header to stay consistent.
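
For the record, with the flag defined, the hcall handler's flag handling reads
roughly as below (this mirrors the hunk in that patch; only the comments are
mine):

	if (flags & ~H_PAGE_IN_MASK)		/* reject undefined flag bits */
		return H_P2;

	if (flags & H_PAGE_IN_SHARED)		/* page shared with the HV */
		return kvmppc_share_page(kvm, gpa, page_shift);

	/* otherwise flags == H_PAGE_IN_NONSHARED: copy the page into secure memory */
	ret = H_PARAMETER;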

RP


[PATCH] KVM: PPC: Book3S HV: fix a oops in kvmppc_uvmem_page_free()

2020-07-30 Thread Ram Pai
Observed the following oops while stress-testing, using multiple
secure VMs on a distro kernel. However, this issue theoretically exists in
5.5 and later kernels.

This issue occurs when the total number of requested device-PFNs exceed
the total-number of available device-PFNs.  PFN migration fails to
allocate a device-pfn, which causes migrate_vma_finalize() to trigger
kvmppc_uvmem_page_free() on a page that is not associated with any
device-pfn.  kvmppc_uvmem_page_free() blindly tries to access the
contents of the private data which can be null, leading to the following
kernel fault.

 --
 Unable to handle kernel paging request for data at address 0x0011
 Faulting instruction address: 0xc0080e36e110
 Oops: Kernel access of bad area, sig: 11 [#1]
 LE SMP NR_CPUS=2048 NUMA PowerNV

 MSR:  9280b033 
 CR: 24424822  XER: 
 CFAR: c0e3d764 DAR: 0011 DSISR: 4000 IRQMASK: 0
 GPR00: c0080e36e0a4 c01f1d59f610 c0080e38a400 
 GPR04: c01fa500 fffe  c000201fffeaf300
 GPR08: 01f0  0f80 c0080e373608
 GPR12: c0e3d710 c000201fffeaf300 0001 7fef8736
 GPR16: 7fff97db4410 c000201c3b66a578  
 GPR20: 000119db9ad0 000a fffc 0001
 GPR24: c000201c3b66 c01f1d59f7a0 c04cffb0 0001
 GPR28:  c00a001ff003e000 c0080e386150 0f80
 NIP [c0080e36e110] kvmppc_uvmem_page_free+0xc8/0x210 [kvm_hv]
 LR [c0080e36e0a4] kvmppc_uvmem_page_free+0x5c/0x210 [kvm_hv]
 Call Trace:
 [c0512010] free_devmap_managed_page+0xd0/0x100
 [c03f71d0] put_devmap_managed_page+0xa0/0xc0
 [c04d24bc] migrate_vma_finalize+0x32c/0x410
 [c0080e36e828] kvmppc_svm_page_in.constprop.5+0xa0/0x460 [kvm_hv]
 [c0080e36eddc] kvmppc_uv_migrate_mem_slot.isra.2+0x1f4/0x230 [kvm_hv]
 [c0080e36fa98] kvmppc_h_svm_init_done+0x90/0x170 [kvm_hv]
 [c0080e35bb14] kvmppc_pseries_do_hcall+0x1ac/0x10a0 [kvm_hv]
 [c0080e35edf4] kvmppc_vcpu_run_hv+0x83c/0x1060 [kvm_hv]
 [c0080e95eb2c] kvmppc_vcpu_run+0x34/0x48 [kvm]
 [c0080e95a2dc] kvm_arch_vcpu_ioctl_run+0x374/0x830 [kvm]
 [c0080e9433b4] kvm_vcpu_ioctl+0x45c/0x7c0 [kvm]
 [c05451d0] do_vfs_ioctl+0xe0/0xaa0
 [c0545d64] sys_ioctl+0xc4/0x160
 [c000b408] system_call+0x5c/0x70
 Instruction dump:
 a12d1174 2f89 409e0158 a1271172 3929 b1271172 7c2004ac 3920
 913e0140 3920 e87d0010 f93d0010 <89230011> e8c3 e9030008 2f89
 --

 Fix the oops..

fixes: ca9f49 ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 2806983..f4002bf 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -1018,13 +1018,15 @@ static void kvmppc_uvmem_page_free(struct page *page)
 {
unsigned long pfn = page_to_pfn(page) -
(kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT);
-   struct kvmppc_uvmem_page_pvt *pvt;
+   struct kvmppc_uvmem_page_pvt *pvt = page->zone_device_data;
+
+   if (!pvt)
+   return;
 
spin_lock(&kvmppc_uvmem_bitmap_lock);
bitmap_clear(kvmppc_uvmem_bitmap, pfn, 1);
spin_unlock(&kvmppc_uvmem_bitmap_lock);
 
-   pvt = page->zone_device_data;
page->zone_device_data = NULL;
if (pvt->remove_gfn)
kvmppc_gfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);
-- 
1.8.3.1



[PATCH] KVM: PPC: Book3S HV: Define H_PAGE_IN_NONSHARED for H_SVM_PAGE_IN hcall

2020-07-30 Thread Ram Pai
H_SVM_PAGE_IN hcall takes a flag parameter. This parameter specifies the
way in which a page will be treated.  H_PAGE_IN_NONSHARED indicates
that the page will be shared with the Secure VM, and H_PAGE_IN_SHARED
indicates that the page will not be shared but its contents will
be copied.

However H_PAGE_IN_NONSHARED is not defined in the header file, though
it is defined and documented in the API captured in
Documentation/powerpc/ultravisor.rst

Define H_PAGE_IN_NONSHARED in the header file.

Reported-by: Julia Lawall 
Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/hvcall.h  | 4 +++-
 arch/powerpc/kvm/book3s_hv_uvmem.c | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index e90c073..43e3f8d 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -343,7 +343,9 @@
 #define H_COPY_TOFROM_GUEST0xF80C
 
 /* Flags for H_SVM_PAGE_IN */
-#define H_PAGE_IN_SHARED0x1
+#define H_PAGE_IN_NONSHARED0x0  /* Page is not shared with the UV */
+#define H_PAGE_IN_SHARED   0x1  /* Page is shared with UV */
+#define H_PAGE_IN_MASK 0x1
 
 /* Platform-specific hcalls used by the Ultravisor */
 #define H_SVM_PAGE_IN  0xEF00
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 2dde0fb..2806983 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -947,12 +947,13 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
unsigned long gpa,
if (page_shift != PAGE_SHIFT)
return H_P3;
 
-   if (flags & ~H_PAGE_IN_SHARED)
+   if (flags & ~H_PAGE_IN_MASK)
return H_P2;
 
if (flags & H_PAGE_IN_SHARED)
return kvmppc_share_page(kvm, gpa, page_shift);
 
+   /* handle H_PAGE_IN_NONSHARED */
ret = H_PARAMETER;
srcu_idx = srcu_read_lock(>srcu);
mmap_read_lock(kvm->mm);
-- 
1.8.3.1

-- 
Ram Pai


Re: Documentation/powerpc: Ultravisor API

2020-07-30 Thread Ram Pai
On Thu, Jul 30, 2020 at 12:35:38PM +0200, Julia Lawall wrote:
> The file Documentation/powerpc/ultravisor.rst contains:
> 
> Only valid value(s) in ``flags`` are:
> 
> * H_PAGE_IN_SHARED which indicates that the page is to be shared
> with the Ultravisor.
> 
> * H_PAGE_IN_NONSHARED indicates that the UV is not anymore
>   interested in the page. Applicable if the page is a shared page.
> 
> The flag H_PAGE_IN_SHARED exists in the Linux kernel
> (arch/powerpc/include/asm/hvcall.h), but the flag H_PAGE_IN_NONSHARED does
> not.  Should the documentation be changed in some way?

Currently the code assumes H_PAGE_IN_NONSHARED as !H_PAGE_IN_SHARED.

We need to patch the kernel to explicitly define the flag.
I will submit a patch towards this.

Thanks,
RP


[PATCH v2 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping

2020-07-27 Thread Ram Pai
From: Laurent Dufour 

When a secure memslot is dropped, all the pages backed in the secure
device (aka really backed by secure memory by the Ultravisor)
should be paged out to normal pages. Previously, this was
achieved by triggering the page fault mechanism, which calls
kvmppc_svm_page_out() on each page.

This can't work when hot unplugging a memory slot because the memory
slot is flagged as invalid and gfn_to_pfn() then does not try to access
the page, so the page fault mechanism is not triggered.

Since the final goal is to make a call to kvmppc_svm_page_out(), it seems
simpler to call it directly instead of triggering such a mechanism. This
way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
memslot.

Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
the call to __kvmppc_svm_page_out() is made.  As
__kvmppc_svm_page_out() needs the vma pointer to migrate the pages,
the VMA is fetched lazily, to avoid triggering find_vma() all
the time. In addition, the mmap_sem is held in read mode during
that time, not in write mode, since the virtual memory layout is not
impacted, and kvm->arch.uvmem_lock prevents concurrent operations
on the secure device.

Cc: Ram Pai 
Cc: Bharata B Rao 
Cc: Paul Mackerras 
Reviewed-by: Bharata B Rao 
Signed-off-by: Ram Pai 
[modified the changelog description]
Signed-off-by: Laurent Dufour 
[modified check on the VMA in kvmppc_uvmem_drop_pages]
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 52 +-
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 565f24b..0d49e34 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -594,35 +594,53 @@ static inline int kvmppc_svm_page_out(struct 
vm_area_struct *vma,
  * fault on them, do fault time migration to replace the device PTEs in
  * QEMU page table with normal PTEs from newly allocated pages.
  */
-void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
+void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
 struct kvm *kvm, bool skip_page_out)
 {
int i;
struct kvmppc_uvmem_page_pvt *pvt;
-   unsigned long pfn, uvmem_pfn;
-   unsigned long gfn = free->base_gfn;
+   struct page *uvmem_page;
+   struct vm_area_struct *vma = NULL;
+   unsigned long uvmem_pfn, gfn;
+   unsigned long addr;
+
+   mmap_read_lock(kvm->mm);
+
+   addr = slot->userspace_addr;
 
-   for (i = free->npages; i; --i, ++gfn) {
-   struct page *uvmem_page;
+   gfn = slot->base_gfn;
+   for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) {
+
+   /* Fetch the VMA if addr is not in the latest fetched one */
+   if (!vma || addr >= vma->vm_end) {
+   vma = find_vma_intersection(kvm->mm, addr, addr+1);
+   if (!vma) {
+   pr_err("Can't find VMA for gfn:0x%lx\n", gfn);
+   break;
+   }
+   }
 
mutex_lock(&kvm->arch.uvmem_lock);
-   if (!kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+
+   if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+   uvmem_page = pfn_to_page(uvmem_pfn);
+   pvt = uvmem_page->zone_device_data;
+   pvt->skip_page_out = skip_page_out;
+   pvt->remove_gfn = true;
+
+   if (__kvmppc_svm_page_out(vma, addr, addr + PAGE_SIZE,
+ PAGE_SHIFT, kvm, pvt->gpa))
+   pr_err("Can't page out gpa:0x%lx addr:0x%lx\n",
+  pvt->gpa, addr);
+   } else {
+   /* Remove the shared flag if any */
kvmppc_gfn_remove(gfn, kvm);
-   mutex_unlock(&kvm->arch.uvmem_lock);
-   continue;
}
 
-   uvmem_page = pfn_to_page(uvmem_pfn);
-   pvt = uvmem_page->zone_device_data;
-   pvt->skip_page_out = skip_page_out;
-   pvt->remove_gfn = true;
mutex_unlock(&kvm->arch.uvmem_lock);
-
-   pfn = gfn_to_pfn(kvm, gfn);
-   if (is_error_noslot_pfn(pfn))
-   continue;
-   kvm_release_pfn_clean(pfn);
}
+
+   mmap_read_unlock(kvm->mm);
 }
 
 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
-- 
1.8.3.1



[PATCH v2 1/2] KVM: PPC: Book3S HV: move kvmppc_svm_page_out up

2020-07-27 Thread Ram Pai
From: Laurent Dufour 

kvmppc_svm_page_out() will need to be called by kvmppc_uvmem_drop_pages(),
so move it up in this file.

Furthermore, it will be useful to call this function when already
holding kvm->arch.uvmem_lock, so prefix the original function with __,
remove the locking from it, and introduce a wrapper which calls that
function with the lock held.

There is no functional change.

Cc: Ram Pai 
Cc: Bharata B Rao 
Cc: Paul Mackerras 
Reviewed-by: Bharata B Rao 
Signed-off-by: Ram Pai 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 166 -
 1 file changed, 90 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 5b917ea..565f24b 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -497,6 +497,96 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 }
 
 /*
+ * Provision a new page on HV side and copy over the contents
+ * from secure memory using UV_PAGE_OUT uvcall.
+ * Caller must hold kvm->arch.uvmem_lock.
+ */
+static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
+   unsigned long start,
+   unsigned long end, unsigned long page_shift,
+   struct kvm *kvm, unsigned long gpa)
+{
+   unsigned long src_pfn, dst_pfn = 0;
+   struct migrate_vma mig;
+   struct page *dpage, *spage;
+   struct kvmppc_uvmem_page_pvt *pvt;
+   unsigned long pfn;
+   int ret = U_SUCCESS;
+
+   memset(&mig, 0, sizeof(mig));
+   mig.vma = vma;
+   mig.start = start;
+   mig.end = end;
+   mig.src = &src_pfn;
+   mig.dst = &dst_pfn;
+   mig.src_owner = &kvmppc_uvmem_pgmap;
+
+   /* The requested page is already paged-out, nothing to do */
+   if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
+   return ret;
+
+   ret = migrate_vma_setup(&mig);
+   if (ret)
+   return -1;
+
+   spage = migrate_pfn_to_page(*mig.src);
+   if (!spage || !(*mig.src & MIGRATE_PFN_MIGRATE))
+   goto out_finalize;
+
+   if (!is_zone_device_page(spage))
+   goto out_finalize;
+
+   dpage = alloc_page_vma(GFP_HIGHUSER, vma, start);
+   if (!dpage) {
+   ret = -1;
+   goto out_finalize;
+   }
+
+   lock_page(dpage);
+   pvt = spage->zone_device_data;
+   pfn = page_to_pfn(dpage);
+
+   /*
+* This function is used in two cases:
+* - When HV touches a secure page, for which we do UV_PAGE_OUT
+* - When a secure page is converted to shared page, we *get*
+*   the page to essentially unmap the device page. In this
+*   case we skip page-out.
+*/
+   if (!pvt->skip_page_out)
+   ret = uv_page_out(kvm->arch.lpid, pfn << page_shift,
+ gpa, 0, page_shift);
+
+   if (ret == U_SUCCESS)
+   *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
+   else {
+   unlock_page(dpage);
+   __free_page(dpage);
+   goto out_finalize;
+   }
+
+   migrate_vma_pages(&mig);
+
+out_finalize:
+   migrate_vma_finalize(&mig);
+   return ret;
+}
+
+static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ unsigned long page_shift,
+ struct kvm *kvm, unsigned long gpa)
+{
+   int ret;
+
+   mutex_lock(&kvm->arch.uvmem_lock);
+   ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
+   mutex_unlock(&kvm->arch.uvmem_lock);
+
+   return ret;
+}
+
+/*
  * Drop device pages that we maintain for the secure guest
  *
  * We first mark the pages to be skipped from UV_PAGE_OUT when there
@@ -866,82 +956,6 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
unsigned long gpa,
return ret;
 }
 
-/*
- * Provision a new page on HV side and copy over the contents
- * from secure memory using UV_PAGE_OUT uvcall.
- */
-static int kvmppc_svm_page_out(struct vm_area_struct *vma,
-   unsigned long start,
-   unsigned long end, unsigned long page_shift,
-   struct kvm *kvm, unsigned long gpa)
-{
-   unsigned long src_pfn, dst_pfn = 0;
-   struct migrate_vma mig;
-   struct page *dpage, *spage;
-   struct kvmppc_uvmem_page_pvt *pvt;
-   unsigned long pfn;
-   int ret = U_SUCCESS;
-
-   memset(&mig, 0, sizeof(mig));
-   mig.vma = vma;
-   mig.start = start;
-   mig.end = end;
-   mig.src = &src_pfn;
-   mig.dst = &dst_pfn;
-   mig.src_owner = &kvmppc_uvmem_pgmap;
-
-   mutex_lock(&kvm->arch.uvmem_lock);
-   /* The requested page is already paged-out, nothing to do */
-   if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
-

[PATCH v2 0/2] Rework secure memslot dropping

2020-07-27 Thread Ram Pai
From: Laurent Dufour 

When doing memory hotplug on a secure VM, the secure pages are not properly
cleaned from the secure device when dropping the memslot.  This silent
error then prevents the SVM from rebooting properly after the following
sequence of commands is run in the QEMU monitor:

device_add pc-dimm,id=dimm1,memdev=mem1
device_del dimm1
device_add pc-dimm,id=dimm1,memdev=mem1

At reboot time, when the kernel is booting again and switching to
secure mode, the page_in fails for the pages in the memslot because
the cleanup was not done properly: the memslot is flagged as
invalid during the hot unplug and thus the page fault mechanism is not
triggered.

To prevent that, during the memslot dropping, instead of relying on the
page fault mechanism to trigger the page-out of the secured pages, it seems
simpler to directly call the function doing the page-out. This way the
state of the memslot does not interfere with the page-out process.

This series applies on top of the Ram's one titled:
"[v6 0/5] Migrate non-migrated pages of a SVM."


Changes since V2:
 - fix to vma boundary check in kvmppc_uvmem_drop_pages().

Changes since V1:
 - Rebase on top of Ram's V4 series
 - Address Bharata's comment to use mmap_read_*lock().

Laurent Dufour (2):
  KVM: PPC: Book3S HV: move kvmppc_svm_page_out up
  KVM: PPC: Book3S HV: rework secure mem slot dropping

 arch/powerpc/kvm/book3s_hv_uvmem.c | 218 +
 1 file changed, 125 insertions(+), 93 deletions(-)

-- 
1.8.3.1


[PATCH v6 5/5] KVM: PPC: Book3S HV: migrate hot plugged memory

2020-07-27 Thread Ram Pai
From: Laurent Dufour 

When a memory slot is hot plugged to an SVM, PFNs associated with the
GFNs in that slot must be migrated to secure-PFNs, aka device-PFNs.

Call kvmppc_uv_migrate_mem_slot() to accomplish this.
Disable page-merge for all pages in the memory slot.

Reviewed-by: Bharata B Rao 
Signed-off-by: Ram Pai 
[rearranged the code, and modified the commit log]
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h | 14 ++
 arch/powerpc/kvm/book3s_hv.c| 14 ++
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 23 +++
 3 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 9cb7d8b..0a63194 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -23,6 +23,10 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 struct kvm *kvm, bool skip_page_out);
+int kvmppc_uvmem_memslot_create(struct kvm *kvm,
+   const struct kvm_memory_slot *new);
+void kvmppc_uvmem_memslot_delete(struct kvm *kvm,
+   const struct kvm_memory_slot *old);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
@@ -82,5 +86,15 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, 
unsigned long gfn)
 static inline void
 kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
struct kvm *kvm, bool skip_page_out) { }
+
+static inline int  kvmppc_uvmem_memslot_create(struct kvm *kvm,
+   const struct kvm_memory_slot *new)
+{
+   return H_UNSUPPORTED;
+}
+
+static inline void  kvmppc_uvmem_memslot_delete(struct kvm *kvm,
+   const struct kvm_memory_slot *old) { }
+
 #endif /* CONFIG_PPC_UV */
 #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d331b46..a93bc65 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4515,16 +4515,14 @@ static void kvmppc_core_commit_memory_region_hv(struct 
kvm *kvm,
 
switch (change) {
case KVM_MR_CREATE:
-   if (kvmppc_uvmem_slot_init(kvm, new))
-   return;
-   uv_register_mem_slot(kvm->arch.lpid,
-new->base_gfn << PAGE_SHIFT,
-new->npages * PAGE_SIZE,
-0, new->id);
+   /*
+* @TODO kvmppc_uvmem_memslot_create() can fail and
+* return error. Fix this.
+*/
+   kvmppc_uvmem_memslot_create(kvm, new);
break;
case KVM_MR_DELETE:
-   uv_unregister_mem_slot(kvm->arch.lpid, old->id);
-   kvmppc_uvmem_slot_free(kvm, old);
+   kvmppc_uvmem_memslot_delete(kvm, old);
break;
default:
/* TODO: Handle KVM_MR_MOVE */
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index a1664ae..5b917ea 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -418,7 +418,7 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
return ret;
 }
 
-static void kvmppc_uvmem_memslot_delete(struct kvm *kvm,
+static void __kvmppc_uvmem_memslot_delete(struct kvm *kvm,
const struct kvm_memory_slot *memslot)
 {
uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
@@ -426,7 +426,7 @@ static void kvmppc_uvmem_memslot_delete(struct kvm *kvm,
kvmppc_memslot_page_merge(kvm, memslot, true);
 }
 
-static int kvmppc_uvmem_memslot_create(struct kvm *kvm,
+static int __kvmppc_uvmem_memslot_create(struct kvm *kvm,
const struct kvm_memory_slot *memslot)
 {
int ret = H_PARAMETER;
@@ -478,7 +478,7 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
/* register the memslot */
slots = kvm_memslots(kvm);
kvm_for_each_memslot(memslot, slots) {
-   ret = kvmppc_uvmem_memslot_create(kvm, memslot);
+   ret = __kvmppc_uvmem_memslot_create(kvm, memslot);
if (ret)
break;
}
@@ -488,7 +488,7 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
kvm_for_each_memslot(m, slots) {
if (m == memslot)
break;
-   kvmppc_uvmem_memslot_delete(kvm, memslot);
+   __kvmppc_uvmem_memslot_delete(kvm, memslot);
}
}
 
@@ -1057,6 +1057,21 @@ int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned 
long gfn)
return (ret == U_SUCCESS) ? RESUME_GUEST : -EFAULT;
 }
 
+int kvmppc_u

[PATCH v6 4/5] KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs.

2020-07-27 Thread Ram Pai
The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the
pages of the SVM before calling H_SVM_INIT_DONE. This causes a huge
delay in transitioning the VM to SVM. The Ultravisor is only interested
in the pages that contain the kernel, initrd and other important data
structures. The rest contain throw-away content.

However, if not all pages are requested by the Ultravisor, the Hypervisor
continues to consider the GFNs corresponding to the non-requested pages
as normal GFNs. This can lead to data corruption and undefined behavior.

In the H_SVM_INIT_DONE handler, move all the PFNs associated with the SVM's
GFNs to secure-PFNs. Skip the GFNs that are already Paged-in or Shared,
or Paged-in followed by a Paged-out.
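
A rough sketch of the migration loop this implies is below (helper names are
taken from this patch; the exact loop in kvmppc_uv_migrate_mem_slot() may
differ, and the mmap/uvmem locking around it is omitted):

	unsigned long gfn = memslot->base_gfn;

	while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {
		unsigned long start = gfn_to_hva(kvm, gfn);
		unsigned long end = start + PAGE_SIZE;
		struct vm_area_struct *vma;

		vma = find_vma_intersection(kvm->mm, start, end);
		/*
		 * pagein == false: allocate a device PFN for this GFN but skip
		 * the UV_PAGE_IN copy, since the page holds throw-away content.
		 */
		if (!vma || kvmppc_svm_page_in(vma, start, end,
					       gfn << PAGE_SHIFT, kvm,
					       PAGE_SHIFT, false))
			break;
		gfn++;	/* move past the GFN that just transitioned */
	}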

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Bharata B Rao 
Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst |   2 +
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 154 ++-
 2 files changed, 134 insertions(+), 22 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index a1c8c37..ba6b1bf 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -934,6 +934,8 @@ Return values
* H_UNSUPPORTED if called from the wrong context (e.g.
from an SVM or before an H_SVM_INIT_START
hypercall).
+   * H_STATE   if the hypervisor could not successfully
+  transition the VM to Secure VM.
 
 Description
 ~~~
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 1b2b029..a1664ae 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -93,6 +93,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct dev_pagemap kvmppc_uvmem_pgmap;
 static unsigned long *kvmppc_uvmem_bitmap;
@@ -348,6 +349,41 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
struct kvm *kvm,
return false;
 }
 
+/*
+ * starting from *gfn search for the next available GFN that is not yet
+ * transitioned to a secure GFN.  return the value of that GFN in *gfn.  If a
+ * GFN is found, return true, else return false
+ *
+ * Must be called with kvm->arch.uvmem_lock  held.
+ */
+static bool kvmppc_next_nontransitioned_gfn(const struct kvm_memory_slot 
*memslot,
+   struct kvm *kvm, unsigned long *gfn)
+{
+   struct kvmppc_uvmem_slot *p;
+   bool ret = false;
+   unsigned long i;
+
+   list_for_each_entry(p, &kvm->arch.uvmem_pfns, list)
+   if (*gfn >= p->base_pfn && *gfn < p->base_pfn + p->nr_pfns)
+   break;
+   if (!p)
+   return ret;
+   /*
+* The code below assumes, one to one correspondence between
+* kvmppc_uvmem_slot and memslot.
+*/
+   for (i = *gfn; i < p->base_pfn + p->nr_pfns; i++) {
+   unsigned long index = i - p->base_pfn;
+
+   if (!(p->pfns[index] & KVMPPC_GFN_FLAG_MASK)) {
+   *gfn = i;
+   ret = true;
+   break;
+   }
+   }
+   return ret;
+}
+
 static int kvmppc_memslot_page_merge(struct kvm *kvm,
const struct kvm_memory_slot *memslot, bool merge)
 {
@@ -460,16 +496,6 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
return ret;
 }
 
-unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
-{
-   if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
-   return H_UNSUPPORTED;
-
-   kvm->arch.secure_guest |= KVMPPC_SECURE_INIT_DONE;
-   pr_info("LPID %d went secure\n", kvm->arch.lpid);
-   return H_SUCCESS;
-}
-
 /*
  * Drop device pages that we maintain for the secure guest
  *
@@ -588,12 +614,14 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
 }
 
 /*
- * Alloc a PFN from private device memory pool and copy page from normal
- * memory to secure memory using UV_PAGE_IN uvcall.
+ * Alloc a PFN from private device memory pool. If @pagein is true,
+ * copy page from normal memory to secure memory using UV_PAGE_IN uvcall.
  */
-static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
-  unsigned long end, unsigned long gpa, struct kvm *kvm,
-  unsigned long page_shift)
+static int kvmppc_svm_page_in(struct vm_area_struct *vma,
+   unsigned long start,
+   unsigned long end, unsigned long gpa, struct kvm *kvm,
+   unsigned long page_shift,
+   bool pagein)
 {

[PATCH v6 2/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START

2020-07-27 Thread Ram Pai
Page-merging of pages in memory-slots associated with a Secure VM,
is disabled in H_SVM_PAGE_IN handler.

This operation should have been done much earlier; the moment the VM
is initiated for secure-transition. Delaying this operation increases
the probability of those pages acquiring new references, making it
impossible to migrate those pages in the H_SVM_PAGE_IN handler.

Disable page-merging in H_SVM_INIT_START handling.

Reviewed-by: Bharata B Rao 
Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst |   1 +
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 123 +--
 2 files changed, 89 insertions(+), 35 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index df136c8..a1c8c37 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -895,6 +895,7 @@ Return values
 One of the following values:
 
* H_SUCCESS  on success.
+   * H_STATE  if the VM is not in a position to switch to secure.
 
 Description
 ~~~
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index e6f76bc..533b608 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -211,10 +211,79 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
struct kvm *kvm,
return false;
 }
 
+static int kvmppc_memslot_page_merge(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot, bool merge)
+{
+   unsigned long gfn = memslot->base_gfn;
+   unsigned long end, start = gfn_to_hva(kvm, gfn);
+   int ret = 0;
+   struct vm_area_struct *vma;
+   int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
+
+   if (kvm_is_error_hva(start))
+   return H_STATE;
+
+   end = start + (memslot->npages << PAGE_SHIFT);
+
+   mmap_write_lock(kvm->mm);
+   do {
+   vma = find_vma_intersection(kvm->mm, start, end);
+   if (!vma) {
+   ret = H_STATE;
+   break;
+   }
+   ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
+ merge_flag, &vma->vm_flags);
+   if (ret) {
+   ret = H_STATE;
+   break;
+   }
+   start = vma->vm_end;
+   } while (end > vma->vm_end);
+
+   mmap_write_unlock(kvm->mm);
+   return ret;
+}
+
+static void kvmppc_uvmem_memslot_delete(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot)
+{
+   uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
+   kvmppc_uvmem_slot_free(kvm, memslot);
+   kvmppc_memslot_page_merge(kvm, memslot, true);
+}
+
+static int kvmppc_uvmem_memslot_create(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot)
+{
+   int ret = H_PARAMETER;
+
+   if (kvmppc_memslot_page_merge(kvm, memslot, false))
+   return ret;
+
+   if (kvmppc_uvmem_slot_init(kvm, memslot))
+   goto out1;
+
+   ret = uv_register_mem_slot(kvm->arch.lpid,
+  memslot->base_gfn << PAGE_SHIFT,
+  memslot->npages * PAGE_SIZE,
+  0, memslot->id);
+   if (ret < 0) {
+   ret = H_PARAMETER;
+   goto out;
+   }
+   return 0;
+out:
+   kvmppc_uvmem_slot_free(kvm, memslot);
+out1:
+   kvmppc_memslot_page_merge(kvm, memslot, true);
+   return ret;
+}
+
 unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 {
struct kvm_memslots *slots;
-   struct kvm_memory_slot *memslot;
+   struct kvm_memory_slot *memslot, *m;
int ret = H_SUCCESS;
int srcu_idx;
 
@@ -232,23 +301,24 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
return H_AUTHORITY;
 
	srcu_idx = srcu_read_lock(&kvm->srcu);
+
+   /* register the memslot */
slots = kvm_memslots(kvm);
kvm_for_each_memslot(memslot, slots) {
-   if (kvmppc_uvmem_slot_init(kvm, memslot)) {
-   ret = H_PARAMETER;
-   goto out;
-   }
-   ret = uv_register_mem_slot(kvm->arch.lpid,
-  memslot->base_gfn << PAGE_SHIFT,
-  memslot->npages * PAGE_SIZE,
-  0, memslot->id);
-   if (ret < 0) {
-   kvmppc_uvmem_slot_free(kvm, memslot);
-   ret = H_PARAMETER;
-   goto out;
+   ret = kvmppc_uvmem_memslot_create(kvm, memslot);
+   if (ret)
+   break;
+   }
+
+   if (ret) {
+   slots = kvm_memslots(kvm);

[PATCH v6 3/5] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs

2020-07-27 Thread Ram Pai
 ---------------------------------------------------------------------
 |   VM    |   Start   |  H_SVM_  |  H_SVM_  |  H_SVM_   |  UV_SVM_  |
 |  STATE  |           |INIT_START|INIT_DONE |INIT_ABORT | TERMINATE |
 ---------------------------------------------------------------------
 |         |           |          |          |           |           |
 | Normal  |  Normal   | Transient|  Error   |  Error    |  Normal   |
 |         |           |          |          |           |           |
 | Secure  |   Error   |  Error   |  Error   |  Error    |  Normal   |
 |         |           |          |          |           |           |
 |Transient|    N/A    |  Error   |  Secure  |  Normal   |  Normal   |
 ---------------------------------------------------------------------
 



Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org

Reviewed-by: Bharata B Rao 
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 191 +
 1 file changed, 172 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 533b608..1b2b029 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -98,7 +98,127 @@
 static unsigned long *kvmppc_uvmem_bitmap;
 static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
 
-#define KVMPPC_UVMEM_PFN   (1UL << 63)
+/*
+ * States of a GFN
+ * ---
+ * The GFN can be in one of the following states.
+ *
+ * (a) Secure - The GFN is secure. The GFN is associated with
+ * a Secure VM, the contents of the GFN is not accessible
+ * to the Hypervisor.  This GFN can be backed by a secure-PFN,
+ * or can be backed by a normal-PFN with contents encrypted.
+ * The former is true when the GFN is paged-in into the
+ * ultravisor. The latter is true when the GFN is paged-out
+ * of the ultravisor.
+ *
+ * (b) Shared - The GFN is shared. The GFN is associated with a
+ * a secure VM. The contents of the GFN is accessible to
+ * Hypervisor. This GFN is backed by a normal-PFN and its
+ * content is un-encrypted.
+ *
+ * (c) Normal - The GFN is normal. The GFN is associated with
+ * a normal VM. The contents of the GFN are accessible to
+ * the Hypervisor. Its content is never encrypted.
+ *
+ * States of a VM.
+ * ---
+ *
+ * Normal VM:  A VM whose contents are always accessible to
+ * the hypervisor.  All its GFNs are normal-GFNs.
+ *
+ * Secure VM: A VM whose contents are not accessible to the
+ * hypervisor without the VM's consent.  Its GFNs are
+ * either Shared-GFN or Secure-GFNs.
+ *
+ * Transient VM: A Normal VM that is transitioning to secure VM.
+ * The transition starts on successful return of
+ * H_SVM_INIT_START, and ends on successful return
+ * of H_SVM_INIT_DONE. This transient VM, can have GFNs
+ * in any of the three states; i.e Secure-GFN, Shared-GFN,
+ * and Normal-GFN. The VM never executes in this state
+ * in supervisor-mode.
+ *
+ * Memory slot State.
+ * -
+ * The state of a memory slot mirrors the state of the
+ * VM the memory slot is associated with.
+ *
+ * VM State transition.
+ * 
+ *
+ *  A VM always starts in Normal Mode.
+ *
+ *  H_SVM_INIT_START moves the VM into transient state. During this
+ *  time the Ultravisor may request some of its GFNs to be shared or
+ *  secured. So its GFNs can be in one of the three GFN states.
+ *
+ *  H_SVM_INIT_DONE moves the VM entirely from transient state to
+ *  secure-state. At this point any left-over normal-GFNs are
+ *  transitioned to Secure-GFN.
+ *
+ *  H_SVM_INIT_ABORT moves the transient VM back to normal VM.
+ *  All its GFNs are moved to Normal-GFNs.
+ *
+ *  UV_TERMINATE transitions the secure-VM back to normal-VM. All
+ *  the secure-GFN and shared-GFNs are transitioned to normal-GFN
+ *  Note: The contents of the normal-GFN is undefined at this point.
+ *
+ * GFN state implementation:
+ * -
+ *
+ * Secure GFN is associated with a secure-PFN; also called uvmem_pfn,
+ * when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag
+ * set, and contains the value of the secure-PFN.
+ * It is associated with a normal-PFN; also called mem_pfn, when
+ * the GFN is pagedout. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set.
+ * The value of the normal-PFN is not tracked.
+ *
+ * Shared GFN is associated with a normal-PFN. Its pfn[] has
+ * KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN
+ * is not tracked.
+ *
+ * Normal GFN is associated with normal-PFN. Its pfn[] has
+ * no flag set. The value of the normal-PFN is not tracked.
+ *
+ * Life cycle of a GFN
+ * 
+ *
+ * --
+ * || Share  |  Unshare | SVM   |H_SVM_INI

[PATCH v6 0/5] Migrate non-migrated pages of a SVM.

2020-07-27 Thread Ram Pai
The time to switch a VM to a Secure-VM increases with the size of the VM.
A 100GB VM takes about 7 minutes. This is unacceptable.  This linear
increase is caused by suboptimal behavior of the Ultravisor and the
Hypervisor.  The Ultravisor unnecessarily migrates every GFN of the
VM from normal-memory to secure-memory. It only has to migrate the
necessary and sufficient GFNs.

However, when the optimization is incorporated in the Ultravisor, the
Hypervisor starts misbehaving. The Hypervisor has an inbuilt assumption
that the Ultravisor will explicitly request to migrate each and every
GFN of the VM. If only the necessary and sufficient GFNs are requested
for migration, the Hypervisor continues to manage the remaining GFNs as
normal GFNs. This leads to memory corruption, manifested
consistently when the SVM reboots.

The same is true when a memory slot is hotplugged into a SVM. The
Hypervisor expects the Ultravisor to request migration of all GFNs to
secure-GFNs.  But the Hypervisor cannot handle any H_SVM_PAGE_IN
requests from the Ultravisor done in the context of the
UV_REGISTER_MEM_SLOT ucall.  This problem manifests as random errors
in the SVM when a memory-slot is hotplugged.

This patch series automatically migrates the non-migrated pages of a
SVM, and thus solves the problem.

Testing: Passed rigorous testing using various sized SVMs.

Changelog:

v6: . rearrangement of functions in book3s_hv_uvmem.c. No functional
change.
. decoupling this patch series from Laurent's memory-hotplug/unplug,
since the memhotplug/unplug/hotplug/reboot test is failing.

v5:  .  This patch series includes Laurent's fix for memory hotplug/unplug
  . drop pages first and then delete the memslot. Otherwise
the memslot does not get cleanly deleted, causing
problems during reboot.
  . recreatable through the following set of commands
 . device_add pc-dimm,id=dimm1,memdev=mem1
 . device_del dimm1
 . device_add pc-dimm,id=dimm1,memdev=mem1
Further incorporates comments from Bharata:
. fix for off-by-one while disabling migration.
. code-reorganized to maximize sharing in init_start path
and in memory-hotplug path
. locking adjustments in mass-page migration during H_SVM_INIT_DONE.
. improved recovery on error paths.
. additional comments in the code for better understanding.
. removed the retry-on-migration-failure code.
. re-added the initial patch that adjusts some prototypes to overcome
   a git problem, where it messes up the code context. Had
accidentally dropped the patch in the last version.

v4:  .  Incorporated Bharata's comments:
- Optimization -- replace write mmap semaphore with read mmap semaphore.
- disable page-merge during memory hotplug.
- rearranged the patches. consolidated the page-migration-retry logic
in a single patch.

v3: . Optimized the page-migration retry-logic. 
. Relax and relinquish the cpu regularly while bulk migrating
the non-migrated pages. This issue was causing soft-lockups.
Fixed it.
. Added a new patch, to retry page-migration a couple of times
before returning H_BUSY in H_SVM_PAGE_IN. This issue was
seen a few times in a 24hour continuous reboot test of the SVMs.

v2: . fixed a bug observed by Laurent. The state of the GFNs associated
with Secure-VMs was not reset during memslot flush.
. Re-organized the code, for easier review.
. Better description of the patch series.

v1: . fixed a bug observed by Bharata. Pages that were paged-in and later
paged-out must also be skipped from migration during H_SVM_INIT_DONE.


Laurent Dufour (1):
  KVM: PPC: Book3S HV: migrate hot plugged memory

Ram Pai (4):
  KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c
  KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START
  KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
  KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs
to secure-GFNs.

 Documentation/powerpc/ultravisor.rst|   3 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  14 +
 arch/powerpc/kvm/book3s_hv.c|  14 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 498 +++-
 4 files changed, 437 insertions(+), 92 deletions(-)

-- 
1.8.3.1



[PATCH v6 1/5] KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c

2020-07-27 Thread Ram Pai
Without this fix, git is confused. It generates wrong
function context for code changes in subsequent patches.
Weird, but true.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 09d8119..e6f76bc 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -382,8 +382,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * Alloc a PFN from private device memory pool and copy page from normal
  * memory to secure memory using UV_PAGE_IN uvcall.
  */
-static int
-kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
+static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, unsigned long gpa, struct kvm *kvm,
   unsigned long page_shift, bool *downgrade)
 {
@@ -450,8 +449,8 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * In the former case, uses dev_pagemap_ops.migrate_to_ram handler
  * to unmap the device page from QEMU's page tables.
  */
-static unsigned long
-kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
+static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
+   unsigned long page_shift)
 {
 
int ret = H_PARAMETER;
@@ -500,9 +499,9 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * H_PAGE_IN_SHARED flag makes the page shared which means that the same
  * memory in is visible from both UV and HV.
  */
-unsigned long
-kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
-unsigned long flags, unsigned long page_shift)
+unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
+   unsigned long flags,
+   unsigned long page_shift)
 {
bool downgrade = false;
unsigned long start, end;
@@ -559,10 +558,10 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * Provision a new page on HV side and copy over the contents
  * from secure memory using UV_PAGE_OUT uvcall.
  */
-static int
-kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
-   unsigned long end, unsigned long page_shift,
-   struct kvm *kvm, unsigned long gpa)
+static int kvmppc_svm_page_out(struct vm_area_struct *vma,
+   unsigned long start,
+   unsigned long end, unsigned long page_shift,
+   struct kvm *kvm, unsigned long gpa)
 {
unsigned long src_pfn, dst_pfn = 0;
struct migrate_vma mig;
-- 
1.8.3.1



[PATCH v5 7/7] KVM: PPC: Book3S HV: rework secure mem slot dropping

2020-07-23 Thread Ram Pai
From: Laurent Dufour 

When a secure memslot is dropped, all the pages backed in the secure
device (aka really backed by secure memory by the Ultravisor)
should be paged out to a normal page. Previously, this was
achieved by triggering the page fault mechanism which is calling
kvmppc_svm_page_out() on each pages.

This can't work when hot unplugging a memory slot because the memory
slot is flagged as invalid and gfn_to_pfn() is then not trying to access
the page, so the page fault mechanism is not triggered.

Since the final goal is to make a call to kvmppc_svm_page_out() it seems
simpler to call directly instead of triggering such a mechanism. This
way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
memslot.

Since kvmppc_uvmem_drop_pages() already holds kvm->arch.uvmem_lock,
the call is made to __kvmppc_svm_page_out().  As
__kvmppc_svm_page_out() needs the vma pointer to migrate the pages,
the VMA is fetched in a lazy way, to avoid calling find_vma() all
the time. In addition, the mmap_sem is held in read mode during
that time, not in write mode, since the virtual memory layout is not
impacted, and kvm->arch.uvmem_lock prevents concurrent operations
on the secure device.

Cc: Ram Pai 
Cc: Bharata B Rao 
Cc: Paul Mackerras 
Signed-off-by: Ram Pai 
[modified the changelog description]
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 54 ++
 1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index c772e92..daffa6e 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -632,35 +632,55 @@ static inline int kvmppc_svm_page_out(struct 
vm_area_struct *vma,
  * fault on them, do fault time migration to replace the device PTEs in
  * QEMU page table with normal PTEs from newly allocated pages.
  */
-void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
+void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
 struct kvm *kvm, bool skip_page_out)
 {
int i;
struct kvmppc_uvmem_page_pvt *pvt;
-   unsigned long pfn, uvmem_pfn;
-   unsigned long gfn = free->base_gfn;
+   struct page *uvmem_page;
+   struct vm_area_struct *vma = NULL;
+   unsigned long uvmem_pfn, gfn;
+   unsigned long addr, end;
+
+   mmap_read_lock(kvm->mm);
+
+   addr = slot->userspace_addr;
+   end = addr + (slot->npages * PAGE_SIZE);
 
-   for (i = free->npages; i; --i, ++gfn) {
-   struct page *uvmem_page;
+   gfn = slot->base_gfn;
+   for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) {
+
+   /* Fetch the VMA if addr is not in the latest fetched one */
+   if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) {
+   vma = find_vma_intersection(kvm->mm, addr, end);
+   if (!vma ||
+   vma->vm_start > addr || vma->vm_end < end) {
+   pr_err("Can't find VMA for gfn:0x%lx\n", gfn);
+   break;
+   }
+   }
 
		mutex_lock(&kvm->arch.uvmem_lock);
-		if (!kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+
+		if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+   uvmem_page = pfn_to_page(uvmem_pfn);
+   pvt = uvmem_page->zone_device_data;
+   pvt->skip_page_out = skip_page_out;
+   pvt->remove_gfn = true;
+
+   if (__kvmppc_svm_page_out(vma, addr, addr + PAGE_SIZE,
+ PAGE_SHIFT, kvm, pvt->gpa))
+   pr_err("Can't page out gpa:0x%lx addr:0x%lx\n",
+  pvt->gpa, addr);
+   } else {
+   /* Remove the shared flag if any */
kvmppc_gfn_remove(gfn, kvm);
-			mutex_unlock(&kvm->arch.uvmem_lock);
-   continue;
}
 
-   uvmem_page = pfn_to_page(uvmem_pfn);
-   pvt = uvmem_page->zone_device_data;
-   pvt->skip_page_out = skip_page_out;
-   pvt->remove_gfn = true;
		mutex_unlock(&kvm->arch.uvmem_lock);
-
-   pfn = gfn_to_pfn(kvm, gfn);
-   if (is_error_noslot_pfn(pfn))
-   continue;
-   kvm_release_pfn_clean(pfn);
}
+
+   mmap_read_unlock(kvm->mm);
 }
 
 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
-- 
1.8.3.1



[PATCH v5 6/7] KVM: PPC: Book3S HV: move kvmppc_svm_page_out up

2020-07-23 Thread Ram Pai
From: Laurent Dufour 

kvmppc_svm_page_out() will need to be called by kvmppc_uvmem_drop_pages()
so move it upper in this file.

Furthermore it will be interesting to call this function when already
holding the kvm->arch.uvmem_lock, so prefix the original function with __
and remove the locking in it, and introduce a wrapper which call that
function with the lock held.

There is no functional change.

Cc: Ram Pai 
Cc: Bharata B Rao 
Cc: Paul Mackerras 
Signed-off-by: Ram Pai 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 166 -
 1 file changed, 90 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index fd9ff6e..c772e92 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -535,6 +535,96 @@ unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
 }
 
 /*
+ * Provision a new page on HV side and copy over the contents
+ * from secure memory using UV_PAGE_OUT uvcall.
+ * Caller must hold kvm->arch.uvmem_lock.
+ */
+static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
+   unsigned long start,
+   unsigned long end, unsigned long page_shift,
+   struct kvm *kvm, unsigned long gpa)
+{
+   unsigned long src_pfn, dst_pfn = 0;
+   struct migrate_vma mig;
+   struct page *dpage, *spage;
+   struct kvmppc_uvmem_page_pvt *pvt;
+   unsigned long pfn;
+   int ret = U_SUCCESS;
+
+   memset(&mig, 0, sizeof(mig));
+   mig.vma = vma;
+   mig.start = start;
+   mig.end = end;
+   mig.src = &src_pfn;
+   mig.dst = &dst_pfn;
+   mig.src_owner = &kvmppc_uvmem_pgmap;
+
+   /* The requested page is already paged-out, nothing to do */
+   if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
+   return ret;
+
+   ret = migrate_vma_setup(&mig);
+   if (ret)
+   return -1;
+
+   spage = migrate_pfn_to_page(*mig.src);
+   if (!spage || !(*mig.src & MIGRATE_PFN_MIGRATE))
+   goto out_finalize;
+
+   if (!is_zone_device_page(spage))
+   goto out_finalize;
+
+   dpage = alloc_page_vma(GFP_HIGHUSER, vma, start);
+   if (!dpage) {
+   ret = -1;
+   goto out_finalize;
+   }
+
+   lock_page(dpage);
+   pvt = spage->zone_device_data;
+   pfn = page_to_pfn(dpage);
+
+   /*
+* This function is used in two cases:
+* - When HV touches a secure page, for which we do UV_PAGE_OUT
+* - When a secure page is converted to shared page, we *get*
+*   the page to essentially unmap the device page. In this
+*   case we skip page-out.
+*/
+   if (!pvt->skip_page_out)
+   ret = uv_page_out(kvm->arch.lpid, pfn << page_shift,
+ gpa, 0, page_shift);
+
+   if (ret == U_SUCCESS)
+   *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
+   else {
+   unlock_page(dpage);
+   __free_page(dpage);
+   goto out_finalize;
+   }
+
+   migrate_vma_pages(&mig);
+
+out_finalize:
+   migrate_vma_finalize(&mig);
+   return ret;
+}
+
+static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ unsigned long page_shift,
+ struct kvm *kvm, unsigned long gpa)
+{
+   int ret;
+
+   mutex_lock(&kvm->arch.uvmem_lock);
+   ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
+   mutex_unlock(&kvm->arch.uvmem_lock);
+
+   return ret;
+}
+
+/*
  * Drop device pages that we maintain for the secure guest
  *
  * We first mark the pages to be skipped from UV_PAGE_OUT when there
@@ -866,82 +956,6 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
unsigned long gpa,
return ret;
 }
 
-/*
- * Provision a new page on HV side and copy over the contents
- * from secure memory using UV_PAGE_OUT uvcall.
- */
-static int kvmppc_svm_page_out(struct vm_area_struct *vma,
-   unsigned long start,
-   unsigned long end, unsigned long page_shift,
-   struct kvm *kvm, unsigned long gpa)
-{
-   unsigned long src_pfn, dst_pfn = 0;
-   struct migrate_vma mig;
-   struct page *dpage, *spage;
-   struct kvmppc_uvmem_page_pvt *pvt;
-   unsigned long pfn;
-   int ret = U_SUCCESS;
-
-   memset(&mig, 0, sizeof(mig));
-   mig.vma = vma;
-   mig.start = start;
-   mig.end = end;
-   mig.src = &src_pfn;
-   mig.dst = &dst_pfn;
-   mig.src_owner = &kvmppc_uvmem_pgmap;
-
-   mutex_lock(&kvm->arch.uvmem_lock);
-   /* The requested page is already paged-out, nothing to do */
-   if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
-   goto out;
-
-   

[PATCH v5 5/7] KVM: PPC: Book3S HV: migrate hot plugged memory

2020-07-23 Thread Ram Pai
From: Laurent Dufour 

When a memory slot is hot plugged to a SVM, PFNs associated with the
GFNs in that slot must be migrated to the secure-PFNs, aka device-PFNs.

Call kvmppc_uv_migrate_mem_slot() to accomplish this.
Disable page-merge for all pages in the memory slot.
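
In rough terms the hot plug (KVM_MR_CREATE) path becomes the following
(a simplified, hypothetical sketch; the sketch's outer function name is
made up, while the helpers it calls are the ones introduced in this
series, and error unwinding is not shown):

static int memslot_create_sketch(struct kvm *kvm,
				 const struct kvm_memory_slot *new)
{
	/* same per-memslot setup that H_SVM_INIT_START performs:
	 * disable KSM page-merging, init the GFN state array and
	 * register the slot with the Ultravisor */
	int ret = __kvmppc_uvmem_memslot_create(kvm, new);

	/* new for hot plug: migrate every GFN of the slot right away,
	 * since H_SVM_PAGE_IN cannot be serviced in the context of
	 * the UV_REGISTER_MEM_SLOT ucall */
	if (!ret)
		ret = kvmppc_uv_migrate_mem_slot(kvm, new);

	return ret;
}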

Signed-off-by: Ram Pai 
[rearranged the code, and modified the commit log]
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h | 14 ++
 arch/powerpc/kvm/book3s_hv.c| 10 ++
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 23 +++
 3 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index f229ab5..59c17ca 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -25,6 +25,10 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot 
*free,
 struct kvm *kvm, bool skip_page_out);
 int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
const struct kvm_memory_slot *memslot);
+int kvmppc_uvmem_memslot_create(struct kvm *kvm,
+   const struct kvm_memory_slot *new);
+void kvmppc_uvmem_memslot_delete(struct kvm *kvm,
+   const struct kvm_memory_slot *old);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
@@ -84,5 +88,15 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, 
unsigned long gfn)
 static inline void
 kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
struct kvm *kvm, bool skip_page_out) { }
+
+static inline int  kvmppc_uvmem_memslot_create(struct kvm *kvm,
+   const struct kvm_memory_slot *new)
+{
+   return H_UNSUPPORTED;
+}
+
+static inline void  kvmppc_uvmem_memslot_delete(struct kvm *kvm,
+   const struct kvm_memory_slot *old) { }
+
 #endif /* CONFIG_PPC_UV */
 #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d331b46..b1485ca 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4515,16 +4515,10 @@ static void kvmppc_core_commit_memory_region_hv(struct 
kvm *kvm,
 
switch (change) {
case KVM_MR_CREATE:
-   if (kvmppc_uvmem_slot_init(kvm, new))
-   return;
-   uv_register_mem_slot(kvm->arch.lpid,
-new->base_gfn << PAGE_SHIFT,
-new->npages * PAGE_SIZE,
-0, new->id);
+   kvmppc_uvmem_memslot_create(kvm, new);
break;
case KVM_MR_DELETE:
-   uv_unregister_mem_slot(kvm->arch.lpid, old->id);
-   kvmppc_uvmem_slot_free(kvm, old);
+   kvmppc_uvmem_memslot_delete(kvm, old);
break;
default:
/* TODO: Handle KVM_MR_MOVE */
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 43b3566..fd9ff6e 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -418,7 +418,7 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
return ret;
 }
 
-static void kvmppc_uvmem_memslot_delete(struct kvm *kvm,
+static void __kvmppc_uvmem_memslot_delete(struct kvm *kvm,
const struct kvm_memory_slot *memslot)
 {
uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
@@ -426,7 +426,7 @@ static void kvmppc_uvmem_memslot_delete(struct kvm *kvm,
kvmppc_memslot_page_merge(kvm, memslot, true);
 }
 
-static int kvmppc_uvmem_memslot_create(struct kvm *kvm,
+static int __kvmppc_uvmem_memslot_create(struct kvm *kvm,
const struct kvm_memory_slot *memslot)
 {
int ret = H_PARAMETER;
@@ -478,7 +478,7 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
/* register the memslot */
slots = kvm_memslots(kvm);
kvm_for_each_memslot(memslot, slots) {
-   ret = kvmppc_uvmem_memslot_create(kvm, memslot);
+   ret = __kvmppc_uvmem_memslot_create(kvm, memslot);
if (ret)
break;
}
@@ -488,7 +488,7 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
kvm_for_each_memslot(m, slots) {
if (m == memslot)
break;
-   kvmppc_uvmem_memslot_delete(kvm, memslot);
+   __kvmppc_uvmem_memslot_delete(kvm, memslot);
}
}
 
@@ -1057,6 +1057,21 @@ int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned 
long gfn)
return (ret == U_SUCCESS) ? RESUME_GUEST : -EFAULT;
 }
 
+int kvmppc_uvmem_memslot_create(struct kvm *kvm, const struct kvm_memory_slot 
*new)
+{
+   int ret = __kvmppc_uvmem_memslot_create(kvm, new);
+
+   if (!ret)
+   ret = kvmpp

[PATCH v5 4/7] KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs.

2020-07-23 Thread Ram Pai
The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the
pages of the SVM before calling H_SVM_INIT_DONE. This causes a huge
delay in transitioning the VM to SVM. The Ultravisor is only interested
in the pages that contain the kernel, initrd and other important data
structures. The rest contain throw-away content.

However if not all pages are requested by the Ultravisor, the Hypervisor
continues to consider the GFNs corresponding to the non-requested pages
as normal GFNs. This can lead to data-corruption and undefined behavior.

In H_SVM_INIT_DONE handler, move all the PFNs associated with the SVM's
GFNs to secure-PFNs. Skip the GFNs that are already Paged-in or Shared
or Paged-in followed by a Paged-out.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst|   2 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 136 +---
 3 files changed, 127 insertions(+), 13 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index a1c8c37..ba6b1bf 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -934,6 +934,8 @@ Return values
* H_UNSUPPORTED if called from the wrong context (e.g.
from an SVM or before an H_SVM_INIT_START
hypercall).
+   * H_STATE   if the hypervisor could not successfully
+transition the VM to Secure VM.
 
 Description
 ~~~
diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 9cb7d8b..f229ab5 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -23,6 +23,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 struct kvm *kvm, bool skip_page_out);
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 1b2b029..43b3566 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -93,6 +93,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct dev_pagemap kvmppc_uvmem_pgmap;
 static unsigned long *kvmppc_uvmem_bitmap;
@@ -348,6 +349,41 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
struct kvm *kvm,
return false;
 }
 
+/*
+ * starting from *gfn search for the next available GFN that is not yet
+ * transitioned to a secure GFN.  return the value of that GFN in *gfn.  If a
+ * GFN is found, return true, else return false
+ *
+ * Must be called with kvm->arch.uvmem_lock  held.
+ */
+static bool kvmppc_next_nontransitioned_gfn(const struct kvm_memory_slot 
*memslot,
+   struct kvm *kvm, unsigned long *gfn)
+{
+   struct kvmppc_uvmem_slot *p;
+   bool ret = false;
+   unsigned long i;
+
+   list_for_each_entry(p, &kvm->arch.uvmem_pfns, list)
+   if (*gfn >= p->base_pfn && *gfn < p->base_pfn + p->nr_pfns)
+   break;
+   if (!p)
+   return ret;
+   /*
+* The code below assumes, one to one correspondence between
+* kvmppc_uvmem_slot and memslot.
+*/
+   for (i = *gfn; i < p->base_pfn + p->nr_pfns; i++) {
+   unsigned long index = i - p->base_pfn;
+
+   if (!(p->pfns[index] & KVMPPC_GFN_FLAG_MASK)) {
+   *gfn = i;
+   ret = true;
+   break;
+   }
+   }
+   return ret;
+}
+
 static int kvmppc_memslot_page_merge(struct kvm *kvm,
const struct kvm_memory_slot *memslot, bool merge)
 {
@@ -462,12 +498,40 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
 {
+   struct kvm_memslots *slots;
+   struct kvm_memory_slot *memslot;
+   int srcu_idx;
+   long ret = H_SUCCESS;
+
if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
return H_UNSUPPORTED;
 
+   /* migrate any unmoved normal pfn to device pfns*/
+   srcu_idx = srcu_read_lock(&kvm->srcu);
+   slots = kvm_memslots(kvm);
+   kvm_for_each_memslot(memslot, slots) {
+   ret = kvmppc_

[PATCH v5 3/7] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs

2020-07-23 Thread Ram Pai
 ---------------------------------------------------------------------
 |   VM    |   Start   |  H_SVM_  |  H_SVM_  |  H_SVM_   |  UV_SVM_  |
 |  STATE  |           |INIT_START|INIT_DONE |INIT_ABORT | TERMINATE |
 ---------------------------------------------------------------------
 |         |           |          |          |           |           |
 | Normal  |  Normal   | Transient|  Error   |  Error    |  Normal   |
 |         |           |          |          |           |           |
 | Secure  |   Error   |  Error   |  Error   |  Error    |  Normal   |
 |         |           |          |          |           |           |
 |Transient|    N/A    |  Error   |  Secure  |  Normal   |  Normal   |
 ---------------------------------------------------------------------
 



Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org

Reviewed-by: Bharata B Rao 
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 191 +
 1 file changed, 172 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 533b608..1b2b029 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -98,7 +98,127 @@
 static unsigned long *kvmppc_uvmem_bitmap;
 static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
 
-#define KVMPPC_UVMEM_PFN   (1UL << 63)
+/*
+ * States of a GFN
+ * ---
+ * The GFN can be in one of the following states.
+ *
+ * (a) Secure - The GFN is secure. The GFN is associated with
+ * a Secure VM, the contents of the GFN is not accessible
+ * to the Hypervisor.  This GFN can be backed by a secure-PFN,
+ * or can be backed by a normal-PFN with contents encrypted.
+ * The former is true when the GFN is paged-in into the
+ * ultravisor. The latter is true when the GFN is paged-out
+ * of the ultravisor.
+ *
+ * (b) Shared - The GFN is shared. The GFN is associated with a
+ * a secure VM. The contents of the GFN is accessible to
+ * Hypervisor. This GFN is backed by a normal-PFN and its
+ * content is un-encrypted.
+ *
+ * (c) Normal - The GFN is normal. The GFN is associated with
+ * a normal VM. The contents of the GFN are accessible to
+ * the Hypervisor. Its content is never encrypted.
+ *
+ * States of a VM.
+ * ---
+ *
+ * Normal VM:  A VM whose contents are always accessible to
+ * the hypervisor.  All its GFNs are normal-GFNs.
+ *
+ * Secure VM: A VM whose contents are not accessible to the
+ * hypervisor without the VM's consent.  Its GFNs are
+ * either Shared-GFN or Secure-GFNs.
+ *
+ * Transient VM: A Normal VM that is transitioning to secure VM.
+ * The transition starts on successful return of
+ * H_SVM_INIT_START, and ends on successful return
+ * of H_SVM_INIT_DONE. This transient VM, can have GFNs
+ * in any of the three states; i.e Secure-GFN, Shared-GFN,
+ * and Normal-GFN. The VM never executes in this state
+ * in supervisor-mode.
+ *
+ * Memory slot State.
+ * -
+ * The state of a memory slot mirrors the state of the
+ * VM the memory slot is associated with.
+ *
+ * VM State transition.
+ * 
+ *
+ *  A VM always starts in Normal Mode.
+ *
+ *  H_SVM_INIT_START moves the VM into transient state. During this
+ *  time the Ultravisor may request some of its GFNs to be shared or
+ *  secured. So its GFNs can be in one of the three GFN states.
+ *
+ *  H_SVM_INIT_DONE moves the VM entirely from transient state to
+ *  secure-state. At this point any left-over normal-GFNs are
+ *  transitioned to Secure-GFN.
+ *
+ *  H_SVM_INIT_ABORT moves the transient VM back to normal VM.
+ *  All its GFNs are moved to Normal-GFNs.
+ *
+ *  UV_TERMINATE transitions the secure-VM back to normal-VM. All
+ *  the secure-GFN and shared-GFNs are transitioned to normal-GFN
+ *  Note: The contents of the normal-GFN is undefined at this point.
+ *
+ * GFN state implementation:
+ * -
+ *
+ * Secure GFN is associated with a secure-PFN; also called uvmem_pfn,
+ * when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag
+ * set, and contains the value of the secure-PFN.
+ * It is associated with a normal-PFN; also called mem_pfn, when
+ * the GFN is pagedout. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set.
+ * The value of the normal-PFN is not tracked.
+ *
+ * Shared GFN is associated with a normal-PFN. Its pfn[] has
+ * KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN
+ * is not tracked.
+ *
+ * Normal GFN is associated with normal-PFN. Its pfn[] has
+ * no flag set. The value of the normal-PFN is not tracked.
+ *
+ * Life cycle of a GFN
+ * 
+ *
+ * --
+ * || Share  |  Unshare | SVM   |H_SVM_INI

[PATCH v5 2/7] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START

2020-07-23 Thread Ram Pai
Page-merging of pages in memory-slots associated with a Secure VM,
is disabled in H_SVM_PAGE_IN handler.

This operation should have been done much earlier; the moment the VM
is initiated for secure-transition. Delaying this operation increases
the probability of those pages acquiring new references, making it
impossible to migrate those pages in the H_SVM_PAGE_IN handler.

Disable page-merging in H_SVM_INIT_START handling.

Reviewed-by: Bharata B Rao 
Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst |   1 +
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 123 +--
 2 files changed, 89 insertions(+), 35 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index df136c8..a1c8c37 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -895,6 +895,7 @@ Return values
 One of the following values:
 
* H_SUCCESS  on success.
+   * H_STATE  if the VM is not in a position to switch to secure.
 
 Description
 ~~~
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index e6f76bc..533b608 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -211,10 +211,79 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
struct kvm *kvm,
return false;
 }
 
+static int kvmppc_memslot_page_merge(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot, bool merge)
+{
+   unsigned long gfn = memslot->base_gfn;
+   unsigned long end, start = gfn_to_hva(kvm, gfn);
+   int ret = 0;
+   struct vm_area_struct *vma;
+   int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
+
+   if (kvm_is_error_hva(start))
+   return H_STATE;
+
+   end = start + (memslot->npages << PAGE_SHIFT);
+
+   mmap_write_lock(kvm->mm);
+   do {
+   vma = find_vma_intersection(kvm->mm, start, end);
+   if (!vma) {
+   ret = H_STATE;
+   break;
+   }
+   ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
+ merge_flag, &vma->vm_flags);
+   if (ret) {
+   ret = H_STATE;
+   break;
+   }
+   start = vma->vm_end;
+   } while (end > vma->vm_end);
+
+   mmap_write_unlock(kvm->mm);
+   return ret;
+}
+
+static void kvmppc_uvmem_memslot_delete(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot)
+{
+   uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
+   kvmppc_uvmem_slot_free(kvm, memslot);
+   kvmppc_memslot_page_merge(kvm, memslot, true);
+}
+
+static int kvmppc_uvmem_memslot_create(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot)
+{
+   int ret = H_PARAMETER;
+
+   if (kvmppc_memslot_page_merge(kvm, memslot, false))
+   return ret;
+
+   if (kvmppc_uvmem_slot_init(kvm, memslot))
+   goto out1;
+
+   ret = uv_register_mem_slot(kvm->arch.lpid,
+  memslot->base_gfn << PAGE_SHIFT,
+  memslot->npages * PAGE_SIZE,
+  0, memslot->id);
+   if (ret < 0) {
+   ret = H_PARAMETER;
+   goto out;
+   }
+   return 0;
+out:
+   kvmppc_uvmem_slot_free(kvm, memslot);
+out1:
+   kvmppc_memslot_page_merge(kvm, memslot, true);
+   return ret;
+}
+
 unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 {
struct kvm_memslots *slots;
-   struct kvm_memory_slot *memslot;
+   struct kvm_memory_slot *memslot, *m;
int ret = H_SUCCESS;
int srcu_idx;
 
@@ -232,23 +301,24 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
return H_AUTHORITY;
 
	srcu_idx = srcu_read_lock(&kvm->srcu);
+
+   /* register the memslot */
slots = kvm_memslots(kvm);
kvm_for_each_memslot(memslot, slots) {
-   if (kvmppc_uvmem_slot_init(kvm, memslot)) {
-   ret = H_PARAMETER;
-   goto out;
-   }
-   ret = uv_register_mem_slot(kvm->arch.lpid,
-  memslot->base_gfn << PAGE_SHIFT,
-  memslot->npages * PAGE_SIZE,
-  0, memslot->id);
-   if (ret < 0) {
-   kvmppc_uvmem_slot_free(kvm, memslot);
-   ret = H_PARAMETER;
-   goto out;
+   ret = kvmppc_uvmem_memslot_create(kvm, memslot);
+   if (ret)
+   break;
+   }
+
+   if (ret) {
+   slots = kvm_memslots(kvm);

[PATCH v5 1/7] KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c

2020-07-23 Thread Ram Pai
Without this fix, git is confused. It generates wrong
function context for code changes in subsequent patches.
Weird, but true.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 09d8119..e6f76bc 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -382,8 +382,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * Alloc a PFN from private device memory pool and copy page from normal
  * memory to secure memory using UV_PAGE_IN uvcall.
  */
-static int
-kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
+static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, unsigned long gpa, struct kvm *kvm,
   unsigned long page_shift, bool *downgrade)
 {
@@ -450,8 +449,8 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * In the former case, uses dev_pagemap_ops.migrate_to_ram handler
  * to unmap the device page from QEMU's page tables.
  */
-static unsigned long
-kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
+static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
+   unsigned long page_shift)
 {
 
int ret = H_PARAMETER;
@@ -500,9 +499,9 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * H_PAGE_IN_SHARED flag makes the page shared which means that the same
  * memory in is visible from both UV and HV.
  */
-unsigned long
-kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
-unsigned long flags, unsigned long page_shift)
+unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
+   unsigned long flags,
+   unsigned long page_shift)
 {
bool downgrade = false;
unsigned long start, end;
@@ -559,10 +558,10 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * Provision a new page on HV side and copy over the contents
  * from secure memory using UV_PAGE_OUT uvcall.
  */
-static int
-kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
-   unsigned long end, unsigned long page_shift,
-   struct kvm *kvm, unsigned long gpa)
+static int kvmppc_svm_page_out(struct vm_area_struct *vma,
+   unsigned long start,
+   unsigned long end, unsigned long page_shift,
+   struct kvm *kvm, unsigned long gpa)
 {
unsigned long src_pfn, dst_pfn = 0;
struct migrate_vma mig;
-- 
1.8.3.1



[PATCH v5 0/7] Migrate non-migrated pages of a SVM.

2020-07-23 Thread Ram Pai
The time to switch a VM to a Secure-VM increases with the size of the VM.
A 100GB VM takes about 7 minutes. This is unacceptable.  This linear
increase is caused by suboptimal behavior of the Ultravisor and the
Hypervisor.  The Ultravisor unnecessarily migrates every GFN of the
VM from normal-memory to secure-memory. It only has to migrate the
necessary and sufficient GFNs.

However, when the optimization is incorporated in the Ultravisor, the
Hypervisor starts misbehaving. The Hypervisor has an inbuilt assumption
that the Ultravisor will explicitly request to migrate each and every
GFN of the VM. If only the necessary and sufficient GFNs are requested
for migration, the Hypervisor continues to manage the remaining GFNs as
normal GFNs. This leads to memory corruption, manifested
consistently when the SVM reboots.

The same is true when a memory slot is hotplugged into a SVM. The
Hypervisor expects the Ultravisor to request migration of all GFNs to
secure-GFNs.  But the Hypervisor cannot handle any H_SVM_PAGE_IN
requests from the Ultravisor done in the context of the
UV_REGISTER_MEM_SLOT ucall.  This problem manifests as random errors
in the SVM when a memory-slot is hotplugged.

This patch series automatically migrates the non-migrated pages of a
SVM, and thus solves the problem.

Testing: Passed rigorous testing using various sized SVMs.

Changelog:

v5:  .  This patch series includes Laurent's fix for memory hotplug/unplug
  . drop pages first and then delete the memslot. Otherwise
the memslot does not get cleanly deleted, causing
problems during reboot.
  . recreatable through the following set of commands
 . device_add pc-dimm,id=dimm1,memdev=mem1
 . device_del dimm1
 . device_add pc-dimm,id=dimm1,memdev=mem1
Further incorporates comments from Bharata:
. fix for off-by-one while disabling migration.
. code-reorganized to maximize sharing in init_start path
and in memory-hotplug path
. locking adjustments in mass-page migration during H_SVM_INIT_DONE.
. improved recovery on error paths.
. additional comments in the code for better understanding.
. removed the retry-on-migration-failure code.
. re-added the initial patch that adjusts some prototypes to overcome
   a git problem, where it messes up the code context. Had
accidentally dropped the patch in the last version.

v4:  .  Incorporated Bharata's comments:
- Optimization -- replace write mmap semaphore with read mmap semaphore.
- disable page-merge during memory hotplug.
- rearranged the patches. consolidated the page-migration-retry logic
in a single patch.

v3: . Optimized the page-migration retry-logic. 
. Relax and relinquish the cpu regularly while bulk migrating
the non-migrated pages. This issue was causing soft-lockups.
Fixed it.
. Added a new patch, to retry page-migration a couple of times
before returning H_BUSY in H_SVM_PAGE_IN. This issue was
seen a few times in a 24hour continuous reboot test of the SVMs.

v2: . fixed a bug observed by Laurent. The state of the GFNs associated
with Secure-VMs was not reset during memslot flush.
. Re-organized the code, for easier review.
. Better description of the patch series.

v1: . fixed a bug observed by Bharata. Pages that were paged-in and later
paged-out must also be skipped from migration during H_SVM_INIT_DONE.

Laurent Dufour (3):
  KVM: PPC: Book3S HV: migrate hot plugged memory
  KVM: PPC: Book3S HV: move kvmppc_svm_page_out up
  KVM: PPC: Book3S HV: rework secure mem slot dropping

Ram Pai (4):
  KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c
  KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START
  KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
  KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs
to secure-GFNs.

 Documentation/powerpc/ultravisor.rst|   3 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  16 +
 arch/powerpc/kvm/book3s_hv.c|  10 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 690 +---
 4 files changed, 548 insertions(+), 171 deletions(-)

-- 
1.8.3.1



Re: [v4 4/5] KVM: PPC: Book3S HV: retry page migration before erroring-out

2020-07-23 Thread Ram Pai
I am dropping this patch based on our conversation, where we agreed we
need to root-cause the migration failure.

On Thu, Jul 23, 2020 at 11:43:44AM +0530, Bharata B Rao wrote:
> On Fri, Jul 17, 2020 at 01:00:26AM -0700, Ram Pai wrote:
> > @@ -812,7 +842,7 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
> > unsigned long gpa,
> > struct vm_area_struct *vma;
> > int srcu_idx;
> > unsigned long gfn = gpa >> page_shift;
> > -   int ret;
> > +   int ret, repeat_count = REPEAT_COUNT;
> >  
> > if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
> > return H_UNSUPPORTED;
> > @@ -826,34 +856,44 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
> > unsigned long gpa,
> > if (flags & H_PAGE_IN_SHARED)
> > return kvmppc_share_page(kvm, gpa, page_shift);
> >  
> > -   ret = H_PARAMETER;
> > srcu_idx = srcu_read_lock(&kvm->srcu);
> > -   mmap_read_lock(kvm->mm);
> >  
> > -   start = gfn_to_hva(kvm, gfn);
> > -   if (kvm_is_error_hva(start))
> > -   goto out;
> > -
> > -   mutex_lock(&kvm->arch.uvmem_lock);
> > /* Fail the page-in request of an already paged-in page */
> > -   if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
> > -   goto out_unlock;
> > +   mutex_lock(&kvm->arch.uvmem_lock);
> > +   ret = kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL);
> > +   mutex_unlock(&kvm->arch.uvmem_lock);
> 
> Same comment as for the prev patch. I don't think you can release
> the lock here.
> 
> > +   if (ret) {
> > +   srcu_read_unlock(>srcu, srcu_idx);
> > +   return H_PARAMETER;
> > +   }
> >  
> > -   end = start + (1UL << page_shift);
> > -   vma = find_vma_intersection(kvm->mm, start, end);
> > -   if (!vma || vma->vm_start > start || vma->vm_end < end)
> > -   goto out_unlock;
> > +   do {
> > +   ret = H_PARAMETER;
> > +   mmap_read_lock(kvm->mm);
> >  
> > -   if (kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, page_shift,
> > -   true))
> > -   goto out_unlock;
> > +   start = gfn_to_hva(kvm, gfn);
> > +   if (kvm_is_error_hva(start)) {
> > +   mmap_read_unlock(kvm->mm);
> > +   break;
> > +   }
> >  
> > -   ret = H_SUCCESS;
> > +   end = start + (1UL << page_shift);
> > +   vma = find_vma_intersection(kvm->mm, start, end);
> > +   if (!vma || vma->vm_start > start || vma->vm_end < end) {
> > +   mmap_read_unlock(kvm->mm);
> > +   break;
> > +   }
> > +
> > +   mutex_lock(&kvm->arch.uvmem_lock);
> > +   ret = kvmppc_svm_migrate_page(vma, start, end, gpa, kvm,
> > page_shift, true);
> > +   mutex_unlock(&kvm->arch.uvmem_lock);
> > +
> > +   mmap_read_unlock(kvm->mm);
> > +   } while (ret == -2 && repeat_count--);
> > +
> > +   if (ret == -2)
> > +   ret = H_BUSY;
> >  
> > -out_unlock:
> > -   mutex_unlock(&kvm->arch.uvmem_lock);
> > -out:
> > -   mmap_read_unlock(kvm->mm);
> > srcu_read_unlock(&kvm->srcu, srcu_idx);
> > return ret;
> >  }
> > -- 
> > 1.8.3.1

-- 
Ram Pai


Re: [v4 3/5] KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs.

2020-07-23 Thread Ram Pai
On Thu, Jul 23, 2020 at 11:40:37AM +0530, Bharata B Rao wrote:
> On Fri, Jul 17, 2020 at 01:00:25AM -0700, Ram Pai wrote:
> >  
> > +int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> > +   const struct kvm_memory_slot *memslot)
> 
> Don't see any callers for this outside of this file, so why not static?
> 
> > +{
> > +   unsigned long gfn = memslot->base_gfn;
> > +   struct vm_area_struct *vma;
> > +   unsigned long start, end;
> > +   int ret = 0;
> > +
> > +   while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {
> 
> So you checked the state of gfn under uvmem_lock above, but release
> it too.
> 
> > +
> > +   mmap_read_lock(kvm->mm);
> > +   start = gfn_to_hva(kvm, gfn);
> > +   if (kvm_is_error_hva(start)) {
> > +   ret = H_STATE;
> > +   goto next;
> > +   }
> > +
> > +   end = start + (1UL << PAGE_SHIFT);
> > +   vma = find_vma_intersection(kvm->mm, start, end);
> > +   if (!vma || vma->vm_start > start || vma->vm_end < end) {
> > +   ret = H_STATE;
> > +   goto next;
> > +   }
> > +
> > +   mutex_lock(&kvm->arch.uvmem_lock);
> > +   ret = kvmppc_svm_migrate_page(vma, start, end,
> > +   (gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
> 
> What is the guarantee that the gfn is in the same earlier state when you do
> do migration here?

Are you worried about the case where some other thread sneaks in and
migrates the GFN, making this migration request a duplicate one?

That is theoretically possible, though practically improbable. This
transition is attempted only when there is one vcpu active in the VM.

However, maybe we should not bake that assumption into this code.
Will remove that assumption.
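
A sketch of what dropping that assumption could look like (purely
illustrative, not the posted code; the helper name is made up): re-check
the GFN under kvm->arch.uvmem_lock and treat an already-transitioned GFN
as a no-op:

/* hypothetical helper: migrate one GFN, tolerating a racing transition */
static int svm_migrate_gfn_checked(struct vm_area_struct *vma,
				   unsigned long start, unsigned long end,
				   unsigned long gfn, struct kvm *kvm)
{
	int ret = 0;

	mutex_lock(&kvm->arch.uvmem_lock);
	/* re-check under the lock: another vcpu may have migrated it */
	if (!kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
		ret = kvmppc_svm_migrate_page(vma, start, end,
					      gfn << PAGE_SHIFT, kvm,
					      PAGE_SHIFT, false);
	mutex_unlock(&kvm->arch.uvmem_lock);

	return ret;
}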

RP


Re: [v4 2/5] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs

2020-07-23 Thread Ram Pai
On Thu, Jul 23, 2020 at 10:18:30AM +0530, Bharata B Rao wrote:
> On Fri, Jul 17, 2020 at 01:00:24AM -0700, Ram Pai wrote:
> > pvt->gpa = gpa;
..snip..
> > pvt->kvm = kvm;
> > @@ -524,6 +663,7 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, 
> > unsigned long gpa,
> > uvmem_page = pfn_to_page(uvmem_pfn);
> > pvt = uvmem_page->zone_device_data;
> > pvt->skip_page_out = true;
> > +   pvt->remove_gfn = false;
> > }
> >  
> >  retry:
> > @@ -537,12 +677,16 @@ static unsigned long kvmppc_share_page(struct kvm 
> > *kvm, unsigned long gpa,
> > uvmem_page = pfn_to_page(uvmem_pfn);
> > pvt = uvmem_page->zone_device_data;
> > pvt->skip_page_out = true;
> > +   pvt->remove_gfn = false;
> 
> This is the case of making an already secure page as shared page.
> A comment here as to why remove_gfn is set to false here will help.
> 
> Also isn't it by default false? Is there a situation where it starts
> out by default false, becomes true later and you are required to
> explicitly mark it false here?

It is false by default, and becomes true when the GFN is
released/invalidated through kvmppc_uvmem_drop_pages().

It is marked false explicitly here, just to be safe, to protect
against any implicit changes.

> 
> Otherwise, Reviewed-by: Bharata B Rao 
> 
Thanks for the review.

RP


RE: [RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register

2020-07-22 Thread Ram Pai
On Wed, Jul 22, 2020 at 12:06:06PM +1000, Michael Ellerman wrote:
> Ram Pai  writes:
> > An instruction accessing a mmio address, generates a HDSI fault.  This 
> > fault is
> > appropriately handled by the Hypervisor.  However in the case of secureVMs, 
> > the
> > fault is delivered to the ultravisor.
> >
> > Unfortunately the Ultravisor has no correct-way to fetch the faulting
> > instruction. The PEF architecture does not allow Ultravisor to enable MMU
> > translation. Walking the two level page table to read the instruction can 
> > race
> > with other vcpus modifying the SVM's process scoped page table.
> 
> You're trying to read the guest's kernel text IIUC, that mapping should
> be stable. Possibly permissions on it could change over time, but the
> virtual -> real mapping should not.

Actually the code does not capture the address of the instruction in the
sprg0 register. It captures the instruction itself. So should the mapping
matter?

> 
> > This problem can be correctly solved with some help from the kernel.
> >
> > Capture the faulting instruction in SPRG0 register, before executing the
> > faulting instruction. This enables the ultravisor to easily procure the
> > faulting instruction and emulate it.
> 
> This is not something I'm going to merge. Sorry.

Ok. Will consider other approaches.

RP


RE: [RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register

2020-07-22 Thread Ram Pai
On Wed, Jul 22, 2020 at 12:42:05AM -0700, Ram Pai wrote:
> On Wed, Jul 22, 2020 at 03:02:32PM +1000, Paul Mackerras wrote:
> > On Thu, Jul 16, 2020 at 01:32:13AM -0700, Ram Pai wrote:
> > > An instruction accessing a mmio address, generates a HDSI fault.  This 
> > > fault is
> > > appropriately handled by the Hypervisor.  However in the case of 
> > > secureVMs, the
> > > fault is delivered to the ultravisor.
> > > 
> > > Unfortunately the Ultravisor has no correct-way to fetch the faulting
> > > instruction. The PEF architecture does not allow Ultravisor to enable MMU
> > > translation. Walking the two level page table to read the instruction can 
> > > race
> > > with other vcpus modifying the SVM's process scoped page table.
> > > 
> > > This problem can be correctly solved with some help from the kernel.
> > > 
> > > Capture the faulting instruction in SPRG0 register, before executing the
> > > faulting instruction. This enables the ultravisor to easily procure the
> > > faulting instruction and emulate it.
> > 
> > Just a comment on the approach of putting the instruction in SPRG0:
> > these I/O accessors can be used in interrupt routines, which means
> > that if these accessors are ever used with interrupts enabled, there
> > is the possibility of an external interrupt occurring between the
> > instruction that sets SPRG0 and the load/store instruction that
> > faults.  If the handler for that interrupt itself does an I/O access,
> > it will overwrite SPRG0, corrupting the value set by the interrupted
> > code.
> 
> Acutally my proposed code restores the value of SPRG0 before returning back to
> the interrupted instruction. So here is the sequence. I think it works.
> 
>  (1) store sprg0 in register Rx (lets say srpg0 had 0xc. Rx now contains 0xc)
> 
>  (2) save faulting instruction address in sprg0 (lets say the value is 0xa.
>   sprg0 will contain 0xa).

Small correction: sprg0 does not store the address of the faulting
instruction. It stores the instruction itself. Regardless, the code
should work, I think.

RP


Re: Re: [RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register

2020-07-22 Thread Ram Pai
On Wed, Jul 22, 2020 at 03:02:32PM +1000, Paul Mackerras wrote:
> On Thu, Jul 16, 2020 at 01:32:13AM -0700, Ram Pai wrote:
> > An instruction accessing a mmio address, generates a HDSI fault.  This 
> > fault is
> > appropriately handled by the Hypervisor.  However in the case of secureVMs, 
> > the
> > fault is delivered to the ultravisor.
> > 
> > Unfortunately the Ultravisor has no correct-way to fetch the faulting
> > instruction. The PEF architecture does not allow Ultravisor to enable MMU
> > translation. Walking the two level page table to read the instruction can 
> > race
> > with other vcpus modifying the SVM's process scoped page table.
> > 
> > This problem can be correctly solved with some help from the kernel.
> > 
> > Capture the faulting instruction in SPRG0 register, before executing the
> > faulting instruction. This enables the ultravisor to easily procure the
> > faulting instruction and emulate it.
> 
> Just a comment on the approach of putting the instruction in SPRG0:
> these I/O accessors can be used in interrupt routines, which means
> that if these accessors are ever used with interrupts enabled, there
> is the possibility of an external interrupt occurring between the
> instruction that sets SPRG0 and the load/store instruction that
> faults.  If the handler for that interrupt itself does an I/O access,
> it will overwrite SPRG0, corrupting the value set by the interrupted
> code.

Actually my proposed code restores the value of SPRG0 before returning to
the interrupted instruction. Here is the sequence (a rough C sketch of the
accessor pattern follows the steps). I think it works.

 (1) store sprg0 in register Rx (lets say srpg0 had 0xc. Rx now contains 0xc)

 (2) save faulting instruction address in sprg0 (lets say the value is 0xa.
sprg0 will contain 0xa).

 (3)<- interrupt arrives

 (4) store sprg0 in register Ry ( Ry now contains 0xa )

 (5) save faulting instruction address in sprg0 (lets say the value is 0xb).

 (6) 0xb: execute faulting instruction

 (7) restore Ry into sprg0 ( sprg0 now contains 0xa )

 (8)<-- return from interrupt

 (9)  0xa: execute faulting instruction

 (10) restore Rx into sprg0 (sprg0 now contains 0xc)
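
In rough C pseudocode (read_sprg0(), write_sprg0(), current_insn() and
faulting_load() are illustrative helpers, not real kernel APIs), each accessor
is a balanced save/restore pair, which is why the nesting above works out:

    static inline u64 sketch_mmio_read(const volatile u64 __iomem *addr)
    {
            u64 saved, val;

            saved = read_sprg0();           /* Rx <- caller's SPRG0 (e.g. 0xc)   */
            write_sprg0(current_insn());    /* SPRG0 <- our faulting instruction */
            val = faulting_load(addr);      /* may fault to the UV; an interrupt
                                             * here runs the same save/restore
                                             * dance and leaves SPRG0 untouched  */
            write_sprg0(saved);             /* SPRG0 back to the caller's value  */
            return val;
    }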

> 
> The choices to fix that would seem to be (a) disable interrupts around
> all I/O accesses, (b) have the accessor save and restore SPRG0, or (c)
> solve the problem another way, such as by doing a H_LOGICAL_CI_LOAD
> or H_LOGICAL_CI_STORE hypercall.

Ok. Will explore (c).

Thanks,
RP

-- 
Ram Pai


Re: [PATCH v2 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping

2020-07-21 Thread Ram Pai
On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote:
> When a secure memslot is dropped, all the pages backed in the secure device
> (aka really backed by secure memory by the Ultravisor) should be paged out
> to a normal page. Previously, this was achieved by triggering the page
> fault mechanism which is calling kvmppc_svm_page_out() on each pages.
> 
> This can't work when hot unplugging a memory slot because the memory slot
> is flagged as invalid and gfn_to_pfn() is then not trying to access the
> page, so the page fault mechanism is not triggered.
> 
> Since the final goal is to make a call to kvmppc_svm_page_out() it seems
> simpler to directly calling it instead of triggering such a mechanism. This
^^ call directly instead of triggering..

> way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
> memslot.
> 
> Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
> the call to __kvmppc_svm_page_out() is made.
> As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the
> VMA is fetched in a lazy way, to not trigger find_vma() all the time. In
> addition, the mmap_sem is help in read mode during that time, not in write
  ^^ held

> mode since the virual memory layout is not impacted, and
> kvm->arch.uvmem_lock prevents concurrent operation on the secure device.
> 
> Cc: Ram Pai 
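
As an illustration, the lazy VMA fetch described above boils down to the
following pattern. This is only a sketch under the assumptions stated in the
commit message (mmap_sem held for read, kvm->arch.uvmem_lock held);
page_out_one() stands in for the real __kvmppc_svm_page_out() call:

    static void drop_pages_sketch(struct kvm *kvm,
                                  const struct kvm_memory_slot *slot)
    {
            struct vm_area_struct *vma = NULL;
            unsigned long gfn, addr;

            for (gfn = slot->base_gfn; gfn < slot->base_gfn + slot->npages; gfn++) {
                    addr = gfn_to_hva(kvm, gfn);
                    if (kvm_is_error_hva(addr))
                            continue;
                    /* reuse the previous VMA while the address still falls in it */
                    if (!vma || addr < vma->vm_start || addr >= vma->vm_end)
                            vma = find_vma(kvm->mm, addr);
                    if (!vma || addr < vma->vm_start)
                            continue;       /* nothing mapped at this address */
                    page_out_one(kvm, vma, addr, gfn);
            }
    }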

Reviewed-by: Ram Pai 

RP


Re: [PATCH v2 1/2] KVM: PPC: Book3S HV: move kvmppc_svm_page_out up

2020-07-21 Thread Ram Pai
On Tue, Jul 21, 2020 at 12:42:01PM +0200, Laurent Dufour wrote:
> kvmppc_svm_page_out() will need to be called by kvmppc_uvmem_drop_pages()
> so move it upper in this file.
> 
> Furthermore it will be interesting to call this function when already
> holding the kvm->arch.uvmem_lock, so prefix the original function with __
> and remove the locking in it, and introduce a wrapper which call that
> function with the lock held.
> 
> There is no functional change.
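
For illustration, the split described above has roughly the following shape;
the argument lists are made up and are not the real __kvmppc_svm_page_out()
signature:

    /* caller must hold kvm->arch.uvmem_lock */
    static int __kvmppc_svm_page_out_sketch(struct kvm *kvm, unsigned long gfn)
    {
            /* ... the actual page-out work, unchanged ... */
            return 0;
    }

    static int kvmppc_svm_page_out_sketch(struct kvm *kvm, unsigned long gfn)
    {
            int ret;

            mutex_lock(&kvm->arch.uvmem_lock);
            ret = __kvmppc_svm_page_out_sketch(kvm, gfn);
            mutex_unlock(&kvm->arch.uvmem_lock);
            return ret;
    }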

Reviewed-by: Ram Pai 

> 
> Cc: Ram Pai 
> Cc: Bharata B Rao 
> Cc: Paul Mackerras 
> Signed-off-by: Laurent Dufour 
> ---

RP


[v4 5/5] KVM: PPC: Book3S HV: migrate hot plugged memory

2020-07-17 Thread Ram Pai
From: Laurent Dufour 

When a memory slot is hot plugged to a SVM, PFNs associated with the
GFNs in that slot must be migrated to secure-PFNs, aka device-PFNs.

Call kvmppc_uv_migrate_mem_slot() to accomplish this.
Disable page-merge for all pages in the memory slot.

Signed-off-by: Ram Pai 
[rearranged the code, and modified the commit log]
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h | 10 ++
 arch/powerpc/kvm/book3s_hv.c| 10 ++
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 22 ++
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index f229ab5..6f7da00 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -25,6 +25,9 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot 
*free,
 struct kvm *kvm, bool skip_page_out);
 int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
const struct kvm_memory_slot *memslot);
+void kvmppc_memslot_create(struct kvm *kvm, const struct kvm_memory_slot *new);
+void kvmppc_memslot_delete(struct kvm *kvm, const struct kvm_memory_slot *old);
+
 #else
 static inline int kvmppc_uvmem_init(void)
 {
@@ -84,5 +87,12 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, 
unsigned long gfn)
 static inline void
 kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
struct kvm *kvm, bool skip_page_out) { }
+
+static inline void  kvmppc_memslot_create(struct kvm *kvm,
+   const struct kvm_memory_slot *new) { }
+
+static inline void  kvmppc_memslot_delete(struct kvm *kvm,
+   const struct kvm_memory_slot *old) { }
+
 #endif /* CONFIG_PPC_UV */
 #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d331b46..bf3be3b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4515,16 +4515,10 @@ static void kvmppc_core_commit_memory_region_hv(struct 
kvm *kvm,
 
switch (change) {
case KVM_MR_CREATE:
-   if (kvmppc_uvmem_slot_init(kvm, new))
-   return;
-   uv_register_mem_slot(kvm->arch.lpid,
-new->base_gfn << PAGE_SHIFT,
-new->npages * PAGE_SIZE,
-0, new->id);
+   kvmppc_memslot_create(kvm, new);
break;
case KVM_MR_DELETE:
-   uv_unregister_mem_slot(kvm->arch.lpid, old->id);
-   kvmppc_uvmem_slot_free(kvm, old);
+   kvmppc_memslot_delete(kvm, old);
break;
default:
/* TODO: Handle KVM_MR_MOVE */
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index a206984..a2b4d25 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -1089,6 +1089,28 @@ int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned 
long gfn)
return (ret == U_SUCCESS) ? RESUME_GUEST : -EFAULT;
 }
 
+void kvmppc_memslot_create(struct kvm *kvm, const struct kvm_memory_slot *new)
+{
+   if (kvmppc_uvmem_slot_init(kvm, new))
+   return;
+
+   if (kvmppc_memslot_page_merge(kvm, new, false))
+   return;
+
+   if (uv_register_mem_slot(kvm->arch.lpid, new->base_gfn << PAGE_SHIFT,
+   new->npages * PAGE_SIZE, 0, new->id))
+   return;
+
+   kvmppc_uv_migrate_mem_slot(kvm, new);
+}
+
+void kvmppc_memslot_delete(struct kvm *kvm, const struct kvm_memory_slot *old)
+{
+   uv_unregister_mem_slot(kvm->arch.lpid, old->id);
+   kvmppc_memslot_page_merge(kvm, old, true);
+   kvmppc_uvmem_slot_free(kvm, old);
+}
+
 static u64 kvmppc_get_secmem_size(void)
 {
struct device_node *np;
-- 
1.8.3.1



[v4 4/5] KVM: PPC: Book3S HV: retry page migration before erroring-out

2020-07-17 Thread Ram Pai
The page requested for page-in can sometimes have transient
references, and hence cannot be migrated immediately. Retry a few times
before returning an error.

The same is true for non-migrated pages that are migrated in the
H_SVM_INIT_DONE handler.  Retry a few times before returning an error.

The H_SVM_PAGE_IN interface is enhanced to return H_BUSY if the page is
not in a migratable state.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org

Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst |   1 +
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 106 ---
 2 files changed, 74 insertions(+), 33 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index ba6b1bf..fe533ad 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -1035,6 +1035,7 @@ Return values
* H_PARAMETER   if ``guest_pa`` is invalid.
* H_P2  if ``flags`` is invalid.
* H_P3  if ``order`` of page is invalid.
+   * H_BUSY   if ``page`` is not in a state to pagein
 
 Description
 ~~~
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 3274663..a206984 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -672,7 +672,7 @@ static int kvmppc_svm_migrate_page(struct vm_area_struct 
*vma,
return ret;
 
if (!(*mig.src & MIGRATE_PFN_MIGRATE)) {
-   ret = -1;
+   ret = -2;
goto out_finalize;
}
 
@@ -700,43 +700,73 @@ static int kvmppc_svm_migrate_page(struct vm_area_struct 
*vma,
return ret;
 }
 
-int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
-   const struct kvm_memory_slot *memslot)
+/*
+ * return 1, if some page migration failed because of transient error,
+ * while the remaining pages migrated successfully.
+ * The caller can use this as a hint to retry.
+ *
+ * return 0 otherwise. *ret indicates the success status
+ * of this call.
+ */
+static bool __kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot, int *ret)
 {
unsigned long gfn = memslot->base_gfn;
struct vm_area_struct *vma;
unsigned long start, end;
-   int ret = 0;
+   bool retry = false;
 
+   *ret = 0;
while (kvmppc_next_nontransitioned_gfn(memslot, kvm, )) {
 
mmap_read_lock(kvm->mm);
start = gfn_to_hva(kvm, gfn);
if (kvm_is_error_hva(start)) {
-   ret = H_STATE;
+   *ret = H_STATE;
goto next;
}
 
end = start + (1UL << PAGE_SHIFT);
vma = find_vma_intersection(kvm->mm, start, end);
if (!vma || vma->vm_start > start || vma->vm_end < end) {
-   ret = H_STATE;
+   *ret = H_STATE;
goto next;
}
 
mutex_lock(>arch.uvmem_lock);
-   ret = kvmppc_svm_migrate_page(vma, start, end,
+   *ret = kvmppc_svm_migrate_page(vma, start, end,
(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
mutex_unlock(>arch.uvmem_lock);
-   if (ret)
-   ret = H_STATE;
 
 next:
mmap_read_unlock(kvm->mm);
+   if (*ret == -2) {
+   retry = true;
+   continue;
+   }
 
-   if (ret)
-   break;
+   if (*ret)
+   return false;
}
+   return retry;
+}
+
+#define REPEAT_COUNT 10
+
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot)
+{
+   int ret = 0, repeat_count = REPEAT_COUNT;
+
+   /*
+* try migration of pages in the memslot 'repeat_count' number of
+* times, provided each time migration fails because of transient
+* errors only.
+*/
+   while (__kvmppc_uv_migrate_mem_slot(kvm, memslot, ) &&
+   repeat_count--)
+   ;
+
return ret;
 }
 
@@ -812,7 +842,7 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
unsigned long gpa,
struct vm_area_struct *vma;
int srcu_idx;
unsigned long gfn = gpa >> page_shift;
-   int ret;
+   int ret, repeat_count = REPEAT_COUNT;
 
if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
return H_UNSUPPORTED;
@@ -826,34 +856,44 @@ unsigned long kvmppc_h_svm_page_in(struct

[v4 3/5] KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs.

2020-07-17 Thread Ram Pai
The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the
pages of the SVM before calling H_SVM_INIT_DONE. This causes a huge
delay in transitioning the VM to SVM. The Ultravisor is only interested
in the pages that contain the kernel, initrd and other important data
structures. The rest contain throw-away content.

However if not all pages are requested by the Ultravisor, the Hypervisor
continues to consider the GFNs corresponding to the non-requested pages
as normal GFNs. This can lead to data-corruption and undefined behavior.

In H_SVM_INIT_DONE handler, move all the PFNs associated with the SVM's
GFNs to secure-PFNs. Skip the GFNs that are already Paged-in or Shared
or Paged-in followed by a Paged-out.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst|   2 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 135 +---
 3 files changed, 125 insertions(+), 14 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index a1c8c37..ba6b1bf 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -934,6 +934,8 @@ Return values
* H_UNSUPPORTED if called from the wrong context (e.g.
from an SVM or before an H_SVM_INIT_START
hypercall).
+   * H_STATE   if the hypervisor could not successfully
+transition the VM to Secure VM.
 
 Description
 ~~~
diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 9cb7d8b..f229ab5 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -23,6 +23,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 struct kvm *kvm, bool skip_page_out);
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index df2e272..3274663 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -93,6 +93,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct dev_pagemap kvmppc_uvmem_pgmap;
 static unsigned long *kvmppc_uvmem_bitmap;
@@ -348,8 +349,45 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
struct kvm *kvm,
return false;
 }
 
+/*
+ * starting from *gfn search for the next available GFN that is not yet
+ * transitioned to a secure GFN.  return the value of that GFN in *gfn.  If a
+ * GFN is found, return true, else return false
+ */
+static bool kvmppc_next_nontransitioned_gfn(const struct kvm_memory_slot 
*memslot,
+   struct kvm *kvm, unsigned long *gfn)
+{
+   struct kvmppc_uvmem_slot *p;
+   bool ret = false;
+   unsigned long i;
+
+   mutex_lock(>arch.uvmem_lock);
+
+   list_for_each_entry(p, >arch.uvmem_pfns, list)
+   if (*gfn >= p->base_pfn && *gfn < p->base_pfn + p->nr_pfns)
+   break;
+   if (!p)
+   goto out;
+   /*
+* The code below assumes, one to one correspondence between
+* kvmppc_uvmem_slot and memslot.
+*/
+   for (i = *gfn; i < p->base_pfn + p->nr_pfns; i++) {
+   unsigned long index = i - p->base_pfn;
+
+   if (!(p->pfns[index] & KVMPPC_GFN_FLAG_MASK)) {
+   *gfn = i;
+   ret = true;
+   break;
+   }
+   }
+out:
+   mutex_unlock(>arch.uvmem_lock);
+   return ret;
+}
+
 static int kvmppc_memslot_page_merge(struct kvm *kvm,
-   struct kvm_memory_slot *memslot, bool merge)
+   const struct kvm_memory_slot *memslot, bool merge)
 {
unsigned long gfn = memslot->base_gfn;
unsigned long end, start = gfn_to_hva(kvm, gfn);
@@ -461,12 +499,31 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
 {
+   struct kvm_memslots *slots;
+   struct kvm_memory_slot *memslot;
+   int srcu_idx;
+   long ret = H_SUCCESS;
+
if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
return H_UNSUPPORTED;
 
+   /* migrate any unmoved normal pfn to device p

[v4 1/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START

2020-07-17 Thread Ram Pai
Page-merging of pages in memory-slots associated with a Secure VM,
is disabled in H_SVM_PAGE_IN handler.

This operation should have been done much earlier, the moment the VM
is initiated for secure transition. Delaying this operation increases
the probability of those pages acquiring new references, making it
impossible to migrate those pages.

Disable page-merging in H_SVM_INIT_START handling.

Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst |  1 +
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 98 +++-
 2 files changed, 76 insertions(+), 23 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index df136c8..a1c8c37 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -895,6 +895,7 @@ Return values
 One of the following values:
 
* H_SUCCESS  on success.
+* H_STATE   if the VM is not in a position to switch to secure.
 
 Description
 ~~~
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index e6f76bc..0baa293 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -211,6 +211,65 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
struct kvm *kvm,
return false;
 }
 
+static int kvmppc_memslot_page_merge(struct kvm *kvm,
+   struct kvm_memory_slot *memslot, bool merge)
+{
+   unsigned long gfn = memslot->base_gfn;
+   unsigned long end, start = gfn_to_hva(kvm, gfn);
+   int ret = 0;
+   struct vm_area_struct *vma;
+   int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
+
+   if (kvm_is_error_hva(start))
+   return H_STATE;
+
+   end = start + (memslot->npages << PAGE_SHIFT);
+
+   mmap_write_lock(kvm->mm);
+   do {
+   vma = find_vma_intersection(kvm->mm, start, end);
+   if (!vma) {
+   ret = H_STATE;
+   break;
+   }
+   ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
+ merge_flag, >vm_flags);
+   if (ret) {
+   ret = H_STATE;
+   break;
+   }
+   start = vma->vm_end + 1;
+   } while (end > vma->vm_end);
+
+   mmap_write_unlock(kvm->mm);
+   return ret;
+}
+
+static int __kvmppc_page_merge(struct kvm *kvm, bool merge)
+{
+   struct kvm_memslots *slots;
+   struct kvm_memory_slot *memslot;
+   int ret = 0;
+
+   slots = kvm_memslots(kvm);
+   kvm_for_each_memslot(memslot, slots) {
+   ret = kvmppc_memslot_page_merge(kvm, memslot, merge);
+   if (ret)
+   break;
+   }
+   return ret;
+}
+
+static inline int kvmppc_disable_page_merge(struct kvm *kvm)
+{
+   return __kvmppc_page_merge(kvm, false);
+}
+
+static inline int kvmppc_enable_page_merge(struct kvm *kvm)
+{
+   return __kvmppc_page_merge(kvm, true);
+}
+
 unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 {
struct kvm_memslots *slots;
@@ -232,11 +291,18 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
return H_AUTHORITY;
 
srcu_idx = srcu_read_lock(>srcu);
+
+   /* disable page-merging for all memslot */
+   ret = kvmppc_disable_page_merge(kvm);
+   if (ret)
+   goto out;
+
+   /* register the memslot */
slots = kvm_memslots(kvm);
kvm_for_each_memslot(memslot, slots) {
if (kvmppc_uvmem_slot_init(kvm, memslot)) {
ret = H_PARAMETER;
-   goto out;
+   break;
}
ret = uv_register_mem_slot(kvm->arch.lpid,
   memslot->base_gfn << PAGE_SHIFT,
@@ -245,9 +311,12 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
if (ret < 0) {
kvmppc_uvmem_slot_free(kvm, memslot);
ret = H_PARAMETER;
-   goto out;
+   break;
}
}
+
+   if (ret)
+   kvmppc_enable_page_merge(kvm);
 out:
srcu_read_unlock(>srcu, srcu_idx);
return ret;
@@ -384,7 +453,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  */
 static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, unsigned long gpa, struct kvm *kvm,
-  unsigned long page_shift, bool *downgrade)
+  unsigned long page_shift)
 {
unsigned long src_pfn, dst_pfn = 0;
struct migrate_vma mig;
@@ -400,18 +469,6 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, 
unsigned long start,
mig.src = _pfn;
mig.dst = _pf

[v4 2/5] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs

2020-07-17 Thread Ram Pai
  |
 | |   |  | |   |   |
 - --
 | |   |  | |   |   |
 | Normal  | Normal| Transient|Error|Error  |Normal |
 | |   |  | |   |   |
 | Secure  |   Error   | Error|Error|Error  |Normal |
 | |   |  | |   |   |
 |Transient|   N/A | Error|Secure   |Normal |Normal |
 



Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 187 +
 1 file changed, 168 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 0baa293..df2e272 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -98,7 +98,127 @@
 static unsigned long *kvmppc_uvmem_bitmap;
 static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
 
-#define KVMPPC_UVMEM_PFN   (1UL << 63)
+/*
+ * States of a GFN
+ * ---
+ * The GFN can be in one of the following states.
+ *
+ * (a) Secure - The GFN is secure. The GFN is associated with
+ * a Secure VM, the contents of the GFN is not accessible
+ * to the Hypervisor.  This GFN can be backed by a secure-PFN,
+ * or can be backed by a normal-PFN with contents encrypted.
+ * The former is true when the GFN is paged-in into the
+ * ultravisor. The latter is true when the GFN is paged-out
+ * of the ultravisor.
+ *
+ * (b) Shared - The GFN is shared. The GFN is associated with a
+ * a secure VM. The contents of the GFN is accessible to
+ * Hypervisor. This GFN is backed by a normal-PFN and its
+ * content is un-encrypted.
+ *
+ * (c) Normal - The GFN is normal. The GFN is associated with
+ * a normal VM. The contents of the GFN is accessible to
+ * the Hypervisor. Its content is never encrypted.
+ *
+ * States of a VM.
+ * ---
+ *
+ * Normal VM:  A VM whose contents are always accessible to
+ * the hypervisor.  All its GFNs are normal-GFNs.
+ *
+ * Secure VM: A VM whose contents are not accessible to the
+ * hypervisor without the VM's consent.  Its GFNs are
+ * either Shared-GFN or Secure-GFNs.
+ *
+ * Transient VM: A Normal VM that is transitioning to secure VM.
+ * The transition starts on successful return of
+ * H_SVM_INIT_START, and ends on successful return
+ * of H_SVM_INIT_DONE. This transient VM, can have GFNs
+ * in any of the three states; i.e Secure-GFN, Shared-GFN,
+ * and Normal-GFN. The VM never executes in this state
+ * in supervisor-mode.
+ *
+ * Memory slot State.
+ * -
+ * The state of a memory slot mirrors the state of the
+ * VM the memory slot is associated with.
+ *
+ * VM State transition.
+ * 
+ *
+ *  A VM always starts in Normal Mode.
+ *
+ *  H_SVM_INIT_START moves the VM into transient state. During this
+ *  time the Ultravisor may request some of its GFNs to be shared or
+ *  secured. So its GFNs can be in one of the three GFN states.
+ *
+ *  H_SVM_INIT_DONE moves the VM entirely from transient state to
+ *  secure-state. At this point any left-over normal-GFNs are
+ *  transitioned to Secure-GFN.
+ *
+ *  H_SVM_INIT_ABORT moves the transient VM back to normal VM.
+ *  All its GFNs are moved to Normal-GFNs.
+ *
+ *  UV_TERMINATE transitions the secure-VM back to normal-VM. All
+ *  the secure-GFN and shared-GFNs are transitioned to normal-GFN
+ *  Note: The contents of the normal-GFN is undefined at this point.
+ *
+ * GFN state implementation:
+ * -
+ *
+ * Secure GFN is associated with a secure-PFN; also called uvmem_pfn,
+ * when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag
+ * set, and contains the value of the secure-PFN.
+ * It is associated with a normal-PFN; also called mem_pfn, when
+ * the GFN is pagedout. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set.
+ * The value of the normal-PFN is not tracked.
+ *
+ * Shared GFN is associated with a normal-PFN. Its pfn[] has
+ * KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN
+ * is not tracked.
+ *
+ * Normal GFN is associated with normal-PFN. Its pfn[] has
+ * no flag set. The value of the normal-PFN is not tracked.
+ *
+ * Life cycle of a GFN
+ * 
+ *
+ * --
+ * || Share  |  Unshare | SVM   |H_SVM_INIT_DONE|
+ * ||operation   |ope

[v4 0/5] Migrate non-migrated pages of a SVM.

2020-07-17 Thread Ram Pai
The time to switch a VM to a Secure-VM increases with the size of the VM.
A 100GB VM takes about 7 minutes. This is unacceptable.  This linear
increase is caused by suboptimal behavior of the Ultravisor and the
Hypervisor.  The Ultravisor unnecessarily migrates all the GFNs of the
VM from normal-memory to secure-memory. It only has to migrate the
necessary and sufficient GFNs.

However, when the optimization is incorporated in the Ultravisor, the
Hypervisor starts misbehaving. The Hypervisor has an inbuilt assumption
that the Ultravisor will explicitly request to migrate each and every
GFN of the VM. If only necessary and sufficient GFNs are requested for
migration, the Hypervisor continues to manage the remaining GFNs as
normal GFNs. This leads to memory corruption, manifested
consistently when the SVM reboots.

The same is true, when a memory slot is hotplugged into a SVM. The
Hypervisor expects the ultravisor to request migration of all GFNs to
secure-GFN.  But the hypervisor cannot handle any H_SVM_PAGE_IN
requests from the Ultravisor, done in the context of
UV_REGISTER_MEM_SLOT ucall.  This problem manifests as random errors
in the SVM, when a memory-slot is hotplugged.

This patch series automatically migrates the non-migrated pages of a
SVM, and thus solves the problem.

Testing: Passed rigorous testing using various sized SVMs.

Changelog:

v4:  .  Incorporated Bharata's comments:
- Optimization -- replace the write mmap semaphore with the read mmap semaphore.
- disable page-merge during memory hotplug.
- rearranged the patches; consolidated the page-migration-retry logic
in a single patch.

v3: . Optimized the page-migration retry-logic. 
. Relax and relinquish the cpu regularly while bulk migrating
the non-migrated pages. This issue was causing soft-lockups.
Fixed it.
. Added a new patch, to retry page-migration a couple of times
before returning H_BUSY in H_SVM_PAGE_IN. This issue was
seen a few times in a 24hour continuous reboot test of the SVMs.

v2: . fixed a bug observed by Laurent. The state of the GFN's associated
with Secure-VMs were not reset during memslot flush.
. Re-organized the code, for easier review.
. Better description of the patch series.

v1: fixed a bug observed by Bharata. Pages that were paged-in and later
paged-out must also be skipped from migration during H_SVM_INIT_DONE.


Laurent Dufour (1):
  KVM: PPC: Book3S HV: migrate hot plugged memory

Ram Pai (4):
  KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START
  KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
  KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs
to secure-GFNs.
  KVM: PPC: Book3S HV: retry page migration before erroring-out

 Documentation/powerpc/ultravisor.rst|   4 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  12 +
 arch/powerpc/kvm/book3s_hv.c|  10 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 508 
 4 files changed, 457 insertions(+), 77 deletions(-)

-- 
1.8.3.1



[RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register

2020-07-16 Thread Ram Pai
An instruction accessing an MMIO address generates an HDSI fault.  This fault is
appropriately handled by the Hypervisor.  However, in the case of secure VMs, the
fault is delivered to the Ultravisor.

Unfortunately the Ultravisor has no correct way to fetch the faulting
instruction. The PEF architecture does not allow the Ultravisor to enable MMU
translation. Walking the two-level page table to read the instruction can race
with other vcpus modifying the SVM's process-scoped page table.

This problem can be correctly solved with some help from the kernel.

Capture the faulting instruction in SPRG0 register, before executing the
faulting instruction. This enables the ultravisor to easily procure the
faulting instruction and emulate it.

Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/io.h | 85 ++-
 1 file changed, 75 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 635969b..7ef663d 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define SIO_CONFIG_RA  0x398
 #define SIO_CONFIG_RD  0x399
@@ -105,34 +106,98 @@
 static inline u##size name(const volatile u##size __iomem *addr)   \
 {  \
u##size ret;\
-   __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \
-   : "=r" (ret) : "Z" (*addr) : "memory"); \
+   if (is_secure_guest()) {\
+   __asm__ __volatile__("mfsprg0 %3;"  \
+   "lnia %2;"  \
+   "ld %2,12(%2);" \
+   "mtsprg0 %2;"   \
+   "sync;" \
+   #insn" %0,%y1;" \
+   "twi 0,%0,0;"   \
+   "isync;"\
+   "mtsprg0 %3"\
+   : "=r" (ret)\
+   : "Z" (*addr), "r" (0), "r" (0) \
+   : "memory");\
+   } else {\
+   __asm__ __volatile__("sync;"\
+   #insn" %0,%y1;" \
+   "twi 0,%0,0;"   \
+   "isync" \
+   : "=r" (ret) : "Z" (*addr) : "memory"); \
+   }   \
return ret; \
 }
 
 #define DEF_MMIO_OUT_X(name, size, insn)   \
 static inline void name(volatile u##size __iomem *addr, u##size val)   \
 {  \
-   __asm__ __volatile__("sync;"#insn" %1,%y0"  \
-   : "=Z" (*addr) : "r" (val) : "memory"); \
-   mmiowb_set_pending();   \
+   if (is_secure_guest()) {\
+   __asm__ __volatile__("mfsprg0 %3;"  \
+   "lnia %2;"  \
+   "ld %2,12(%2);" \
+   "mtsprg0 %2;"   \
+   "sync;" \
+   #insn" %1,%y0;" \
+   "mtsprg0 %3"\
+   : "=Z" (*addr)  \
+   : "r" (val), "r" (0), "r" (0)   \
+   : "memory");\
+   } else {\
+   __asm__ __volatile__("sync;"\

Re: [v3 1/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START

2020-07-14 Thread Ram Pai
On Mon, Jul 13, 2020 at 10:59:41AM +0530, Bharata B Rao wrote:
> On Sat, Jul 11, 2020 at 02:13:43AM -0700, Ram Pai wrote:
> > Merging of pages associated with each memslot of a SVM is
> > disabled the page is migrated in H_SVM_PAGE_IN handler.
> > 
> > This operation should have been done much earlier; the moment the VM
> > is initiated for secure-transition. Delaying this operation, increases
> > the probability for those pages to acquire new references , making it
> > impossible to migrate those pages in H_SVM_PAGE_IN handler.
> > 
> > Disable page-migration in H_SVM_INIT_START handling.
> 
> While it is a good idea to disable KSM merging for all VMAs during
> H_SVM_INIT_START, I am curious if you did observe an actual case of
> ksm_madvise() failing which resulted in subsequent H_SVM_PAGE_IN
> failing to migrate?

No. I did not find any ksm_madvise() call failing.  But it did not make sense
to call ksm_madvise() every time a page-in was requested. Hence I proposed
this patch. H_SVM_INIT_START is the right place for ksm_madvise().

> 
> > 
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/kvm/book3s_hv_uvmem.c | 96 
> > +-
> >  1 file changed, 74 insertions(+), 22 deletions(-)
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> > b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > index 3d987b1..bfc3841 100644
> > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > @@ -211,6 +211,65 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
> > struct kvm *kvm,
> > return false;
> >  }
> >  
> > +static int kvmppc_memslot_page_merge(struct kvm *kvm,
> > +   struct kvm_memory_slot *memslot, bool merge)
> > +{
> > +   unsigned long gfn = memslot->base_gfn;
> > +   unsigned long end, start = gfn_to_hva(kvm, gfn);
> > +   int ret = 0;
> > +   struct vm_area_struct *vma;
> > +   int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
> > +
> > +   if (kvm_is_error_hva(start))
> > +   return H_STATE;
> 
> This and other cases below seem to be a new return value from
> H_SVM_INIT_START. May be update the documentation too along with
> this patch?

ok.

> 
> > +
> > +   end = start + (memslot->npages << PAGE_SHIFT);
> > +
> > +   down_write(>mm->mmap_sem);
> 
> When you rebase the patches against latest upstream you may want to
> replace the above and other instances by mmap_write/read_lock().

ok.

> 
> > +   do {
> > +   vma = find_vma_intersection(kvm->mm, start, end);
> > +   if (!vma) {
> > +   ret = H_STATE;
> > +   break;
> > +   }
> > +   ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > + merge_flag, >vm_flags);
> > +   if (ret) {
> > +   ret = H_STATE;
> > +   break;
> > +   }
> > +   start = vma->vm_end + 1;
> > +   } while (end > vma->vm_end);
> > +
> > +   up_write(>mm->mmap_sem);
> > +   return ret;
> > +}
> > +
> > +static int __kvmppc_page_merge(struct kvm *kvm, bool merge)
> > +{
> > +   struct kvm_memslots *slots;
> > +   struct kvm_memory_slot *memslot;
> > +   int ret = 0;
> > +
> > +   slots = kvm_memslots(kvm);
> > +   kvm_for_each_memslot(memslot, slots) {
> > +   ret = kvmppc_memslot_page_merge(kvm, memslot, merge);
> > +   if (ret)
> > +   break;
> > +   }
> > +   return ret;
> > +}
> > +
> > +static inline int kvmppc_disable_page_merge(struct kvm *kvm)
> > +{
> > +   return __kvmppc_page_merge(kvm, false);
> > +}
> > +
> > +static inline int kvmppc_enable_page_merge(struct kvm *kvm)
> > +{
> > +   return __kvmppc_page_merge(kvm, true);
> > +}
> > +
> >  unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
> >  {
> > struct kvm_memslots *slots;
> > @@ -232,11 +291,18 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
> > return H_AUTHORITY;
> >  
> > srcu_idx = srcu_read_lock(>srcu);
> > +
> > +   /* disable page-merging for all memslot */
> > +   ret = kvmppc_disable_page_merge(kvm);
> > +   if (ret)
> > +   goto out;
> > +
> > +   /* register the memslot */
> > slots = kvm_memslots(kvm);
> > kvm_for_each_memslot(memslot, slots) {
> > if (kvmppc_uvmem_slot_init(kvm, mems

Re: [v3 4/5] KVM: PPC: Book3S HV: retry page migration before erroring-out H_SVM_PAGE_IN

2020-07-14 Thread Ram Pai
On Mon, Jul 13, 2020 at 03:20:43PM +0530, Bharata B Rao wrote:
> On Sat, Jul 11, 2020 at 02:13:46AM -0700, Ram Pai wrote:
> > The page requested for page-in; sometimes, can have transient
> > references, and hence cannot migrate immediately. Retry a few times
> > before returning error.
> 
> As I noted in the previous patch, we need to understand what are these
> transient errors and they occur on what type of pages?

It's not clear when they occur. But they occur quite irregularly, and they
do occur on pages that were shared in the previous boot.

> 
> The previous patch also introduced a bit of retry logic in the
> page-in path. Can you consolidate the retry logic into a separate
> patch?

Yes. Having the retry logic in a separate patch gives us the flexibility
to drop it, if needed.  The retry patch is not the core element of this
patch series.

RP


Re: [v3 3/5] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

2020-07-14 Thread Ram Pai
On Mon, Jul 13, 2020 at 03:15:06PM +0530, Bharata B Rao wrote:
> On Sat, Jul 11, 2020 at 02:13:45AM -0700, Ram Pai wrote:
> > The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the 
> > pages
> >  
> > if (!(*mig.src & MIGRATE_PFN_MIGRATE)) {
> > -   ret = -1;
> > +   ret = -2;
> 
> migrate_vma_setup() has marked that this pfn can't be migrated. What
> transient errors are you observing which will disappear within 10
> retries?
> 
> Also till now when UV used to pull in all the pages, we never seemed to
> have hit these transient errors. But now when HV is pushing the same
> pages, we see these errors which are disappearing after 10 retries.
> Can you explain this more please? What sort of pages are these?

We did see them even before this patch. The retry alleviates the
problem, but does not entirely eliminate it. If the chance of seeing
the issue without the patch is 1%,  the chance of seeing this issue
with this patch becomes 0.25%.

> 
> > goto out_finalize;
> > }
> > +   bool retry = 0;
...snip...
> > +
> > +   *ret = 0;
> > +   while (kvmppc_next_nontransitioned_gfn(memslot, kvm, )) {
> > +
> > +   down_write(>mm->mmap_sem);
> 
> Acquiring and releasing mmap_sem in a loop? Any reason?
> 
> Now that you have moved ksm_madvise() calls to init time, any specific
> reason to take write mmap_sem here?

The semaphore protects the vma, right?

And it's acquired/released in the loop to provide the ability to relinquish
the cpu, without getting into a tight loop and causing soft lockups.
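
In other words, the intent is roughly the following shape (a sketch, not the
exact loop from the patch; cond_resched() is shown only to make the
"relinquish the cpu" point explicit):

    while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {
            down_write(&kvm->mm->mmap_sem);
            /* look up the VMA and migrate this single GFN ... */
            up_write(&kvm->mm->mmap_sem);
            cond_resched();     /* give the scheduler a chance between GFNs */
    }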

> 
> > +   start = gfn_to_hva(kvm, gfn);
> > +   if (kvm_is_error_hva(start)) {
> > +   *ret = H_STATE;
> > +   goto next;
> > +   }
> > +
> > +   end = start + (1UL << PAGE_SHIFT);
> > +   vma = find_vma_intersection(kvm->mm, start, end);
> > +   if (!vma || vma->vm_start > start || vma->vm_end < end) {
> > +   *ret = H_STATE;
> > +   goto next;
> > +   }
> > +
> > +   mutex_lock(>arch.uvmem_lock);
> > +   *ret = kvmppc_svm_migrate_page(vma, start, end,
> > +   (gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
> > +   mutex_unlock(>arch.uvmem_lock);
> > +
> > +next:
> > +   up_write(>mm->mmap_sem);
> > +
> > +   if (*ret == -2) {
> > +   retry = 1;
> 
> Using true or false assignment makes it easy to understand the intention
> of this 'retry' variable.

ok. next version will have that change.

Thanks for the comments,
RP


[v3 4/5] KVM: PPC: Book3S HV: retry page migration before erroring-out H_SVM_PAGE_IN

2020-07-11 Thread Ram Pai
The page requested for page-in can sometimes have transient
references, and hence cannot be migrated immediately. Retry a few times
before returning an error.

H_SVM_PAGE_IN interface is enhanced to return H_BUSY if the page is
not in a migratable state.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org

Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst |  1 +
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 54 +---
 2 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index d98fc85..638d1a7 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -1034,6 +1034,7 @@ Return values
* H_PARAMETER   if ``guest_pa`` is invalid.
* H_P2  if ``flags`` is invalid.
* H_P3  if ``order`` of page is invalid.
+   * H_BUSY   if ``page`` is not in a state to pagein
 
 Description
 ~~~
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 12ed52a..c9bdef6 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -843,7 +843,7 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
unsigned long gpa,
struct vm_area_struct *vma;
int srcu_idx;
unsigned long gfn = gpa >> page_shift;
-   int ret;
+   int ret, repeat_count = REPEAT_COUNT;
 
if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
return H_UNSUPPORTED;
@@ -857,34 +857,44 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
unsigned long gpa,
if (flags & H_PAGE_IN_SHARED)
return kvmppc_share_page(kvm, gpa, page_shift);
 
-   ret = H_PARAMETER;
srcu_idx = srcu_read_lock(>srcu);
-   down_write(>mm->mmap_sem);
 
-   start = gfn_to_hva(kvm, gfn);
-   if (kvm_is_error_hva(start))
-   goto out;
-
-   mutex_lock(>arch.uvmem_lock);
/* Fail the page-in request of an already paged-in page */
-   if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
-   goto out_unlock;
+   mutex_lock(>arch.uvmem_lock);
+   ret = kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL);
+   mutex_unlock(>arch.uvmem_lock);
+   if (ret) {
+   srcu_read_unlock(>srcu, srcu_idx);
+   return H_PARAMETER;
+   }
 
-   end = start + (1UL << page_shift);
-   vma = find_vma_intersection(kvm->mm, start, end);
-   if (!vma || vma->vm_start > start || vma->vm_end < end)
-   goto out_unlock;
+   do {
+   ret = H_PARAMETER;
+   down_write(>mm->mmap_sem);
 
-   if (kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, page_shift,
-   true))
-   goto out_unlock;
+   start = gfn_to_hva(kvm, gfn);
+   if (kvm_is_error_hva(start)) {
+   up_write(>mm->mmap_sem);
+   break;
+   }
 
-   ret = H_SUCCESS;
+   end = start + (1UL << page_shift);
+   vma = find_vma_intersection(kvm->mm, start, end);
+   if (!vma || vma->vm_start > start || vma->vm_end < end) {
+   up_write(>mm->mmap_sem);
+   break;
+   }
+
+   mutex_lock(>arch.uvmem_lock);
+   ret = kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, 
page_shift, true);
+   mutex_unlock(>arch.uvmem_lock);
+
+   up_write(>mm->mmap_sem);
+   } while (ret == -2 && repeat_count--);
+
+   if (ret == -2)
+   ret = H_BUSY;
 
-out_unlock:
-   mutex_unlock(>arch.uvmem_lock);
-out:
-   up_write(>mm->mmap_sem);
srcu_read_unlock(>srcu, srcu_idx);
return ret;
 }
-- 
1.8.3.1



[v3 0/5] Migrate non-migrated pages of a SVM.

2020-07-11 Thread Ram Pai
The time taken to switch a VM to a Secure-VM increases with the size of the VM.  A
100GB VM takes about 7 minutes. This is unacceptable.  This linear increase is
caused by suboptimal behavior of the Ultravisor and the Hypervisor.  The
Ultravisor unnecessarily migrates all the GFNs of the VM from normal-memory to
secure-memory. It only has to migrate the necessary and sufficient GFNs.

However, when the optimization is incorporated in the Ultravisor, the Hypervisor
starts misbehaving. The Hypervisor has an inbuilt assumption that the Ultravisor
will explicitly request to migrate each and every GFN of the VM. If only
necessary and sufficient GFNs are requested for migration, the Hypervisor
continues to manage the remaining GFNs as normal GFNs. This leads to memory
corruption, manifested consistently when the SVM reboots.

The same is true, when a memory slot is hotplugged into a SVM. The Hypervisor
expects the ultravisor to request migration of all GFNs to secure-GFN.  But the
hypervisor cannot handle any H_SVM_PAGE_IN requests from the Ultravisor, done
in the context of UV_REGISTER_MEM_SLOT ucall.  This problem manifests as random
errors in the SVM, when a memory-slot is hotplugged.

This patch series automatically migrates the non-migrated pages of a SVM,
 and thus solves the problem.

Testing: Passed rigorous testing using various sized SVMs.

Changelog:

v3: . Optimized the page-migration retry-logic. 
. Relax and relinquish the cpu regularly while bulk migrating
the non-migrated pages. This issue was causing soft-lockups.
Fixed it.
. Added a new patch, to retry page-migration a couple of times
before returning H_BUSY in H_SVM_PAGE_IN. This issue was
seen a few times in a 24hour continuous reboot test of the SVMs.

v2: . fixed a bug observed by Laurent. The state of the GFN's associated
with Secure-VMs were not reset during memslot flush.
. Re-organized the code, for easier review.
. Better description of the patch series.

v1: fixed a bug observed by Bharata. Pages that were paged-in and later
paged-out must also be skipped from migration during H_SVM_INIT_DONE.

Laurent Dufour (1):
  KVM: PPC: Book3S HV: migrate hot plugged memory

Ram Pai (4):
  KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START
  KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
  KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in
H_SVM_INIT_DONE
  KVM: PPC: Book3S HV: retry page migration before erroring-out
H_SVM_PAGE_IN

 Documentation/powerpc/ultravisor.rst|   3 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
 arch/powerpc/kvm/book3s_hv.c|  10 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 487 
 4 files changed, 429 insertions(+), 73 deletions(-)

-- 
1.8.3.1



[v3 5/5] KVM: PPC: Book3S HV: migrate hot plugged memory

2020-07-11 Thread Ram Pai
From: Laurent Dufour 

When a memory slot is hot plugged to a SVM, PFNs associated with the
GFNs in that slot must be migrated to the secure-PFNs, aka device-PFNs.

kvmppc_uv_migrate_mem_slot() is called to accomplish this. UV_PAGE_IN
ucall is skipped, since the ultravisor does not trust the content of
those pages and hence ignores it.

Signed-off-by: Laurent Dufour 
Signed-off-by: Ram Pai 
[resolved conflicts, and modified the commit log]
---
 arch/powerpc/kvm/book3s_hv.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 819f96d..b0d4231 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4523,10 +4523,12 @@ static void kvmppc_core_commit_memory_region_hv(struct 
kvm *kvm,
case KVM_MR_CREATE:
if (kvmppc_uvmem_slot_init(kvm, new))
return;
-   uv_register_mem_slot(kvm->arch.lpid,
-new->base_gfn << PAGE_SHIFT,
-new->npages * PAGE_SIZE,
-0, new->id);
+   if (uv_register_mem_slot(kvm->arch.lpid,
+new->base_gfn << PAGE_SHIFT,
+new->npages * PAGE_SIZE,
+0, new->id))
+   return;
+   kvmppc_uv_migrate_mem_slot(kvm, new);
break;
case KVM_MR_DELETE:
uv_unregister_mem_slot(kvm->arch.lpid, old->id);
-- 
1.8.3.1



[v3 3/5] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

2020-07-11 Thread Ram Pai
The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the pages
of the SVM before calling H_SVM_INIT_DONE. This causes a huge delay in
transitioning the VM to SVM. The Ultravisor is interested only in the pages that
contain the kernel, initrd and other important data structures. The rest of the
pages contain throw-away content. Hence requesting just the necessary and
sufficient pages from the Hypervisor is enough.

However if not all pages are requested by the Ultravisor, the Hypervisor
continues to consider the GFNs corresponding to the non-requested pages as
normal GFNs. This can lead to data-corruption and undefined behavior.

Move all the PFNs associated with the SVM's GFNs to secure-PFNs, in
H_SVM_INIT_DONE. Skip the GFNs that are already Paged-in or Shared or
Paged-in followed by a Paged-out.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst|   2 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 166 +---
 3 files changed, 156 insertions(+), 14 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index df136c8..d98fc85 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -933,6 +933,8 @@ Return values
* H_UNSUPPORTED if called from the wrong context (e.g.
from an SVM or before an H_SVM_INIT_START
hypercall).
+   * H_STATE   if the hypervisor could not successfully
+transition the VM to Secure VM.
 
 Description
 ~~~
diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 9cb7d8b..f229ab5 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -23,6 +23,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 struct kvm *kvm, bool skip_page_out);
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 6d6c256..12ed52a 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -93,6 +93,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct dev_pagemap kvmppc_uvmem_pgmap;
 static unsigned long *kvmppc_uvmem_bitmap;
@@ -348,6 +349,43 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
struct kvm *kvm,
return false;
 }
 
+/*
+ * starting from *gfn search for the next available GFN that is not yet
+ * transitioned to a secure GFN.  return the value of that GFN in *gfn.  If a
+ * GFN is found, return true, else return false
+ */
+static bool kvmppc_next_nontransitioned_gfn(const struct kvm_memory_slot 
*memslot,
+   struct kvm *kvm, unsigned long *gfn)
+{
+   struct kvmppc_uvmem_slot *p;
+   bool ret = false;
+   unsigned long i;
+
+   mutex_lock(>arch.uvmem_lock);
+
+   list_for_each_entry(p, >arch.uvmem_pfns, list)
+   if (*gfn >= p->base_pfn && *gfn < p->base_pfn + p->nr_pfns)
+   break;
+   if (!p)
+   goto out;
+   /*
+* The code below assumes, one to one correspondence between
+* kvmppc_uvmem_slot and memslot.
+*/
+   for (i = *gfn; i < p->base_pfn + p->nr_pfns; i++) {
+   unsigned long index = i - p->base_pfn;
+
+   if (!(p->pfns[index] & KVMPPC_GFN_FLAG_MASK)) {
+   *gfn = i;
+   ret = true;
+   break;
+   }
+   }
+out:
+   mutex_unlock(>arch.uvmem_lock);
+   return ret;
+}
+
 static int kvmppc_memslot_page_merge(struct kvm *kvm,
struct kvm_memory_slot *memslot, bool merge)
 {
@@ -461,12 +499,31 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
 {
+   struct kvm_memslots *slots;
+   struct kvm_memory_slot *memslot;
+   int srcu_idx;
+   long ret = H_SUCCESS;
+
if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
return H_UNSUPPORTED;
 
+   /* migrate any unmoved normal pfn to device pfns*/
+   srcu_idx = srcu_read_lock(>srcu);
+   slots = k

[v3 1/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START

2020-07-11 Thread Ram Pai
Merging of pages associated with each memslot of a SVM is
disabled when the page is migrated in the H_SVM_PAGE_IN handler.

This operation should have been done much earlier, the moment the VM
is initiated for secure transition. Delaying this operation increases
the probability of those pages acquiring new references, making it
impossible to migrate those pages in the H_SVM_PAGE_IN handler.

Disable page-merging in H_SVM_INIT_START handling.

Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 96 +-
 1 file changed, 74 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 3d987b1..bfc3841 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -211,6 +211,65 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
struct kvm *kvm,
return false;
 }
 
+static int kvmppc_memslot_page_merge(struct kvm *kvm,
+   struct kvm_memory_slot *memslot, bool merge)
+{
+   unsigned long gfn = memslot->base_gfn;
+   unsigned long end, start = gfn_to_hva(kvm, gfn);
+   int ret = 0;
+   struct vm_area_struct *vma;
+   int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
+
+   if (kvm_is_error_hva(start))
+   return H_STATE;
+
+   end = start + (memslot->npages << PAGE_SHIFT);
+
+   down_write(>mm->mmap_sem);
+   do {
+   vma = find_vma_intersection(kvm->mm, start, end);
+   if (!vma) {
+   ret = H_STATE;
+   break;
+   }
+   ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
+ merge_flag, >vm_flags);
+   if (ret) {
+   ret = H_STATE;
+   break;
+   }
+   start = vma->vm_end + 1;
+   } while (end > vma->vm_end);
+
+   up_write(>mm->mmap_sem);
+   return ret;
+}
+
+static int __kvmppc_page_merge(struct kvm *kvm, bool merge)
+{
+   struct kvm_memslots *slots;
+   struct kvm_memory_slot *memslot;
+   int ret = 0;
+
+   slots = kvm_memslots(kvm);
+   kvm_for_each_memslot(memslot, slots) {
+   ret = kvmppc_memslot_page_merge(kvm, memslot, merge);
+   if (ret)
+   break;
+   }
+   return ret;
+}
+
+static inline int kvmppc_disable_page_merge(struct kvm *kvm)
+{
+   return __kvmppc_page_merge(kvm, false);
+}
+
+static inline int kvmppc_enable_page_merge(struct kvm *kvm)
+{
+   return __kvmppc_page_merge(kvm, true);
+}
+
 unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 {
struct kvm_memslots *slots;
@@ -232,11 +291,18 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
return H_AUTHORITY;
 
srcu_idx = srcu_read_lock(>srcu);
+
+   /* disable page-merging for all memslot */
+   ret = kvmppc_disable_page_merge(kvm);
+   if (ret)
+   goto out;
+
+   /* register the memslot */
slots = kvm_memslots(kvm);
kvm_for_each_memslot(memslot, slots) {
if (kvmppc_uvmem_slot_init(kvm, memslot)) {
ret = H_PARAMETER;
-   goto out;
+   break;
}
ret = uv_register_mem_slot(kvm->arch.lpid,
   memslot->base_gfn << PAGE_SHIFT,
@@ -245,9 +311,12 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
if (ret < 0) {
kvmppc_uvmem_slot_free(kvm, memslot);
ret = H_PARAMETER;
-   goto out;
+   break;
}
}
+
+   if (ret)
+   kvmppc_enable_page_merge(kvm);
 out:
srcu_read_unlock(>srcu, srcu_idx);
return ret;
@@ -384,7 +453,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  */
 static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, unsigned long gpa, struct kvm *kvm,
-  unsigned long page_shift, bool *downgrade)
+  unsigned long page_shift)
 {
unsigned long src_pfn, dst_pfn = 0;
struct migrate_vma mig;
@@ -400,18 +469,6 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, 
unsigned long start,
mig.src = _pfn;
mig.dst = _pfn;
 
-   /*
-* We come here with mmap_sem write lock held just for
-* ksm_madvise(), otherwise we only need read mmap_sem.
-* Hence downgrade to read lock once ksm_madvise() is done.
-*/
-   ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
- MADV_UNMERGEABLE, &vma->vm_flags);
-   downgrade_write(&kvm->mm->mmap_sem);
-   *downgrade = true;

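With the ksm_madvise() unmerge hoisted into H_SVM_INIT_START, the page-in
path no longer needs the mmap_sem write lock or the downgrade dance. A
minimal sketch of the resulting caller side (an assumption about the rest of
the truncated hunk, not the posted diff):

    down_read(&kvm->mm->mmap_sem);
    vma = find_vma_intersection(kvm->mm, start, end);
    if (!vma || vma->vm_start > start || vma->vm_end < end)
        ret = H_PARAMETER;      /* no single VMA covers the requested range */
    else
        ret = kvmppc_svm_page_in(vma, start, end, gpa, kvm, page_shift);
    up_read(&kvm->mm->mmap_sem);
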
[v3 2/5] KVM: PPC: Book3S HV: track the state of GFNs associated with secure VMs

2020-07-11 Thread Ram Pai
 ---------------------------------------------------------------------
 |         |  Start    |  H_SVM_   |H_SVM_    |H_SVM_     |UV_SVM_    |
 |         |  VM       |INIT_START |INIT_DONE |INIT_ABORT |TERMINATE  |
 ---------------------------------------------------------------------
 |         |           |           |          |           |           |
 | Normal  | Normal    | Transient |Error     |Error      |Normal     |
 |         |           |           |          |           |           |
 | Secure  |   Error   | Error     |Error     |Error      |Normal     |
 |         |           |           |          |           |           |
 |Transient|   N/A     | Error     |Secure    |Normal     |Normal     |
 ---------------------------------------------------------------------
 



Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 187 +
 1 file changed, 168 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index bfc3841..6d6c256 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -98,7 +98,127 @@
 static unsigned long *kvmppc_uvmem_bitmap;
 static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
 
-#define KVMPPC_UVMEM_PFN   (1UL << 63)
+/*
+ * States of a GFN
+ * ---
+ * The GFN can be in one of the following states.
+ *
+ * (a) Secure - The GFN is secure. The GFN is associated with
+ * a Secure VM, the contents of the GFN is not accessible
+ * to the Hypervisor.  This GFN can be backed by a secure-PFN,
+ * or can be backed by a normal-PFN with contents encrypted.
+ * The former is true when the GFN is paged-in into the
+ * ultravisor. The latter is true when the GFN is paged-out
+ * of the ultravisor.
+ *
+ * (b) Shared - The GFN is shared. The GFN is associated with a
+ * secure VM. The contents of the GFN is accessible to
+ * Hypervisor. This GFN is backed by a normal-PFN and its
+ * content is un-encrypted.
+ *
+ * (c) Normal - The GFN is normal. The GFN is associated with
+ * a normal VM. The contents of the GFN is accessible to
+ * the Hypervisor. Its content is never encrypted.
+ *
+ * States of a VM.
+ * ---
+ *
+ * Normal VM:  A VM whose contents are always accessible to
+ * the hypervisor.  All its GFNs are normal-GFNs.
+ *
+ * Secure VM: A VM whose contents are not accessible to the
+ * hypervisor without the VM's consent.  Its GFNs are
+ * either Shared-GFN or Secure-GFNs.
+ *
+ * Transient VM: A Normal VM that is transitioning to secure VM.
+ * The transition starts on successful return of
+ * H_SVM_INIT_START, and ends on successful return
+ * of H_SVM_INIT_DONE. This transient VM, can have GFNs
+ * in any of the three states; i.e Secure-GFN, Shared-GFN,
+ * and Normal-GFN. The VM never executes in this state
+ * in supervisor-mode.
+ *
+ * Memory slot State.
+ * -
+ * The state of a memory slot mirrors the state of the
+ * VM the memory slot is associated with.
+ *
+ * VM State transition.
+ * 
+ *
+ *  A VM always starts in Normal Mode.
+ *
+ *  H_SVM_INIT_START moves the VM into transient state. During this
+ *  time the Ultravisor may request some of its GFNs to be shared or
+ *  secured. So its GFNs can be in one of the three GFN states.
+ *
+ *  H_SVM_INIT_DONE moves the VM entirely from transient state to
+ *  secure-state. At this point any left-over normal-GFNs are
+ *  transitioned to Secure-GFN.
+ *
+ *  H_SVM_INIT_ABORT moves the transient VM back to normal VM.
+ *  All its GFNs are moved to Normal-GFNs.
+ *
+ *  UV_TERMINATE transitions the secure-VM back to normal-VM. All
+ *  the secure-GFNs and shared-GFNs are transitioned to normal-GFNs.
+ *  Note: The contents of the normal-GFN is undefined at this point.
+ *
+ * GFN state implementation:
+ * -
+ *
+ * Secure GFN is associated with a secure-PFN; also called uvmem_pfn,
+ * when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag
+ * set, and contains the value of the secure-PFN.
+ * It is associated with a normal-PFN; also called mem_pfn, when
+ * the GFN is pagedout. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set.
+ * The value of the normal-PFN is not tracked.
+ *
+ * Shared GFN is associated with a normal-PFN. Its pfn[] has
+ * KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN
+ * is not tracked.
+ *
+ * Normal GFN is associated with normal-PFN. Its pfn[] has
+ * no flag set. The value of the normal-PFN is not tracked.
+ *
+ * Life cycle of a GFN
+ * 
+ *
+ * --
+ * || Share  |  Unshare | SVM   |H_SVM_INIT_DONE|
+ * ||operation   |ope

Re: [PATCH v3 0/4] Migrate non-migrated pages of a SVM.

2020-06-29 Thread Ram Pai
On Mon, Jun 29, 2020 at 07:23:30AM +0530, Bharata B Rao wrote:
> On Sun, Jun 28, 2020 at 09:41:53PM +0530, Bharata B Rao wrote:
> > On Fri, Jun 19, 2020 at 03:43:38PM -0700, Ram Pai wrote:
> > > The time taken to switch a VM to Secure-VM, increases by the size of the 
> > > VM.  A
> > > 100GB VM takes about 7minutes. This is unacceptable.  This linear 
> > > increase is
> > > caused by a suboptimal behavior by the Ultravisor and the Hypervisor.  The
> > > Ultravisor unnecessarily migrates all the GFN of the VM from 
> > > normal-memory to
> > > secure-memory. It has to just migrate the necessary and sufficient GFNs.
> > > 
> > > However when the optimization is incorporated in the Ultravisor, the 
> > > Hypervisor
> > > starts misbehaving. The Hypervisor has an inbuilt assumption that the 
> > > Ultravisor
> > > will explicitly request to migrate, each and every GFN of the VM. If only
> > > necessary and sufficient GFNs are requested for migration, the Hypervisor
> > > continues to manage the remaining GFNs as normal GFNs. This leads to 
> > > memory
> > > corruption, manifested consistently when the SVM reboots.
> > > 
> > > The same is true, when a memory slot is hotplugged into a SVM. The 
> > > Hypervisor
> > > expects the ultravisor to request migration of all GFNs to secure-GFN.  
> > > But at
> > > the same time, the hypervisor is unable to handle any H_SVM_PAGE_IN 
> > > requests
> > > from the Ultravisor, done in the context of UV_REGISTER_MEM_SLOT ucall.  
> > > This
> > > problem manifests as random errors in the SVM, when a memory-slot is
> > > hotplugged.
> > > 
> > > This patch series automatically migrates the non-migrated pages of a SVM,
> > >  and thus solves the problem.
> > 
> > So this is what I understand as the objective of this patchset:
> > 
> > 1. Getting all the pages into the secure memory right when the guest
> >transitions into secure mode is expensive. Ultravisor wants to just get
> >the necessary and sufficient pages in and put the onus on the Hypervisor
> >to mark the remaining pages (w/o actual page-in) as secure during
> >H_SVM_INIT_DONE.
> > 2. During H_SVM_INIT_DONE, you want a way to differentiate the pages that
> >are already secure from the pages that are shared and that are paged-out.
> >For this you are introducing all these new states in HV.
> > 
> > UV knows about the shared GFNs and maintains the state of the same. Hence
> > let HV send all the pages (minus already secured pages) via H_SVM_PAGE_IN
> > and if UV finds any shared pages in them, let it fail the uv-page-in call.
> > Then HV can fail the migration for it  and the page continues to remain
> > shared. With this, you don't need to maintain a state for secured GFN in HV.
> > 
> > In the unlikely case of sending a paged-out page to UV during
> > H_SVM_INIT_DONE, let the page-in succeed and HV will fault on it again
> > if required. With this, you don't need a state in HV to identify a
> > paged-out-but-encrypted state.
> > 
> > Doesn't the above work?
> 
> I see that you want to infact skip the uv-page-in calls from H_SVM_INIT_DONE.
> So that would need the extra states in HV which you are proposing here.

Yes. I want to skip them, to speed up the overall ESM switch.

RP
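
In code terms, the skip amounts to a per-GFN test in the H_SVM_INIT_DONE
path: only GFNs that never transitioned (never paged-in, shared, or
paged-out) are migrated now, saving one hcall per page. A sketch of the
inner loop, using the helpers this series adds (vma lookup and locking as in
the posted kvmppc_uv_migrate_mem_slot()):

    for (i = 0; i < memslot->npages; i++, ++gfn) {
        /* already secure or shared: the Ultravisor knows about it, skip */
        if (kvmppc_gfn_has_transitioned(gfn, kvm))
            continue;

        start = gfn_to_hva(kvm, gfn);
        end = start + (1UL << PAGE_SHIFT);
        /* pagein=false: no UV_PAGE_IN is issued for these GFNs */
        ret = kvmppc_svm_migrate_page(vma, start, end, gfn << PAGE_SHIFT,
                                      kvm, PAGE_SHIFT, false);
        if (ret)
            break;
    }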


[PATCH v3 4/4] KVM: PPC: Book3S HV: migrate hot plugged memory

2020-06-19 Thread Ram Pai
From: Laurent Dufour 

When a memory slot is hot plugged to a SVM, PFNs associated with the
GFNs in that slot must be migrated to the secure-PFNs, aka device-PFNs.

kvmppc_uv_migrate_mem_slot() is called to accomplish this. UV_PAGE_IN
ucall is skipped, since the ultravisor does not trust the content of
those pages and hence ignores it.

Signed-off-by: Laurent Dufour 
Signed-off-by: Ram Pai 
[resolved conflicts, and modified the commit log]
---
 arch/powerpc/kvm/book3s_hv.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6717d24..fcea41c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4531,10 +4531,12 @@ static void kvmppc_core_commit_memory_region_hv(struct 
kvm *kvm,
case KVM_MR_CREATE:
if (kvmppc_uvmem_slot_init(kvm, new))
return;
-   uv_register_mem_slot(kvm->arch.lpid,
-new->base_gfn << PAGE_SHIFT,
-new->npages * PAGE_SIZE,
-0, new->id);
+   if (uv_register_mem_slot(kvm->arch.lpid,
+new->base_gfn << PAGE_SHIFT,
+new->npages * PAGE_SIZE,
+0, new->id))
+   return;
+   kvmppc_uv_migrate_mem_slot(kvm, new);
break;
case KVM_MR_DELETE:
uv_unregister_mem_slot(kvm->arch.lpid, old->id);
-- 
1.8.3.1



[PATCH v3 3/4] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

2020-06-19 Thread Ram Pai
H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly
called H_SVM_PAGE_IN for all secure pages. These GFNs continue to be
normal GFNs associated with normal PFNs; when in fact, these GFNs should
have been secure GFNs, associated with device PFNs.

Move all the PFNs associated with the SVM's GFNs, to secure-PFNs, in
H_SVM_INIT_DONE. Skip the GFNs that are already Paged-in or Shared
through H_SVM_PAGE_IN, or Paged-in followed by a Paged-out through
UV_PAGE_OUT.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst|   2 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 154 +++-
 3 files changed, 132 insertions(+), 26 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index 363736d..3bc8957 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -933,6 +933,8 @@ Return values
* H_UNSUPPORTED if called from the wrong context (e.g.
from an SVM or before an H_SVM_INIT_START
hypercall).
+   * H_STATE   if the hypervisor could not successfully
+transition the VM to Secure VM.
 
 Description
 ~~~
diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 5a9834e..b9cd7eb 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -22,6 +22,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 struct kvm *kvm, bool skip_page_out);
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index c8c0290..449e8a7 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -93,6 +93,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct dev_pagemap kvmppc_uvmem_pgmap;
 static unsigned long *kvmppc_uvmem_bitmap;
@@ -339,6 +340,21 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
struct kvm *kvm,
return false;
 }
 
+/* return true, if the GFN is a shared-GFN, or a secure-GFN */
+bool kvmppc_gfn_has_transitioned(unsigned long gfn, struct kvm *kvm)
+{
+   struct kvmppc_uvmem_slot *p;
+
+   list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
+   if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
+   unsigned long index = gfn - p->base_pfn;
+
+   return (p->pfns[index] & KVMPPC_GFN_FLAG_MASK);
+   }
+   }
+   return false;
+}
+
 unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 {
struct kvm_memslots *slots;
@@ -379,12 +395,31 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
 {
+   struct kvm_memslots *slots;
+   struct kvm_memory_slot *memslot;
+   int srcu_idx;
+   long ret = H_SUCCESS;
+
if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
return H_UNSUPPORTED;
 
+   /* migrate any unmoved normal pfn to device pfns*/
+   srcu_idx = srcu_read_lock(&kvm->srcu);
+   slots = kvm_memslots(kvm);
+   kvm_for_each_memslot(memslot, slots) {
+   ret = kvmppc_uv_migrate_mem_slot(kvm, memslot);
+   if (ret) {
+   ret = H_STATE;
+   goto out;
+   }
+   }
+
kvm->arch.secure_guest |= KVMPPC_SECURE_INIT_DONE;
pr_info("LPID %d went secure\n", kvm->arch.lpid);
-   return H_SUCCESS;
+
+out:
+   srcu_read_unlock(&kvm->srcu, srcu_idx);
+   return ret;
 }
 
 /*
@@ -505,12 +540,14 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
 }
 
 /*
- * Alloc a PFN from private device memory pool and copy page from normal
- * memory to secure memory using UV_PAGE_IN uvcall.
+ * Alloc a PFN from private device memory pool. If @pagein is true,
+ * copy page from normal memory to secure memory using UV_PAGE_IN uvcall.
  */
-static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
-  unsigned long end, unsigned long gpa, struct kvm *kvm,
-  unsigned long page_shift, bool *dow

[PATCH v3 2/4] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs

2020-06-19 Thread Ram Pai
 ---------------------------------------------------------------------
 |         |  Start    |  H_SVM_   |H_SVM_    |H_SVM_     |UV_SVM_    |
 |         |  VM       |INIT_START |INIT_DONE |INIT_ABORT |TERMINATE  |
 ---------------------------------------------------------------------
 |         |           |           |          |           |           |
 | Normal  | Normal    | Transient |Error     |Error      |Normal     |
 |         |           |           |          |           |           |
 | Secure  |   Error   | Error     |Error     |Error      |Normal     |
 |         |           |           |          |           |           |
 |Transient|   N/A     | Error     |Secure    |Normal     |Normal     |
 ---------------------------------------------------------------------
 



Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 187 +
 1 file changed, 168 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 3599aaa..c8c0290 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -98,7 +98,127 @@
 static unsigned long *kvmppc_uvmem_bitmap;
 static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
 
-#define KVMPPC_UVMEM_PFN   (1UL << 63)
+/*
+ * States of a GFN
+ * ---
+ * The GFN can be in one of the following states.
+ *
+ * (a) Secure - The GFN is secure. The GFN is associated with
+ * a Secure VM, the contents of the GFN is not accessible
+ * to the Hypervisor.  This GFN can be backed by a secure-PFN,
+ * or can be backed by a normal-PFN with contents encrypted.
+ * The former is true when the GFN is paged-in into the
+ * ultravisor. The latter is true when the GFN is paged-out
+ * of the ultravisor.
+ *
+ * (b) Shared - The GFN is shared. The GFN is associated with a
+ * secure VM. The contents of the GFN is accessible to
+ * Hypervisor. This GFN is backed by a normal-PFN and its
+ * content is un-encrypted.
+ *
+ * (c) Normal - The GFN is normal. The GFN is associated with
+ * a normal VM. The contents of the GFN is accessible to
+ * the Hypervisor. Its content is never encrypted.
+ *
+ * States of a VM.
+ * ---
+ *
+ * Normal VM:  A VM whose contents are always accessible to
+ * the hypervisor.  All its GFNs are normal-GFNs.
+ *
+ * Secure VM: A VM whose contents are not accessible to the
+ * hypervisor without the VM's consent.  Its GFNs are
+ * either Shared-GFN or Secure-GFNs.
+ *
+ * Transient VM: A Normal VM that is transitioning to secure VM.
+ * The transition starts on successful return of
+ * H_SVM_INIT_START, and ends on successful return
+ * of H_SVM_INIT_DONE. This transient VM, can have GFNs
+ * in any of the three states; i.e Secure-GFN, Shared-GFN,
+ * and Normal-GFN. The VM never executes in this state
+ * in supervisor-mode.
+ *
+ * Memory slot State.
+ * -
+ * The state of a memory slot mirrors the state of the
+ * VM the memory slot is associated with.
+ *
+ * VM State transition.
+ * 
+ *
+ *  A VM always starts in Normal Mode.
+ *
+ *  H_SVM_INIT_START moves the VM into transient state. During this
+ *  time the Ultravisor may request some of its GFNs to be shared or
+ *  secured. So its GFNs can be in one of the three GFN states.
+ *
+ *  H_SVM_INIT_DONE moves the VM entirely from transient state to
+ *  secure-state. At this point any left-over normal-GFNs are
+ *  transitioned to Secure-GFN.
+ *
+ *  H_SVM_INIT_ABORT moves the transient VM back to normal VM.
+ *  All its GFNs are moved to Normal-GFNs.
+ *
+ *  UV_TERMINATE transitions the secure-VM back to normal-VM. All
+ *  the secure-GFNs and shared-GFNs are transitioned to normal-GFNs.
+ *  Note: The contents of the normal-GFN is undefined at this point.
+ *
+ * GFN state implementation:
+ * -
+ *
+ * Secure GFN is associated with a secure-PFN; also called uvmem_pfn,
+ * when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag
+ * set, and contains the value of the secure-PFN.
+ * It is associated with a normal-PFN; also called mem_pfn, when
+ * the GFN is pagedout. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set.
+ * The value of the normal-PFN is not tracked.
+ *
+ * Shared GFN is associated with a normal-PFN. Its pfn[] has
+ * KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN
+ * is not tracked.
+ *
+ * Normal GFN is associated with normal-PFN. Its pfn[] has
+ * no flag set. The value of the normal-PFN is not tracked.
+ *
+ * Life cycle of a GFN
+ * 
+ *
+ * --
+ * || Share  |  Unshare | SVM   |H_SVM_INIT_DONE|
+ * ||operation   |ope

[PATCH v3 1/4] KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c

2020-06-19 Thread Ram Pai
Without this fix, git is confused. It generates wrong
function context for code changes in subsequent patches.
Weird, but true.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index ad950f89..3599aaa 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -369,8 +369,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * Alloc a PFN from private device memory pool and copy page from normal
  * memory to secure memory using UV_PAGE_IN uvcall.
  */
-static int
-kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
+static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, unsigned long gpa, struct kvm *kvm,
   unsigned long page_shift, bool *downgrade)
 {
@@ -437,8 +436,8 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * In the former case, uses dev_pagemap_ops.migrate_to_ram handler
  * to unmap the device page from QEMU's page tables.
  */
-static unsigned long
-kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
+static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
+   unsigned long page_shift)
 {
 
int ret = H_PARAMETER;
@@ -487,9 +486,9 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * H_PAGE_IN_SHARED flag makes the page shared which means that the same
  * memory in is visible from both UV and HV.
  */
-unsigned long
-kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
-unsigned long flags, unsigned long page_shift)
+unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
+   unsigned long flags,
+   unsigned long page_shift)
 {
bool downgrade = false;
unsigned long start, end;
@@ -546,10 +545,10 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * Provision a new page on HV side and copy over the contents
  * from secure memory using UV_PAGE_OUT uvcall.
  */
-static int
-kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
-   unsigned long end, unsigned long page_shift,
-   struct kvm *kvm, unsigned long gpa)
+static int kvmppc_svm_page_out(struct vm_area_struct *vma,
+   unsigned long start,
+   unsigned long end, unsigned long page_shift,
+   struct kvm *kvm, unsigned long gpa)
 {
unsigned long src_pfn, dst_pfn = 0;
struct migrate_vma mig;
-- 
1.8.3.1



[PATCH v3 0/4] Migrate non-migrated pages of a SVM.

2020-06-19 Thread Ram Pai
The time taken to switch a VM to a Secure-VM increases with the size of the VM.
A 100GB VM takes about 7 minutes. This is unacceptable.  This linear increase is
caused by a suboptimal behavior by the Ultravisor and the Hypervisor.  The
Ultravisor unnecessarily migrates all the GFN of the VM from normal-memory to
secure-memory. It has to just migrate the necessary and sufficient GFNs.

However when the optimization is incorporated in the Ultravisor, the Hypervisor
starts misbehaving. The Hypervisor has an inbuilt assumption that the Ultravisor
will explicitly request to migrate, each and every GFN of the VM. If only
necessary and sufficient GFNs are requested for migration, the Hypervisor
continues to manage the remaining GFNs as normal GFNs. This leads to memory
corruption, manifested consistently when the SVM reboots.

The same is true, when a memory slot is hotplugged into a SVM. The Hypervisor
expects the ultravisor to request migration of all GFNs to secure-GFN.  But at
the same time, the hypervisor is unable to handle any H_SVM_PAGE_IN requests
from the Ultravisor, done in the context of UV_REGISTER_MEM_SLOT ucall.  This
problem manifests as random errors in the SVM, when a memory-slot is
hotplugged.

This patch series automatically migrates the non-migrated pages of a SVM,
 and thus solves the problem.

Testing: Passed rigorous SVM test using various sized SVMs.

Changelog:

v2: . fixed a bug observed by Laurent. The state of the GFNs associated
with Secure-VMs was not reset during memslot flush.
. Re-organized the code, for easier review.
. Better description of the patch series.

v1: fixed a bug observed by Bharata. Pages that were paged-in and later
paged-out must also be skipped from migration during H_SVM_INIT_DONE.

Laurent Dufour (1):
  KVM: PPC: Book3S HV: migrate hot plugged memory

Ram Pai (3):
  KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c
  KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
  KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in
H_SVM_INIT_DONE

 Documentation/powerpc/ultravisor.rst|   2 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
 arch/powerpc/kvm/book3s_hv.c|  10 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 360 +++-
 4 files changed, 315 insertions(+), 59 deletions(-)

-- 
1.8.3.1



Re: [PATCH v2 2/4] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs

2020-06-18 Thread Ram Pai
On Thu, Jun 18, 2020 at 03:31:06PM +0200, Laurent Dufour wrote:
> Le 18/06/2020 à 11:19, Ram Pai a écrit :
> >

.snip..

> >
> >  1. States of a GFN
> > ---
> >  The GFN can be in one of the following states.
> >diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
> >b/arch/powerpc/kvm/book3s_64_mmu_radix.c

...snip...

> >index 803940d..3448459 100644
> >--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> >+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> >@@ -1100,7 +1100,7 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
> > unsigned int shift;
> > if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START)
> >-kvmppc_uvmem_drop_pages(memslot, kvm, true);
> >+kvmppc_uvmem_drop_pages(memslot, kvm, true, false);
> 
> When reviewing the v1 of this series, I asked you the question about
> the fact that the call here is made with purge_gfn = false. Your
> answer was:
> 
> >This function does not know, under what context it is called. Since
> >its job is to just flush the memslot, it cannot assume anything
> >about purging the pages in the memslot.
> 
> Indeed in the case of the memory hotplug operation, this function is
> called to wipe the page from the secure device in the case the pages
> are secured. In that case the purge is required. Indeed, I checked
> the other call to kvmppc_radix_flush_memslot() in
> kvmppc_core_flush_memslot_hv() and I cannot see why in that case too
> purge_gfn should be false, especially when the memslot is reused as
> detailed in __kvm_set_memory_region() around the call to
> kvm_arch_flush_shadow_memslot().
> 
> I'm sorry to not have ask this earlier, but could you please elaborate on 
> this?

You are right. kvmppc_radix_flush_memslot() is getting called every time with
the intention of disassociating the memslot from that VM. Which implies,
the memslot is intended to be deleted and possibly reused.

I should be calling kvmppc_uvmem_drop_pages() with purge_gfn=true, here
as well.

I expect some form of problem showing up in the memory hotplug/unplug path.

RP
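
Concretely, the follow-up agreed above would make the flush path purge as
well; a sketch of the intended call (not a posted patch):

    /* kvmppc_radix_flush_memslot(): the memslot is being dropped and may be
     * reused, so reset the GFN state along with skipping the page-out.
     */
    if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START)
        kvmppc_uvmem_drop_pages(memslot, kvm, true /* skip_page_out */,
                                true /* purge_gfn */);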



Re: [PATCH v2 0/4] Migrate non-migrated pages of a SVM.

2020-06-18 Thread Ram Pai
I should have elaborated on the problem and the need for these patches.

Explaining it here. Will add it to the series in next version.

-

The time taken to switch a VM to a Secure-VM increases with the size of
the VM.  A 100GB VM takes about 7 minutes. This is unacceptable.  This
linear increase is caused by a suboptimal behavior by the Ultravisor and
the Hypervisor.  The Ultravisor unnecessarily migrates all the GFN of
the VM from normal-memory to secure-memory. It has to just migrate the
necessary and sufficient GFNs.

However when the optimization is incorporated in the Ultravisor, the
Hypervisor starts misbehaving. The Hypervisor has an inbuilt assumption
that the Ultravisor will explicitly request to migrate each and every
GFN of the VM. If only necessary and sufficient GFNs are requested for
migration, the Hypervisor continues to manage the rest of the GFNs as
normal GFNs. This leads to memory corruption, manifested consistently
when the SVM reboots.

The same is true, when a memory slot is hotplugged into a SVM. The
Hypervisor expects the ultravisor to request migration of all GFNs to
secure-GFN.  But at the same time the hypervisor is unable to handle any
H_SVM_PAGE_IN requests from the Ultravisor, done in the context of
UV_REGISTER_MEM_SLOT ucall.  This problem manifests as random errors in
the SVM, when a memory-slot is hotplugged.

This patch series automatically migrates the non-migrated pages of a SVM,
and thus solves the problem.
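
For the hot-plug case described above, the hypervisor ends up driving the
migration itself right after registering the slot with the Ultravisor; the
KVM_MR_CREATE handling from patch 4/4 of this series becomes:

    case KVM_MR_CREATE:
        if (kvmppc_uvmem_slot_init(kvm, new))
            return;
        if (uv_register_mem_slot(kvm->arch.lpid,
                                 new->base_gfn << PAGE_SHIFT,
                                 new->npages * PAGE_SIZE,
                                 0, new->id))
            return;
        /* migrate the slot's PFNs to secure; UV_PAGE_IN is skipped */
        kvmppc_uv_migrate_mem_slot(kvm, new);
        break;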

--



On Thu, Jun 18, 2020 at 02:19:01AM -0700, Ram Pai wrote:
> This patch series migrates the non-migrated pages of a SVM.
> This is required when the UV calls H_SVM_INIT_DONE, and
> when a memory-slot is hotplugged to a Secure VM.
> 
> Testing: Passed rigorous SVM reboot test using different
>   sized SVMs.
> 
> Changelog:
>   . fixed a bug observed by Bharata. Pages that
>   where paged-in and later paged-out must also be
>   skipped from migration during H_SVM_INIT_DONE.
> 
> Laurent Dufour (1):
>   KVM: PPC: Book3S HV: migrate hot plugged memory
> 
> Ram Pai (3):
>   KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c
>   KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
>   KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in
> H_SVM_INIT_DONE
> 
>  Documentation/powerpc/ultravisor.rst|   2 +
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h |   8 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c  |   2 +-
>  arch/powerpc/kvm/book3s_hv.c|  12 +-
>  arch/powerpc/kvm/book3s_hv_uvmem.c  | 449 
> ++--
>  5 files changed, 368 insertions(+), 105 deletions(-)
> 
> -- 
> 1.8.3.1

-- 
Ram Pai


[PATCH v2 4/4] KVM: PPC: Book3S HV: migrate hot plugged memory

2020-06-18 Thread Ram Pai
From: Laurent Dufour 

When a memory slot is hot plugged to a SVM, PFNs associated with the
GFNs in that slot must be migrated to the secure-PFNs, aka device-PFNs.

kvmppc_uv_migrate_mem_slot() is called to accomplish this. UV_PAGE_IN
ucall is skipped, since the ultravisor does not trust the content of
those pages and hence ignores it.

Signed-off-by: Ram Pai 
[resolved conflicts, and modified the commit log]
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  2 ++
 arch/powerpc/kvm/book3s_hv.c| 10 ++
 arch/powerpc/kvm/book3s_hv_uvmem.c  |  2 +-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index f0c5708..05ae789 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -23,6 +23,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 struct kvm *kvm, bool skip_page_out,
 bool purge_gfn);
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6cf80e5..bf7324d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4531,10 +4531,12 @@ static void kvmppc_core_commit_memory_region_hv(struct 
kvm *kvm,
case KVM_MR_CREATE:
if (kvmppc_uvmem_slot_init(kvm, new))
return;
-   uv_register_mem_slot(kvm->arch.lpid,
-new->base_gfn << PAGE_SHIFT,
-new->npages * PAGE_SIZE,
-0, new->id);
+   if (uv_register_mem_slot(kvm->arch.lpid,
+new->base_gfn << PAGE_SHIFT,
+new->npages * PAGE_SIZE,
+0, new->id))
+   return;
+   kvmppc_uv_migrate_mem_slot(kvm, new);
break;
case KVM_MR_DELETE:
uv_unregister_mem_slot(kvm->arch.lpid, old->id);
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 78f8580..4d8f5bc 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -451,7 +451,7 @@ static int kvmppc_svm_migrate_page(struct vm_area_struct 
*vma,
return ret;
 }
 
-static int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
const struct kvm_memory_slot *memslot)
 {
unsigned long gfn = memslot->base_gfn;
-- 
1.8.3.1



[PATCH v2 3/4] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

2020-06-18 Thread Ram Pai
H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly
called H_SVM_PAGE_IN for all secure pages. These GFNs continue to be
normal GFNs associated with normal PFNs; when in fact, these GFNs should
have been secure GFNs, associated with device PFNs.

Move all the PFNs associated with the SVM's GFNs to secure-PFNs, in
H_SVM_INIT_DONE. Skip the GFNs that are already Paged-in or Shared
through H_SVM_PAGE_IN, or Paged-in followed by a Paged-out through
UV_PAGE_OUT.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst |   2 +
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 235 +--
 2 files changed, 171 insertions(+), 66 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index 363736d..3bc8957 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -933,6 +933,8 @@ Return values
* H_UNSUPPORTED if called from the wrong context (e.g.
from an SVM or before an H_SVM_INIT_START
hypercall).
+   * H_STATE   if the hypervisor could not successfully
+transition the VM to Secure VM.
 
 Description
 ~~~
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 666d1bb..78f8580 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -339,6 +339,21 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
struct kvm *kvm,
return false;
 }
 
+/* return true, if the GFN is a shared-GFN, or a secure-GFN */
+bool kvmppc_gfn_has_transitioned(unsigned long gfn, struct kvm *kvm)
+{
+   struct kvmppc_uvmem_slot *p;
+
+   list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
+   if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
+   unsigned long index = gfn - p->base_pfn;
+
+   return (p->pfns[index] & KVMPPC_GFN_FLAG_MASK);
+   }
+   }
+   return false;
+}
+
 unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 {
struct kvm_memslots *slots;
@@ -377,14 +392,152 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
return ret;
 }
 
+static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm);
+
+/*
+ * Alloc a PFN from private device memory pool. If @pagein is true,
+ * copy page from normal memory to secure memory using UV_PAGE_IN uvcall.
+ */
+static int kvmppc_svm_migrate_page(struct vm_area_struct *vma,
+   unsigned long start,
+   unsigned long end, unsigned long gpa, struct kvm *kvm,
+   unsigned long page_shift,
+   bool pagein)
+{
+   unsigned long src_pfn, dst_pfn = 0;
+   struct migrate_vma mig;
+   struct page *dpage;
+   struct page *spage;
+   unsigned long pfn;
+   int ret = 0;
+
+   memset(&mig, 0, sizeof(mig));
+   mig.vma = vma;
+   mig.start = start;
+   mig.end = end;
+   mig.src = &src_pfn;
+   mig.dst = &dst_pfn;
+
+   ret = migrate_vma_setup(&mig);
+   if (ret)
+   return ret;
+
+   if (!(*mig.src & MIGRATE_PFN_MIGRATE)) {
+   ret = -1;
+   goto out_finalize;
+   }
+
+   dpage = kvmppc_uvmem_get_page(gpa, kvm);
+   if (!dpage) {
+   ret = -1;
+   goto out_finalize;
+   }
+
+   if (pagein) {
+   pfn = *mig.src >> MIGRATE_PFN_SHIFT;
+   spage = migrate_pfn_to_page(*mig.src);
+   if (spage) {
+   ret = uv_page_in(kvm->arch.lpid, pfn << page_shift,
+   gpa, 0, page_shift);
+   if (ret)
+   goto out_finalize;
+   }
+   }
+
+   *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+   migrate_vma_pages(&mig);
+out_finalize:
+   migrate_vma_finalize(&mig);
+   return ret;
+}
+
+static int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot)
+{
+   unsigned long gfn = memslot->base_gfn;
+   unsigned long end;
+   bool downgrade = false;
+   struct vm_area_struct *vma;
+   int i, ret = 0;
+   unsigned long start = gfn_to_hva(kvm, gfn);
+
+   if (kvm_is_error_hva(start))
+   return H_STATE;
+
+   end = start + (memslot->npages << PAGE_SHIFT);
+
+   down_write(&kvm->mm->mmap_sem);
+
+   mutex_lock(&kvm->arch.uvmem_lock);
+   vma = find_vma_intersection(kvm->

[PATCH v2 2/4] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs

2020-06-18 Thread Ram Pai
 ---------------------------------------------------------------------
 |         |  Start    |  H_SVM_   |H_SVM_    |H_SVM_     |UV_SVM_    |
 |         |  VM       |INIT_START |INIT_DONE |INIT_ABORT |TERMINATE  |
 ---------------------------------------------------------------------
 |         |           |           |          |           |           |
 | Normal  | Normal    | Transient |Error     |Error      |Normal     |
 |         |           |           |          |           |           |
 | Secure  |   Error   | Error     |Error     |Error      |Normal     |
 |         |           |           |          |           |           |
 |Transient|   N/A     | Error     |Secure    |Normal     |Normal     |
 ---------------------------------------------------------------------
 



Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |   6 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c  |   2 +-
 arch/powerpc/kvm/book3s_hv.c|   2 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 195 +---
 4 files changed, 180 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 5a9834e..f0c5708 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -21,7 +21,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn);
 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-struct kvm *kvm, bool skip_page_out);
+struct kvm *kvm, bool skip_page_out,
+bool purge_gfn);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
@@ -75,6 +76,7 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, 
unsigned long gfn)
 
 static inline void
 kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-   struct kvm *kvm, bool skip_page_out) { }
+   struct kvm *kvm, bool skip_page_out,
+   bool purge_gfn) { }
 #endif /* CONFIG_PPC_UV */
 #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 803940d..3448459 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1100,7 +1100,7 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
unsigned int shift;
 
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START)
-   kvmppc_uvmem_drop_pages(memslot, kvm, true);
+   kvmppc_uvmem_drop_pages(memslot, kvm, true, false);
 
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
return;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6717d24..6cf80e5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5482,7 +5482,7 @@ static int kvmhv_svm_off(struct kvm *kvm)
continue;
 
kvm_for_each_memslot(memslot, slots) {
-   kvmppc_uvmem_drop_pages(memslot, kvm, true);
+   kvmppc_uvmem_drop_pages(memslot, kvm, true, true);
uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
}
}
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 3599aaa..666d1bb 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -98,7 +98,127 @@
 static unsigned long *kvmppc_uvmem_bitmap;
 static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
 
-#define KVMPPC_UVMEM_PFN   (1UL << 63)
+/*
+ * States of a GFN
+ * ---
+ * The GFN can be in one of the following states.
+ *
+ * (a) Secure - The GFN is secure. The GFN is associated with
+ * a Secure VM, the contents of the GFN is not accessible
+ * to the Hypervisor.  This GFN can be backed by a secure-PFN,
+ * or can be backed by a normal-PFN with contents encrypted.
+ * The former is true when the GFN is paged-in into the
+ * ultravisor. The latter is true when the GFN is paged-out
+ * of the ultravisor.
+ *
+ * (b) Shared - The GFN is shared. The GFN is associated with a
+ * secure VM. The contents of the GFN is accessible to
+ * Hypervisor. This GFN is backed by a normal-PFN and its
+ * content is un-encrypted.
+ *
+ * (c) Normal - The GFN is normal. The GFN is associated with
+ * a normal VM. The contents of the GFN is accessible to
+ * the Hypervisor. Its content is never encrypted.
+ *
+ * States of a VM.
+ * ---
+ *
+ * Normal VM:  A VM whose contents are always accessible to
+ * the hypervisor

[PATCH v2 1/4] KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c

2020-06-18 Thread Ram Pai
Without this fix, git is confused. It generates wrong
function context for code changes in subsequent patches.
Weird, but true.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index ad950f89..3599aaa 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -369,8 +369,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * Alloc a PFN from private device memory pool and copy page from normal
  * memory to secure memory using UV_PAGE_IN uvcall.
  */
-static int
-kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
+static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, unsigned long gpa, struct kvm *kvm,
   unsigned long page_shift, bool *downgrade)
 {
@@ -437,8 +436,8 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * In the former case, uses dev_pagemap_ops.migrate_to_ram handler
  * to unmap the device page from QEMU's page tables.
  */
-static unsigned long
-kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
+static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
+   unsigned long page_shift)
 {
 
int ret = H_PARAMETER;
@@ -487,9 +486,9 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * H_PAGE_IN_SHARED flag makes the page shared which means that the same
  * memory in is visible from both UV and HV.
  */
-unsigned long
-kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
-unsigned long flags, unsigned long page_shift)
+unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
+   unsigned long flags,
+   unsigned long page_shift)
 {
bool downgrade = false;
unsigned long start, end;
@@ -546,10 +545,10 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * Provision a new page on HV side and copy over the contents
  * from secure memory using UV_PAGE_OUT uvcall.
  */
-static int
-kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
-   unsigned long end, unsigned long page_shift,
-   struct kvm *kvm, unsigned long gpa)
+static int kvmppc_svm_page_out(struct vm_area_struct *vma,
+   unsigned long start,
+   unsigned long end, unsigned long page_shift,
+   struct kvm *kvm, unsigned long gpa)
 {
unsigned long src_pfn, dst_pfn = 0;
struct migrate_vma mig;
-- 
1.8.3.1



[PATCH v2 0/4] Migrate non-migrated pages of a SVM.

2020-06-18 Thread Ram Pai
This patch series migrates the non-migrated pages of a SVM.
This is required when the UV calls H_SVM_INIT_DONE, and
when a memory-slot is hotplugged to a Secure VM.

Testing: Passed rigorous SVM reboot test using different
sized SVMs.

Changelog:
. fixed a bug observed by Bharata. Pages that
were paged-in and later paged-out must also be
skipped from migration during H_SVM_INIT_DONE.

Laurent Dufour (1):
  KVM: PPC: Book3S HV: migrate hot plugged memory

Ram Pai (3):
  KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c
  KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
  KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in
H_SVM_INIT_DONE

 Documentation/powerpc/ultravisor.rst|   2 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |   8 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c  |   2 +-
 arch/powerpc/kvm/book3s_hv.c|  12 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 449 ++--
 5 files changed, 368 insertions(+), 105 deletions(-)

-- 
1.8.3.1



Re: [PATCH v1 2/4] KVM: PPC: Book3S HV: track shared GFNs of secure VMs

2020-06-05 Thread Ram Pai
> >diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
> >b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> >index 803940d..3448459 100644
> >--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> >+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> >@@ -1100,7 +1100,7 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
> > unsigned int shift;
> > if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START)
> >-kvmppc_uvmem_drop_pages(memslot, kvm, true);
> >+kvmppc_uvmem_drop_pages(memslot, kvm, true, false);
> 
> Why purge_gfn is false here?
> That call function is called when dropping an hot plugged memslot.

This function does not know under what context it is called. Since
its job is to just flush the memslot, it cannot assume anything
about purging the pages in the memslot.

.snip..


RP


Re: [PATCH v1 3/4] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

2020-06-03 Thread Ram Pai
On Tue, Jun 02, 2020 at 03:36:39PM +0530, Bharata B Rao wrote:
> On Mon, Jun 01, 2020 at 12:05:35PM -0700, Ram Pai wrote:
> > On Mon, Jun 01, 2020 at 05:25:18PM +0530, Bharata B Rao wrote:
> > > On Sat, May 30, 2020 at 07:27:50PM -0700, Ram Pai wrote:
> > > > H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly
> > > > called H_SVM_PAGE_IN for all secure pages.
> > > 
> > > I don't think that is quite true. HV doesn't assume anything about
> > > secure pages by itself.
> > 
> > Yes. Currently, it does not assume anything about secure pages.  But I am
> > proposing that it should consider all pages (except the shared pages) as
> > secure pages, when H_SVM_INIT_DONE is called.
> 
> Ok, then may be also add the proposed changes to H_SVM_INIT_DONE
> documentation.

ok.

> 
> > 
> > In other words, HV should treat all pages; except shared pages, as
> > secure pages once H_SVM_INIT_DONE is called. And this includes pages
> > added subsequently through memory hotplug.
> 
> So after H_SVM_INIT_DONE, if HV touches a secure page for any
> reason and gets encrypted contents via page-out, HV drops the
> device pfn at that time. So what state we would be in that case? We
> have completed H_SVM_INIT_DONE, but still have a normal (but encrypted)
> page in HV?

Good point.

The corresponding GFN will continue to be a secure GFN. Just that its
backing PFN is not a device-PFN, but a memory-PFN. Also that backing
memory-PFN contains encrypted content.

I will clarify this in the patch; about secure-GFN state.

> 
> > 
> > Yes. the Ultravisor can explicitly request the HV to move the pages
> > individually.  But that will slow down the transition too significantly.
> > It takes above 20min to transition them, for a SVM of size 100G.
> > 
> > With this proposed enhancement, the switch completes in a few seconds.
> 
> I think, many pages during initial switch and most pages for hotplugged
> memory are zero pages, for which we don't anyway issue UV page-in calls.
> So the 20min saving you are observing is purely due to hcall overhead?

Apparently, that seems to be the case.

> 
> How about extending H_SVM_PAGE_IN interface or a new hcall to request
> multiple pages in one request?
> 
> Also, how about requesting for bigger page sizes (2M)? Ralph Campbell
> had patches that added THP support for migrate_vma_* calls.

Yes, that should give a further boost. I think the API does not stop us
from using that feature. It's the support on the Ultravisor side.
Hopefully we will have contributions to the ultravisor once it is
opensourced.

> 
> > 
> > > 
> > > > These GFNs continue to be
> > > > normal GFNs associated with normal PFNs; when infact, these GFNs should
> > > > have been secure GFNs associated with device PFNs.
> > > 
> > > Transition to secure state is driven by SVM/UV and HV just responds to
> > > hcalls by issuing appropriate uvcalls. SVM/UV is in the best position to
> > > determine the required pages that need to be moved into secure side.
> > > HV just responds to it and tracks such pages as device private pages.
> > > 
> > > If SVM/UV doesn't get in all the pages to secure side by the time
> > > of H_SVM_INIT_DONE, the remaining pages are just normal (shared or
> > > otherwise) pages as far as HV is concerned.  Why should HV assume that
> > > SVM/UV didn't ask for a few pages and hence push those pages during
> > > H_SVM_INIT_DONE?
> > 
> > By definition, SVM is a VM backed by secure pages.
> > Hence all pages(except shared) must turn secure when a VM switches to SVM.
> > 
> > UV is interested in only a certain pages for the VM, which it will
> > request explicitly through H_SVM_PAGE_IN.  All other pages, need not
> > be paged-in through UV_PAGE_IN.  They just need to be switched to
> > device-pages.
> > 
> > > 
> > > I think UV should drive the movement of pages into secure side both
> > > of boot-time SVM memory and hot-plugged memory. HV does memslot
> > > registration uvcall when new memory is plugged in, UV should explicitly
> > > get the required pages in at that time instead of expecting HV to drive
> > > the same.
> > > 
> > > > +static int uv_migrate_mem_slot(struct kvm *kvm,
> > > > +   const struct kvm_memory_slot *memslot)
> > > > +{
> > > > +   unsigned long gfn = memslot->base_gfn;
> > > > +   unsigned long end;
> > > > +   bool downgrade = false;
> > > > +   struct vm_area_struct *vma;
>

Re: [PATCH v1 4/4] KVM: PPC: Book3S HV: migrate hot plugged memory

2020-06-03 Thread Ram Pai
On Tue, Jun 02, 2020 at 10:31:32AM +0200, Laurent Dufour wrote:
> Le 31/05/2020 à 04:27, Ram Pai a écrit :
> >From: Laurent Dufour 
> >
> >When a memory slot is hot plugged to a SVM, GFNs associated with that
> >memory slot automatically default to secure GFN. Hence migrate the
> >PFNs associated with these GFNs to device-PFNs.
> >
> >uv_migrate_mem_slot() is called to achieve that. It will not call
> >UV_PAGE_IN since this request is ignored by the Ultravisor.
> >NOTE: Ultravisor does not trust any page content provided by
> >the Hypervisor, once the VM turns secure.
> >
> >Cc: Paul Mackerras 
> >Cc: Benjamin Herrenschmidt 
> >Cc: Michael Ellerman 
> >Cc: Bharata B Rao 
> >Cc: Aneesh Kumar K.V 
> >Cc: Sukadev Bhattiprolu 
> >Cc: Laurent Dufour 
> >Cc: Thiago Jung Bauermann 
> >Cc: David Gibson 
> >Cc: Claudio Carvalho 
> >Cc: kvm-...@vger.kernel.org
> >Cc: linuxppc-dev@lists.ozlabs.org
> >Signed-off-by: Ram Pai 
> > (fixed merge conflicts. Modified the commit message)
> >Signed-off-by: Laurent Dufour 
> >---
> >  arch/powerpc/include/asm/kvm_book3s_uvmem.h |  4 
> >  arch/powerpc/kvm/book3s_hv.c| 11 +++
> >  arch/powerpc/kvm/book3s_hv_uvmem.c  |  3 +--
> >  3 files changed, 12 insertions(+), 6 deletions(-)
> >
> >diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
> >b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> >index f0c5708..2ec2e5afb 100644
> >--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> >+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> >@@ -23,6 +23,7 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
> >  void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
> >  struct kvm *kvm, bool skip_page_out,
> >  bool purge_gfn);
> >+int uv_migrate_mem_slot(struct kvm *kvm, const struct kvm_memory_slot 
> >*memslot);
> >  #else
> >  static inline int kvmppc_uvmem_init(void)
> >  {
> >@@ -78,5 +79,8 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, 
> >unsigned long gfn)
> >  kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
> > struct kvm *kvm, bool skip_page_out,
> > bool purge_gfn) { }
> >+
> >+static int uv_migrate_mem_slot(struct kvm *kvm,
> >+const struct kvm_memory_slot *memslot);
> 
> That line was not part of the patch I sent to you!

Your patch is rebased on top of my patches. This prototype declaration
is for the #ifndef CONFIG_PPC_UV case.

> 
> 
> >  #endif /* CONFIG_PPC_UV */
> >  #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
> >diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> >index 4c62bfe..604d062 100644
> >--- a/arch/powerpc/kvm/book3s_hv.c
> >+++ b/arch/powerpc/kvm/book3s_hv.c
> >@@ -4516,13 +4516,16 @@ static void 
> >kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
> > case KVM_MR_CREATE:
> > if (kvmppc_uvmem_slot_init(kvm, new))
> > return;
> >-uv_register_mem_slot(kvm->arch.lpid,
> >- new->base_gfn << PAGE_SHIFT,
> >- new->npages * PAGE_SIZE,
> >- 0, new->id);
> >+if (uv_register_mem_slot(kvm->arch.lpid,
> >+ new->base_gfn << PAGE_SHIFT,
> >+ new->npages * PAGE_SIZE,
> >+ 0, new->id))
> >+return;
> >+uv_migrate_mem_slot(kvm, new);
> > break;
> > case KVM_MR_DELETE:
> > uv_unregister_mem_slot(kvm->arch.lpid, old->id);
> >+kvmppc_uvmem_drop_pages(old, kvm, true, true);
> 
> Again that line has been changed from the patch I sent to you. The
> last 'true' argument has nothing to do here.

Yes. I did add another parameter to kvmppc_uvmem_drop_pages() in my
patch series, so I had to adapt your patch to operate on top of mine.
> 
> Is that series really building?

yes. it built for me.

RP


Re: [PATCH v1 3/4] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

2020-06-01 Thread Ram Pai
On Mon, Jun 01, 2020 at 05:25:18PM +0530, Bharata B Rao wrote:
> On Sat, May 30, 2020 at 07:27:50PM -0700, Ram Pai wrote:
> > H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly
> > called H_SVM_PAGE_IN for all secure pages.
> 
> I don't think that is quite true. HV doesn't assume anything about
> secure pages by itself.

Yes. Currently, it does not assume anything about secure pages.  But I am
proposing that it should consider all pages (except the shared pages) as
secure pages, when H_SVM_INIT_DONE is called.

In other words, HV should treat all pages, except shared pages, as
secure pages once H_SVM_INIT_DONE is called. And this includes pages
added subsequently through memory hotplug.

Yes. the Ultravisor can explicitly request the HV to move the pages
individually.  But that will slow down the transition too significantly.
It takes above 20min to transition them, for a SVM of size 100G.

With this proposed enhancement, the switch completes in a few seconds.

> 
> > These GFNs continue to be
> > normal GFNs associated with normal PFNs; when infact, these GFNs should
> > have been secure GFNs associated with device PFNs.
> 
> Transition to secure state is driven by SVM/UV and HV just responds to
> hcalls by issuing appropriate uvcalls. SVM/UV is in the best position to
> determine the required pages that need to be moved into secure side.
> HV just responds to it and tracks such pages as device private pages.
> 
> If SVM/UV doesn't get in all the pages to secure side by the time
> of H_SVM_INIT_DONE, the remaining pages are just normal (shared or
> otherwise) pages as far as HV is concerned.  Why should HV assume that
> SVM/UV didn't ask for a few pages and hence push those pages during
> H_SVM_INIT_DONE?

By definition, SVM is a VM backed by secure pages.
Hence all pages (except shared) must turn secure when a VM switches to SVM.

UV is interested in only a certain pages for the VM, which it will
request explicitly through H_SVM_PAGE_IN.  All other pages, need not
be paged-in through UV_PAGE_IN.  They just need to be switched to
device-pages.

> 
> I think UV should drive the movement of pages into secure side both
> of boot-time SVM memory and hot-plugged memory. HV does memslot
> registration uvcall when new memory is plugged in, UV should explicitly
> get the required pages in at that time instead of expecting HV to drive
> the same.
> 
> > +static int uv_migrate_mem_slot(struct kvm *kvm,
> > +   const struct kvm_memory_slot *memslot)
> > +{
> > +   unsigned long gfn = memslot->base_gfn;
> > +   unsigned long end;
> > +   bool downgrade = false;
> > +   struct vm_area_struct *vma;
> > +   int i, ret = 0;
> > +   unsigned long start = gfn_to_hva(kvm, gfn);
> > +
> > +   if (kvm_is_error_hva(start))
> > +   return H_STATE;
> > +
> > +   end = start + (memslot->npages << PAGE_SHIFT);
> > +
> > +   down_write(&kvm->mm->mmap_sem);
> > +
> > +   mutex_lock(&kvm->arch.uvmem_lock);
> > +   vma = find_vma_intersection(kvm->mm, start, end);
> > +   if (!vma || vma->vm_start > start || vma->vm_end < end) {
> > +   ret = H_STATE;
> > +   goto out_unlock;
> > +   }
> > +
> > +   ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > + MADV_UNMERGEABLE, &vma->vm_flags);
> > +   downgrade_write(&kvm->mm->mmap_sem);
> > +   downgrade = true;
> > +   if (ret) {
> > +   ret = H_STATE;
> > +   goto out_unlock;
> > +   }
> > +
> > +   for (i = 0; i < memslot->npages; i++, ++gfn) {
> > +   /* skip paged-in pages and shared pages */
> > +   if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL) ||
> > +   kvmppc_gfn_is_uvmem_shared(gfn, kvm))
> > +   continue;
> > +
> > +   start = gfn_to_hva(kvm, gfn);
> > +   end = start + (1UL << PAGE_SHIFT);
> > +   ret = kvmppc_svm_migrate_page(vma, start, end,
> > +   (gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
> > +
> > +   if (ret)
> > +   goto out_unlock;
> > +   }
> 
> Is there a guarantee that the vma you got for the start address remains
> valid for all the addresses till end in a memslot? If not, you should
> re-get the vma for the current address in each iteration I suppose.


mm->mmap_sem is the semaphore that guards the vma, right?  If that
semaphore is held, can the vma change?
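
(For what it is worth, if holding mmap_sem turns out not to be enough, a
sketch of re-fetching the vma on every iteration, based on the hunk quoted
above, would look like this:)

	for (i = 0; i < memslot->npages; i++, ++gfn) {
		/* skip paged-in pages and shared pages */
		if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL) ||
		    kvmppc_gfn_is_uvmem_shared(gfn, kvm))
			continue;

		start = gfn_to_hva(kvm, gfn);
		end = start + (1UL << PAGE_SHIFT);

		/* re-look-up the vma covering this gfn's hva */
		vma = find_vma_intersection(kvm->mm, start, end);
		if (!vma || vma->vm_start > start || vma->vm_end < end) {
			ret = H_STATE;
			break;
		}

		ret = kvmppc_svm_migrate_page(vma, start, end,
				gfn << PAGE_SHIFT, kvm, PAGE_SHIFT, false);
		if (ret)
			break;
	}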


Thanks for your comments,
RP


[PATCH v1 4/4] KVM: PPC: Book3S HV: migrate hot plugged memory

2020-05-30 Thread Ram Pai
From: Laurent Dufour 

When a memory slot is hot plugged to an SVM, GFNs associated with that
memory slot automatically default to secure GFNs. Hence migrate the
PFNs associated with these GFNs to device-PFNs.

uv_migrate_mem_slot() is called to achieve that. It will not call
UV_PAGE_IN since this request is ignored by the Ultravisor.
NOTE: the Ultravisor does not trust any page content provided by
the Hypervisor once the VM turns secure.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
(fixed merge conflicts. Modified the commit message)
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  4 
 arch/powerpc/kvm/book3s_hv.c| 11 +++
 arch/powerpc/kvm/book3s_hv_uvmem.c  |  3 +--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index f0c5708..2ec2e5afb 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -23,6 +23,7 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 struct kvm *kvm, bool skip_page_out,
 bool purge_gfn);
+int uv_migrate_mem_slot(struct kvm *kvm, const struct kvm_memory_slot 
*memslot);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
@@ -78,5 +79,8 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, 
unsigned long gfn)
 kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
struct kvm *kvm, bool skip_page_out,
bool purge_gfn) { }
+
+static int uv_migrate_mem_slot(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot);
 #endif /* CONFIG_PPC_UV */
 #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4c62bfe..604d062 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4516,13 +4516,16 @@ static void kvmppc_core_commit_memory_region_hv(struct 
kvm *kvm,
case KVM_MR_CREATE:
if (kvmppc_uvmem_slot_init(kvm, new))
return;
-   uv_register_mem_slot(kvm->arch.lpid,
-new->base_gfn << PAGE_SHIFT,
-new->npages * PAGE_SIZE,
-0, new->id);
+   if (uv_register_mem_slot(kvm->arch.lpid,
+new->base_gfn << PAGE_SHIFT,
+new->npages * PAGE_SIZE,
+0, new->id))
+   return;
+   uv_migrate_mem_slot(kvm, new);
break;
case KVM_MR_DELETE:
uv_unregister_mem_slot(kvm->arch.lpid, old->id);
+   kvmppc_uvmem_drop_pages(old, kvm, true, true);
kvmppc_uvmem_slot_free(kvm, old);
break;
default:
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 36dda1d..1fa5f2a 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -377,8 +377,7 @@ static int kvmppc_svm_migrate_page(struct vm_area_struct 
*vma,
return ret;
 }
 
-static int uv_migrate_mem_slot(struct kvm *kvm,
-   const struct kvm_memory_slot *memslot)
+int uv_migrate_mem_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot)
 {
unsigned long gfn = memslot->base_gfn;
unsigned long end;
-- 
1.8.3.1



[PATCH v1 3/4] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

2020-05-30 Thread Ram Pai
H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly
called H_SVM_PAGE_IN for all secure pages. These GFNs continue to be
normal GFNs associated with normal PFNs, when in fact these GFNs should
have been secure GFNs associated with device PFNs.

Move all the PFNs associated with the SVM's GFNs to device PFNs in
H_SVM_INIT_DONE. Skip the GFNs that are already Paged-in or Shared
through H_SVM_PAGE_IN.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
---
 Documentation/powerpc/ultravisor.rst |   2 +
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 219 ---
 2 files changed, 154 insertions(+), 67 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index 363736d..3bc8957 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -933,6 +933,8 @@ Return values
* H_UNSUPPORTED if called from the wrong context (e.g.
from an SVM or before an H_SVM_INIT_START
hypercall).
+   * H_STATE   if the hypervisor could not successfully
+transition the VM to Secure VM.
 
 Description
 ~~~
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 2ef1e03..36dda1d 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -318,14 +318,149 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
return ret;
 }
 
+static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm);
+
+/*
+ * Alloc a PFN from private device memory pool. If @pagein is true,
+ * copy page from normal memory to secure memory using UV_PAGE_IN uvcall.
+ */
+static int kvmppc_svm_migrate_page(struct vm_area_struct *vma,
+   unsigned long start,
+   unsigned long end, unsigned long gpa, struct kvm *kvm,
+   unsigned long page_shift,
+   bool pagein)
+{
+   unsigned long src_pfn, dst_pfn = 0;
+   struct migrate_vma mig;
+   struct page *dpage;
+   struct page *spage;
+   unsigned long pfn;
+   int ret = 0;
+
+   memset(&mig, 0, sizeof(mig));
+   mig.vma = vma;
+   mig.start = start;
+   mig.end = end;
+   mig.src = &src_pfn;
+   mig.dst = &dst_pfn;
+
+   ret = migrate_vma_setup(&mig);
+   if (ret)
+   return ret;
+
+   if (!(*mig.src & MIGRATE_PFN_MIGRATE)) {
+   ret = -1;
+   goto out_finalize;
+   }
+
+   dpage = kvmppc_uvmem_get_page(gpa, kvm);
+   if (!dpage) {
+   ret = -1;
+   goto out_finalize;
+   }
+
+   if (pagein) {
+   pfn = *mig.src >> MIGRATE_PFN_SHIFT;
+   spage = migrate_pfn_to_page(*mig.src);
+   if (spage) {
+   ret = uv_page_in(kvm->arch.lpid, pfn << page_shift,
+   gpa, 0, page_shift);
+   if (ret)
+   goto out_finalize;
+   }
+   }
+
+   *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+   migrate_vma_pages(&mig);
+out_finalize:
+   migrate_vma_finalize(&mig);
+   return ret;
+}
+
+static int uv_migrate_mem_slot(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot)
+{
+   unsigned long gfn = memslot->base_gfn;
+   unsigned long end;
+   bool downgrade = false;
+   struct vm_area_struct *vma;
+   int i, ret = 0;
+   unsigned long start = gfn_to_hva(kvm, gfn);
+
+   if (kvm_is_error_hva(start))
+   return H_STATE;
+
+   end = start + (memslot->npages << PAGE_SHIFT);
+
+   down_write(&kvm->mm->mmap_sem);
+
+   mutex_lock(&kvm->arch.uvmem_lock);
+   vma = find_vma_intersection(kvm->mm, start, end);
+   if (!vma || vma->vm_start > start || vma->vm_end < end) {
+   ret = H_STATE;
+   goto out_unlock;
+   }
+
+   ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
+ MADV_UNMERGEABLE, &vma->vm_flags);
+   downgrade_write(&kvm->mm->mmap_sem);
+   downgrade = true;
+   if (ret) {
+   ret = H_STATE;
+   goto out_unlock;
+   }
+
+   for (i = 0; i < memslot->npages; i++, ++gfn) {
+   /* skip paged-in pages and shared pages */
+   if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL) ||
+   kvmppc_gfn_is_uvmem_shared(gfn, kvm))
+   continue;
+
+   start = gfn_to_hva(kvm, gfn);
+   end = start + (

[PATCH v1 2/4] KVM: PPC: Book3S HV: track shared GFNs of secure VMs

2020-05-30 Thread Ram Pai
During the life of an SVM, its GFNs can transition from secure to shared
state and vice-versa. Since the kernel does not track GFNs that are
shared, it is not possible to disambiguate a shared GFN from a GFN whose
PFN has not yet been migrated to a device-PFN.

The ability to identify a shared GFN is needed to skip migrating its PFN
to device PFN. This functionality is leveraged in a subsequent patch.

Add the ability to identify the state of a GFN.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |   6 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c  |   2 +-
 arch/powerpc/kvm/book3s_hv.c|   2 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 115 ++--
 4 files changed, 113 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 5a9834e..f0c5708 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -21,7 +21,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn);
 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-struct kvm *kvm, bool skip_page_out);
+struct kvm *kvm, bool skip_page_out,
+bool purge_gfn);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
@@ -75,6 +76,7 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, 
unsigned long gfn)
 
 static inline void
 kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-   struct kvm *kvm, bool skip_page_out) { }
+   struct kvm *kvm, bool skip_page_out,
+   bool purge_gfn) { }
 #endif /* CONFIG_PPC_UV */
 #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 803940d..3448459 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1100,7 +1100,7 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
unsigned int shift;
 
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START)
-   kvmppc_uvmem_drop_pages(memslot, kvm, true);
+   kvmppc_uvmem_drop_pages(memslot, kvm, true, false);
 
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
return;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 103d13e..4c62bfe 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5467,7 +5467,7 @@ static int kvmhv_svm_off(struct kvm *kvm)
continue;
 
kvm_for_each_memslot(memslot, slots) {
-   kvmppc_uvmem_drop_pages(memslot, kvm, true);
+   kvmppc_uvmem_drop_pages(memslot, kvm, true, true);
uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
}
}
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index ea4a1f1..2ef1e03 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -99,14 +99,56 @@
 static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
 
 #define KVMPPC_UVMEM_PFN   (1UL << 63)
+#define KVMPPC_UVMEM_SHARED(1UL << 62)
+#define KVMPPC_UVMEM_FLAG_MASK (KVMPPC_UVMEM_PFN | KVMPPC_UVMEM_SHARED)
+#define KVMPPC_UVMEM_PFN_MASK  (~KVMPPC_UVMEM_FLAG_MASK)
 
 struct kvmppc_uvmem_slot {
struct list_head list;
unsigned long nr_pfns;
unsigned long base_pfn;
+   /*
+* pfns array has an entry for each GFN of the memory slot.
+*
+* The GFN can be in one of the following states.
+*
+* (a) Secure - The GFN is secure. Only Ultravisor can access it.
+* (b) Shared - The GFN is shared. Both Hypervisor and Ultravisor
+*  can access it.
+* (c) Normal - The GFN is a normal.  Only Hypervisor can access it.
+*
+* Secure GFN is associated with a devicePFN. Its pfn[] has
+* KVMPPC_UVMEM_PFN flag set, and has the value of the device PFN
+* KVMPPC_UVMEM_SHARED flag unset, and has the value of the device PFN
+*
+* Shared GFN is associated with a memoryPFN. Its pfn[] has
+* KVMPPC_UVMEM_SHARED flag set. But its KVMPPC_UVMEM_PFN is not set,
+* and there is no PFN value stored.
+*
+* Normal GFN is not associated with memoryPFN. Its pfn[] ha

[PATCH v1 1/4] KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c

2020-05-30 Thread Ram Pai
Without this fix, GIT gets confused. It generates incorrect
function context for code changes.  Weird, but true.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 79b1202..ea4a1f1 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -368,8 +368,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * Alloc a PFN from private device memory pool and copy page from normal
  * memory to secure memory using UV_PAGE_IN uvcall.
  */
-static int
-kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
+static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, unsigned long gpa, struct kvm *kvm,
   unsigned long page_shift, bool *downgrade)
 {
@@ -436,8 +435,8 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * In the former case, uses dev_pagemap_ops.migrate_to_ram handler
  * to unmap the device page from QEMU's page tables.
  */
-static unsigned long
-kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
+static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
+   unsigned long page_shift)
 {
 
int ret = H_PARAMETER;
@@ -486,9 +485,9 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * H_PAGE_IN_SHARED flag makes the page shared which means that the same
  * memory in is visible from both UV and HV.
  */
-unsigned long
-kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
-unsigned long flags, unsigned long page_shift)
+unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
+   unsigned long flags,
+   unsigned long page_shift)
 {
bool downgrade = false;
unsigned long start, end;
@@ -545,10 +544,10 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  * Provision a new page on HV side and copy over the contents
  * from secure memory using UV_PAGE_OUT uvcall.
  */
-static int
-kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
-   unsigned long end, unsigned long page_shift,
-   struct kvm *kvm, unsigned long gpa)
+static int kvmppc_svm_page_out(struct vm_area_struct *vma,
+   unsigned long start,
+   unsigned long end, unsigned long page_shift,
+   struct kvm *kvm, unsigned long gpa)
 {
unsigned long src_pfn, dst_pfn = 0;
struct migrate_vma mig;
-- 
1.8.3.1



[PATCH v1 0/4] Migrate non-migrated pages of a SVM.

2020-05-30 Thread Ram Pai
This patch series migrates the non-migrated pages of an SVM.
This is required when the UV calls H_SVM_INIT_DONE, and
when a memory-slot is hotplugged to a Secure VM.

Laurent Dufour (1):
  KVM: PPC: Book3S HV: migrate hot plugged memory

Ram Pai (3):
  KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c
  KVM: PPC: Book3S HV: track shared GFNs of secure VMs
  KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in
H_SVM_INIT_DONE

 Documentation/powerpc/ultravisor.rst|   2 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  10 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c  |   2 +-
 arch/powerpc/kvm/book3s_hv.c|  13 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 348 +---
 5 files changed, 284 insertions(+), 91 deletions(-)

-- 
1.8.3.1



Re: [PATCH v2] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT

2020-05-21 Thread Ram Pai
On Wed, May 20, 2020 at 07:43:08PM +0200, Laurent Dufour wrote:
> The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_*
> Hcalls") added checks of secure bit of SRR1 to filter out the Hcall
> reserved to the Ultravisor.
> 
> However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the
> context of the VM calling UV_ESM. This allows the Hypervisor to return to
> the guest without going through the Ultravisor. Thus the Secure bit of SRR1
> is not set in that particular case.
> 
> In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be
> filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is
> not set in that case.
> 
> Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls")
> Signed-off-by: Laurent Dufour 


Reviewed-by: Ram Pai 

> ---
>  arch/powerpc/kvm/book3s_hv.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 93493f0cbfe8..6ad1a3b14300 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1099,9 +1099,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>   ret = kvmppc_h_svm_init_done(vcpu->kvm);
>   break;
>   case H_SVM_INIT_ABORT:
> - ret = H_UNSUPPORTED;
> - if (kvmppc_get_srr1(vcpu) & MSR_S)
> - ret = kvmppc_h_svm_init_abort(vcpu->kvm);
> + /*
> +  * Even if that call is made by the Ultravisor, the SSR1 value
> +  * is the guest context one, with the secure bit clear as it has
> +  * not yet been secured. So we can't check it here.
> +  */

Frankly speaking, the comment above, when read in isolation (i.e. without
the deleted code above), feels out of place.  The reasoning for the change
is anyway captured in the changelog.  So I think we should delete this
comment.

Also, the comment above assumes the Ultravisor will call H_SVM_INIT_ABORT
with the SRR1(S) bit not set, which may or may not be true.  Regardless of
who calls H_SVM_INIT_ABORT and how, we should just call
kvmppc_h_svm_init_abort() and let it deal with the complexities.
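
i.e. something as simple as (a sketch based on the hunk above):

	case H_SVM_INIT_ABORT:
		ret = kvmppc_h_svm_init_abort(vcpu->kvm);
		break;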


RP


[PATCH v3] powerpc/XIVE: SVM: share the event-queue page with the Hypervisor.

2020-04-26 Thread Ram Pai
XIVE interrupt controller uses an Event Queue (EQ) to enqueue event
notifications when an exception occurs. The EQ is a single memory page
provided by the O/S defining a circular buffer, one per server and
priority couple.

On baremetal, the EQ page is configured with an OPAL call. On pseries,
an extra hop is necessary and the guest OS uses the hcall
H_INT_SET_QUEUE_CONFIG to configure the XIVE interrupt controller.

The XIVE controller being Hypervisor privileged, it will not be allowed
to enqueue event notifications for a Secure VM unless the EQ pages are
shared by the Secure VM.

Hypervisor/Ultravisor still requires support for the TIMA and ESB page
fault handlers. Until this is complete, QEMU can use the emulated XIVE
device for Secure VMs, option "kernel_irqchip=off" on the QEMU pseries
machine.

Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman 
Cc: Thiago Jung Bauermann 
Cc: Michael Anderson 
Cc: Sukadev Bhattiprolu 
Cc: Alexey Kardashevskiy 
Cc: Paul Mackerras 
Cc: David Gibson 
Reviewed-by: Cedric Le Goater 
Reviewed-by: Greg Kurz 
Signed-off-by: Ram Pai 

v3: fix a minor semantic issue in the description.
and added reviewed-by from Cedric and Greg.
v2: better description of the patch from Cedric.
---
 arch/powerpc/sysdev/xive/spapr.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
index 55dc61c..608b52f 100644
--- a/arch/powerpc/sysdev/xive/spapr.c
+++ b/arch/powerpc/sysdev/xive/spapr.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "xive-internal.h"
 
@@ -501,6 +503,9 @@ static int xive_spapr_configure_queue(u32 target, struct 
xive_q *q, u8 prio,
rc = -EIO;
} else {
q->qpage = qpage;
+   if (is_secure_guest())
+   uv_share_page(PHYS_PFN(qpage_phys),
+   1 << xive_alloc_order(order));
}
 fail:
return rc;
@@ -534,6 +539,8 @@ static void xive_spapr_cleanup_queue(unsigned int cpu, 
struct xive_cpu *xc,
   hw_cpu, prio);
 
alloc_order = xive_alloc_order(xive_queue_shift);
+   if (is_secure_guest())
+   uv_unshare_page(PHYS_PFN(__pa(q->qpage)), 1 << alloc_order);
free_pages((unsigned long)q->qpage, alloc_order);
q->qpage = NULL;
 }
-- 
1.8.3.1



[PATCH v3] powerpc/XIVE: SVM: share the event-queue page with the Hypervisor.

2020-04-25 Thread Ram Pai
From 10ea2eaf492ca3f22f67a5a63a2b7865e45299ad Mon Sep 17 00:00:00 2001
From: Ram Pai 
Date: Mon, 24 Feb 2020 01:09:48 -0500
Subject: [PATCH v3] powerpc/XIVE: SVM: share the event-queue page with the
 Hypervisor.

XIVE interrupt controller uses an Event Queue (EQ) to enqueue event
notifications when an exception occurs. The EQ is a single memory page
provided by the O/S defining a circular buffer, one per server and
priority couple.

On baremetal, the EQ page is configured with an OPAL call. On pseries,
an extra hop is necessary and the guest OS uses the hcall
H_INT_SET_QUEUE_CONFIG to configure the XIVE interrupt controller.

The XIVE controller being Hypervisor privileged, it will not be allowed
to enqueue event notifications for a Secure VM unless the EQ pages are
shared by the Secure VM.

Hypervisor/Ultravisor still requires support for the TIMA and ESB page
fault handlers. Until this is complete, QEMU can use the emulated XIVE
device for Secure VMs, option "kernel_irqchip=off" on the QEMU pseries
machine.

Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman 
Cc: Thiago Jung Bauermann 
Cc: Michael Anderson 
Cc: Sukadev Bhattiprolu 
Cc: Alexey Kardashevskiy 
Cc: Paul Mackerras 
Cc: David Gibson 
Reviewed-by: Cedric Le Goater 
Reviewed-by: Greg Kurz 
Signed-off-by: Ram Pai 

v3: fix a minor semantic issue in the description.
and added reviewed-by from Cedric and Greg.
v2: better description of the patch from Cedric.
---
 arch/powerpc/sysdev/xive/spapr.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
index 55dc61c..608b52f 100644
--- a/arch/powerpc/sysdev/xive/spapr.c
+++ b/arch/powerpc/sysdev/xive/spapr.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "xive-internal.h"
 
@@ -501,6 +503,9 @@ static int xive_spapr_configure_queue(u32 target, struct 
xive_q *q, u8 prio,
rc = -EIO;
} else {
q->qpage = qpage;
+   if (is_secure_guest())
+   uv_share_page(PHYS_PFN(qpage_phys),
+   1 << xive_alloc_order(order));
}
 fail:
return rc;
@@ -534,6 +539,8 @@ static void xive_spapr_cleanup_queue(unsigned int cpu, 
struct xive_cpu *xc,
   hw_cpu, prio);
 
alloc_order = xive_alloc_order(xive_queue_shift);
+   if (is_secure_guest())
+   uv_unshare_page(PHYS_PFN(__pa(q->qpage)), 1 << alloc_order);
free_pages((unsigned long)q->qpage, alloc_order);
q->qpage = NULL;
 }
-- 
1.8.3.1



Re: [PATCH v2 01/22] powerpc/pkeys: Avoid using lockless page table walk

2020-04-02 Thread Ram Pai
On Thu, Mar 19, 2020 at 09:25:48AM +0530, Aneesh Kumar K.V wrote:
> Fetch pkey from vma instead of linux page table. Also document the fact that 
> in
> some cases the pkey returned in siginfo won't be the same as the one we took
> keyfault on. Even with linux page table walk, we can end up in a similar 
> scenario.

There is no way to correctly ensure that the key returned through
siginfo is actually the key that took the fault.  Either get it
from the page table or get it from the corresponding vma.

So we had to choose the lesser evil. Getting it from the page table was
faster and did not involve taking any locks.  Getting it from the vma
was slower, since it needed locks.  Also, I faintly recall there
is a scenario where the address that takes a key fault has no
corresponding VMA associated with it yet.

Hence the logic used was: if it is a key fault, then procure the key
quickly from the page table.  In the unlikely event that the fault is
something else, but still has a non-permissive key associated
with it, get the key from the vma.
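
(In rough pseudo-C; is_key_fault() and key_is_non_permissive are stand-ins
for the actual error-code tests, not real function names:)

	if (is_key_fault(error_code)) {
		/* fast path: lockless walk of the page table */
		pkey = get_mm_addr_key(mm, address);
	} else if (key_is_non_permissive) {
		/* unlikely: some other fault class, fall back to the vma */
		pkey = vma_pkey(vma);
	}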

A well-written application should avoid changing the key of an address
range without synchronizing the corresponding threads that operate in
that address range.  However, if the application fails to do so, then
it is vulnerable to undefined behavior. There is no way to prove that
the reported key is correct or incorrect, since there is no provable
order between the two events: the key-fault event and the key-change
event.

Hence I think the change proposed in this patch may not be necessary.
RP

> 
> Cc: Ram Pai 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/mmu.h|  9 ---
>  arch/powerpc/mm/book3s64/hash_utils.c | 24 
>  arch/powerpc/mm/fault.c   | 83 +++
>  3 files changed, 60 insertions(+), 56 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
> index 0699cfeeb8c9..cf2a08bfd5cd 100644
> --- a/arch/powerpc/include/asm/mmu.h
> +++ b/arch/powerpc/include/asm/mmu.h
> @@ -291,15 +291,6 @@ static inline bool early_radix_enabled(void)
>  }
>  #endif
>  
> -#ifdef CONFIG_PPC_MEM_KEYS
> -extern u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address);
> -#else
> -static inline u16 get_mm_addr_key(struct mm_struct *mm, unsigned long 
> address)
> -{
> - return 0;
> -}
> -#endif /* CONFIG_PPC_MEM_KEYS */
> -
>  #ifdef CONFIG_STRICT_KERNEL_RWX
>  static inline bool strict_kernel_rwx_enabled(void)
>  {
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
> b/arch/powerpc/mm/book3s64/hash_utils.c
> index 523d4d39d11e..8530ddbba56f 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1670,30 +1670,6 @@ void update_mmu_cache(struct vm_area_struct *vma, 
> unsigned long address,
>   hash_preload(vma->vm_mm, address, is_exec, trap);
>  }
>  
> -#ifdef CONFIG_PPC_MEM_KEYS
> -/*
> - * Return the protection key associated with the given address and the
> - * mm_struct.
> - */
> -u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address)
> -{
> - pte_t *ptep;
> - u16 pkey = 0;
> - unsigned long flags;
> -
> - if (!mm || !mm->pgd)
> - return 0;
> -
> - local_irq_save(flags);
> - ptep = find_linux_pte(mm->pgd, address, NULL, NULL);
> - if (ptep)
> - pkey = pte_to_pkey_bits(pte_val(READ_ONCE(*ptep)));
> - local_irq_restore(flags);
> -
> - return pkey;
> -}
> -#endif /* CONFIG_PPC_MEM_KEYS */
> -
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>  static inline void tm_flush_hash_page(int local)
>  {
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 8db0507619e2..ab99ffa7d946 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -118,9 +118,34 @@ static noinline int bad_area(struct pt_regs *regs, 
> unsigned long address)
>   return __bad_area(regs, address, SEGV_MAPERR);
>  }
>  
> -static int bad_key_fault_exception(struct pt_regs *regs, unsigned long 
> address,
> - int pkey)
> +#ifdef CONFIG_PPC_MEM_KEYS
> +static noinline int bad_access_pkey(struct pt_regs *regs, unsigned long 
> address,
> + struct vm_area_struct *vma)
>  {
> + struct mm_struct *mm = current->mm;
> + int pkey;
> +
> + /*
> +  * We don't try to fetch the pkey from page table because reading
> +  * page table without locking doesn't guarantee stable pte value.
> +  * Hence the pkey value that we return to userspace can be different
> +  * from the pkey that actually caused access error.
> +  *
> 

Re: [PATCH v2] powerpc/XIVE: SVM: share the event-queue page with the Hypervisor.

2020-03-31 Thread Ram Pai
On Tue, Mar 31, 2020 at 08:53:07PM -0300, Thiago Jung Bauermann wrote:
> 
> Hi Ram,
> 
> Ram Pai  writes:
> 
> > diff --git a/arch/powerpc/sysdev/xive/spapr.c 
> > b/arch/powerpc/sysdev/xive/spapr.c
> > index 55dc61c..608b52f 100644
> > --- a/arch/powerpc/sysdev/xive/spapr.c
> > +++ b/arch/powerpc/sysdev/xive/spapr.c
> > @@ -26,6 +26,8 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  
> >  #include "xive-internal.h"
> >  
> > @@ -501,6 +503,9 @@ static int xive_spapr_configure_queue(u32 target, 
> > struct xive_q *q, u8 prio,
> > rc = -EIO;
> > } else {
> > q->qpage = qpage;
> > +   if (is_secure_guest())
> > +   uv_share_page(PHYS_PFN(qpage_phys),
> > +   1 << xive_alloc_order(order));
> 
> If I understand this correctly, you're passing the number of bytes of
> the queue to uv_share_page(), but that ultracall expects the number of
> pages to be shared.


static inline u32 xive_alloc_order(u32 queue_shift)
{
return (queue_shift > PAGE_SHIFT) ? (queue_shift - PAGE_SHIFT) : 0;
}

xive_alloc_order(order) returns the allocation order in units of
PAGE_SIZE pages. Hence the value passed to uv_share_page() is the number
of pages, and not the number of bytes.
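
(For example, with 64K pages; a sketch of the arithmetic only:)

	/* PAGE_SHIFT = 16, xive_queue_shift = 16, i.e. one 64K EQ page */
	u32 order  = xive_alloc_order(xive_queue_shift);  /* 16 - 16 = 0 */
	u32 npages = 1 << order;                          /* 1 page      */

	uv_share_page(PHYS_PFN(qpage_phys), npages);   /* pages, not bytes */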

BTW: I did verify through testing that it was indeed passing 1 page to the
uv_share_page().  

RP



[PATCH v2] powerpc/XIVE: SVM: share the event-queue page with the Hypervisor.

2020-03-26 Thread Ram Pai
XIVE interrupt controller uses an Event Queue (EQ) to enqueue event
notifications when an exception occurs. The EQ is a single memory page
provided by the O/S defining a circular buffer, one per server and
priority couple.

On baremetal, the EQ page is configured with an OPAL call. On pseries,
an extra hop is necessary and the guest OS uses the hcall
H_INT_SET_QUEUE_CONFIG to configure the XIVE interrupt controller.

The XIVE controller being Hypervisor privileged, it will not be allowed
to enqueue event notifications for a Secure VM unless the EQ pages are
shared by the Secure VM.

Hypervisor/Ultravisor still requires support for the TIMA and ESB page
fault handlers. Until this is complete, QEMU can use the emulated XIVE
device for Secure VMs, option "kernel_irqchip=off" on the QEMU pseries
machine.

Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman 
Cc: Thiago Jung Bauermann 
Cc: Michael Anderson 
Cc: Sukadev Bhattiprolu 
Cc: Alexey Kardashevskiy 
Cc: Paul Mackerras 
Cc: Greg Kurz 
Cc: Cedric Le Goater 
Cc: David Gibson 
Signed-off-by: Ram Pai 

v2: better description of the patch from Cedric.
---
 arch/powerpc/sysdev/xive/spapr.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
index 55dc61c..608b52f 100644
--- a/arch/powerpc/sysdev/xive/spapr.c
+++ b/arch/powerpc/sysdev/xive/spapr.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "xive-internal.h"
 
@@ -501,6 +503,9 @@ static int xive_spapr_configure_queue(u32 target, struct 
xive_q *q, u8 prio,
rc = -EIO;
} else {
q->qpage = qpage;
+   if (is_secure_guest())
+   uv_share_page(PHYS_PFN(qpage_phys),
+   1 << xive_alloc_order(order));
}
 fail:
return rc;
@@ -534,6 +539,8 @@ static void xive_spapr_cleanup_queue(unsigned int cpu, 
struct xive_cpu *xc,
   hw_cpu, prio);
 
alloc_order = xive_alloc_order(xive_queue_shift);
+   if (is_secure_guest())
+   uv_unshare_page(PHYS_PFN(__pa(q->qpage)), 1 << alloc_order);
free_pages((unsigned long)q->qpage, alloc_order);
q->qpage = NULL;
 }
-- 
1.8.3.1



Re: [PATCH] powerpc/prom_init: Include the termination message in ibm,os-term RTAS call

2020-03-25 Thread Ram Pai
On Tue, Mar 24, 2020 at 05:12:11PM -0300, Fabiano Rosas wrote:
> QEMU can now print the ibm,os-term message[1], so let's include it in
> the RTAS call. E.g.:
> 
>   qemu-system-ppc64: OS terminated: Switch to secure mode failed.
> 
> 1- https://git.qemu.org/?p=qemu.git;a=commitdiff;h=a4c3791ae0
> 
> Signed-off-by: Fabiano Rosas 
> ---
>  arch/powerpc/kernel/prom_init.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index 577345382b23..d543fb6d29c5 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -1773,6 +1773,9 @@ static void __init prom_rtas_os_term(char *str)
>   if (token == 0)
>   prom_panic("Could not get token for ibm,os-term\n");
>   os_term_args.token = cpu_to_be32(token);
> + os_term_args.nargs = cpu_to_be32(1);
> + os_term_args.args[0] = cpu_to_be32(__pa(str));
> +

Reviewed-by: Ram Pai 

RP



Re: [PATCH 2/2] KVM: PPC: Book3S HV: H_SVM_INIT_START must call UV_RETURN

2020-03-20 Thread Ram Pai
On Fri, Mar 20, 2020 at 11:26:43AM +0100, Laurent Dufour wrote:
> When the call to UV_REGISTER_MEM_SLOT is failing, for instance because
> there is not enough free secured memory, the Hypervisor (HV) has to call
   secure memory,

> UV_RETURN to report the error to the Ultravisor (UV). Then the UV will call
> H_SVM_INIT_ABORT to abort the securing phase and go back to the calling VM.
> 
> If the kvm->arch.secure_guest is not set, in the return path rfid is called
> but there is no valid context to get back to the SVM since the Hcall has
> been routed by the Ultravisor.
> 
> Move the setting of kvm->arch.secure_guest earlier in
> kvmppc_h_svm_init_start() so in the return path, UV_RETURN will be called
> instead of rfid.
> 
> Cc: Bharata B Rao 
> Cc: Paul Mackerras 
> Cc: Benjamin Herrenschmidt 
> Cc: Michael Ellerman 
> Signed-off-by: Laurent Dufour 

Reviewed-by: Ram Pai 



Re: [PATCH 1/2] KVM: PPC: Book3S HV: check caller of H_SVM_* Hcalls

2020-03-20 Thread Ram Pai
On Fri, Mar 20, 2020 at 11:26:42AM +0100, Laurent Dufour wrote:
> The Hcall named H_SVM_* are reserved to the Ultravisor. However, nothing
> prevent a malicious VM or SVM to call them. This could lead to weird result
> and should be filtered out.
> 
> Checking the Secure bit of the calling MSR ensure that the call is coming
> from either the Ultravisor or a SVM. But any system call made from a SVM
> are going through the Ultravisor, and the Ultravisor should filter out
> these malicious call. This way, only the Ultravisor is able to make such a
> Hcall.
> 
> Cc: Bharata B Rao 
> Cc: Paul Mackerras 
> Cc: Benjamin Herrenschmidt 
> Cc: Michael Ellerman 
> Signed-off-by: Laurent Dufour 

Reviewed-by: Ram Pai 

> ---
>  arch/powerpc/kvm/book3s_hv.c | 32 +---
>  1 file changed, 21 insertions(+), 11 deletions(-)
> 



[RFC PATCH v1] powerpc/XIVE: SVM: share the event-queue page with the Hypervisor.

2020-03-15 Thread Ram Pai
XIVE interrupt controller maintains an Event-Queue (EQ) page. This page is
used to communicate events with the Hypervisor/Qemu. In a Secure VM,
unless a page is shared with the Hypervisor, the Hypervisor will
not be able to read/write to that page.

Explicitly share the EQ page with the Hypervisor, and unshare it
during cleanup.  This enables SVM to use XIVE.

(NOTE: If the Hypervisor/Ultravisor is unable to target interrupts
 directly to Secure VM, use "kernel_irqchip=off" on the qemu command
 line).

Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman 
Cc: Thiago Jung Bauermann 
Cc: Michael Anderson 
Cc: Sukadev Bhattiprolu 
Cc: Alexey Kardashevskiy 
Cc: Paul Mackerras 
Cc: Greg Kurz 
Cc: Cedric Le Goater 
Cc: David Gibson 
Signed-off-by: Ram Pai 
---
 arch/powerpc/sysdev/xive/spapr.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
index 55dc61c..608b52f 100644
--- a/arch/powerpc/sysdev/xive/spapr.c
+++ b/arch/powerpc/sysdev/xive/spapr.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "xive-internal.h"
 
@@ -501,6 +503,9 @@ static int xive_spapr_configure_queue(u32 target, struct 
xive_q *q, u8 prio,
rc = -EIO;
} else {
q->qpage = qpage;
+   if (is_secure_guest())
+   uv_share_page(PHYS_PFN(qpage_phys),
+   1 << xive_alloc_order(order));
}
 fail:
return rc;
@@ -534,6 +539,8 @@ static void xive_spapr_cleanup_queue(unsigned int cpu, 
struct xive_cpu *xc,
   hw_cpu, prio);
 
alloc_order = xive_alloc_order(xive_queue_shift);
+   if (is_secure_guest())
+   uv_unshare_page(PHYS_PFN(__pa(q->qpage)), 1 << alloc_order);
free_pages((unsigned long)q->qpage, alloc_order);
q->qpage = NULL;
 }
-- 
1.8.3.1



RE: [RFC PATCH v1] powerpc/prom_init: disable XIVE in Secure VM.

2020-03-05 Thread Ram Pai
On Thu, Mar 05, 2020 at 10:55:45AM +1100, David Gibson wrote:
> On Wed, Mar 04, 2020 at 04:56:09PM +0100, Cédric Le Goater wrote:
> > [ ... ]
> > 
> > > (1) applied the patch which shares the EQ-page with the hypervisor.
> > > (2) set "kernel_irqchip=off"
> > > (3) set "ic-mode=xive"
> > 
> > you don't have to set the interrupt mode. xive should be negotiated
> > by default.
> > 
> > > (4) set "svm=on" on the kernel command line.
> > > (5) no changes to the hypervisor and ultravisor.
> > > 
> > > And Boom it works!.   So you were right.
> > 
> > Excellent.
> >  
> > > I am sending out the patch for (1) above ASAP.
> > 
> > Next step, could you please try to do the same with the TIMA and ESB pfn ?
> > and use KVM.
> 
> I'm a bit confused by this.  Aren't the TIMA and ESB pages essentially
> IO pages, rather than memory pages from the guest's point of view?  I
> assume only memory pages are protected with PEF - I can't even really
> see what protecting an IO page would even mean.

It means that the hypervisor and qemu cannot access the addresses used
to access the I/O pages. They can only be accessed by the Ultravisor and
the SVM.

As it stands today, those pages are accessible from the hypervisor
and not from the SVM or the ultravisor.

To make it work, we need to enable access to those pages from the SVM
and from the ultravisor.  One thing I am not clear about is whether we
should block access to those pages from the hypervisor.  If yes, then
there is no good way to do that without hardware help.  If no, then those
GPA pages can be shared, so that hypervisor/ultravisor/qemu/SVM can all
access those pages.

RP



RE: [RFC PATCH v1] powerpc/prom_init: disable XIVE in Secure VM.

2020-03-04 Thread Ram Pai
On Wed, Mar 04, 2020 at 11:59:48AM +0100, Greg Kurz wrote:
> On Tue, 3 Mar 2020 10:56:45 -0800
> Ram Pai  wrote:
> 
> > On Tue, Mar 03, 2020 at 06:45:20PM +0100, Greg Kurz wrote:
snip.
> > > 
> > > This patch would allow at least to answer Cedric's question about
> > > kernel_irqchip=off, since this looks like the only thing needed
> > > to make it work.
> > 
> > hmm.. I am not sure. Are you saying
> > (a) patch the guest kernel to share the event queue page
> > (b) run the qemu with "kernel_irqchip=off"
> > (c) and the guest kernel with "svm=on"
> > 
> > and it should all work?
> > 
> 
> Yes.

Ok. 

(1) applied the patch which shares the EQ-page with the hypervisor.
(2) set "kernel_irqchip=off"
(3) set "ic-mode=xive"
(4) set "svm=on" on the kernel command line.
(5) no changes to the hypervisor and ultravisor.

And Boom it works!.   So you were right.


I am sending out the patch for (1) above ASAP.

Thanks,
RP



RE: [RFC PATCH v1] powerpc/prom_init: disable XIVE in Secure VM.

2020-03-04 Thread Ram Pai
On Wed, Mar 04, 2020 at 11:59:48AM +0100, Greg Kurz wrote:
> On Tue, 3 Mar 2020 10:56:45 -0800
> Ram Pai  wrote:
> 
> > On Tue, Mar 03, 2020 at 06:45:20PM +0100, Greg Kurz wrote:
> > > On Tue, 3 Mar 2020 09:02:05 -0800
> > > Ram Pai  wrote:
> > > 
> > > > On Tue, Mar 03, 2020 at 07:50:08AM +0100, Cédric Le Goater wrote:
> > > > > On 3/3/20 12:32 AM, David Gibson wrote:
> > > > > > On Fri, Feb 28, 2020 at 11:54:04PM -0800, Ram Pai wrote:
> > > > > >> XIVE is not correctly enabled for Secure VM in the KVM Hypervisor 
> > > > > >> yet.
> > > > > >>
> > > > > >> Hence Secure VM, must always default to XICS interrupt controller.
> > > > > >>
> > > > > >> If XIVE is requested through kernel command line option "xive=on",
> > > > > >> override and turn it off.
> > > > > >>
> > > > > >> If XIVE is the only supported platform interrupt controller; 
> > > > > >> specified
> > > > > >> through qemu option "ic-mode=xive", simply abort. Otherwise 
> > > > > >> default to
> > > > > >> XICS.
> > > > > > 
> > > > > > Uh... the discussion thread here seems to have gotten oddly off
> > > > > > track.  
> > > > > 
> > > > > There seem to be multiple issues. It is difficult to have a clear 
> > > > > status.
> > > > > 
> > > > > > So, to try to clean up some misunderstandings on both sides:
> > > > > > 
> > > > > >   1) The guest is the main thing that knows that it will be in 
> > > > > > secure
> > > > > >  mode, so it's reasonable for it to conditionally use XIVE based
> > > > > >  on that
> > > > > 
> > > > > FW support is required AFAIUI.
> > > > > >   2) The mechanism by which we do it here isn't quite right.  Here 
> > > > > > the
> > > > > >  guest is checking itself that the host only allows XIVE, but we
> > > > > >  can't do XIVE and is panic()ing.  Instead, in the SVM case we
> > > > > >  should force support->xive to false, and send that in the CAS
> > > > > >  request to the host.  We expect the host to just terminate
> > > > > >  us because of the mismatch, but this will interact better with
> > > > > >  host side options setting policy for panic states and the like.
> > > > > >  Essentially an SVM kernel should behave like an old kernel with
> > > > > >  no XIVE support at all, at least w.r.t. the CAS irq mode flags.
> > > > > 
> > > > > Yes. XIVE shouldn't be requested by the guest.
> > > > 
> > > > Ok.
> > > > 
> > > > > This is the last option 
> > > > > I proposed but I thought there was some negotiation with the 
> > > > > hypervisor
> > > > > which is not the case. 
> > > > > 
> > > > > >   3) Although there are means by which the hypervisor can kind of 
> > > > > > know
> > > > > >  a guest is in secure mode, there's not really an "svm=on" 
> > > > > > option
> > > > > >  on the host side.  For the most part secure mode is based on
> > > > > >  discussion directly between the guest and the ultravisor with
> > > > > >  almost no hypervisor intervention.
> > > > > 
> > > > > Is there a negotiation with the ultravisor ? 
> > > > 
> > > > The VM has no negotiation with the ultravisor w.r.t CAS.
> > > > 
> > > > > 
> > > > > >   4) I'm guessing the problem with XIVE in SVM mode is that XIVE 
> > > > > > needs
> > > > > >  to write to event queues in guest memory, which would have to 
> > > > > > be
> > > > > >  explicitly shared for secure mode.  That's true whether it's 
> > > > > > KVM
> > > > > >  or qemu accessing the guest memory, so kernel_irqchip=on/off is
> > > > > >  entirely irrelevant.
> > > > > 
> > > > > This problem should be already fixed.
> > > > > The XIVE event queue

RE: [RFC PATCH v1] powerpc/prom_init: disable XIVE in Secure VM.

2020-03-03 Thread Ram Pai
On Tue, Mar 03, 2020 at 08:08:51PM +0100, Cédric Le Goater wrote:
> >>>   4) I'm guessing the problem with XIVE in SVM mode is that XIVE needs
> >>>  to write to event queues in guest memory, which would have to be
> >>>  explicitly shared for secure mode.  That's true whether it's KVM
> >>>  or qemu accessing the guest memory, so kernel_irqchip=on/off is
> >>>  entirely irrelevant.
> >>
> >> This problem should be already fixed.
> >> The XIVE event queues are shared 
> > 
> > Yes i have a patch for the guest kernel that shares the event 
> > queue page with the hypervisor. This is done using the
> > UV_SHARE_PAGE ultracall. This patch is not sent out to any any mailing
> > lists yet. However the patch by itself does not solve the xive problem
> > for secure VM.
> 
> yes because you also need to share the XIVE TIMA and ESB pages mapped 
> in xive_native_esb_fault() and xive_native_tima_fault(). 

These pages belong to the xive memory slot, right? If that is the case,
they are implicitly shared. The Ultravisor will set them up to be
shared; the guest kernel should not need to do anything.

We still need some fixes in KVM and Ultravisor to correctly map the
hardware pages to GPA ranges of the xive memory slot. Work is in progress...



> 
> >> and the remaining problem with XIVE is the KVM page fault handler 
> >> populating the TIMA and ESB pages. Ultravisor doesn't seem to support
> >> this feature and this breaks interrupt management in the guest. 
> > 
> > Yes. This is the bigger issue that needs to be fixed. When the secure guest
> > accesses the page associated with the xive memslot, a page fault is
> > generated, which the ultravisor reflects to the hypervisor. Hypervisor
> > seems to be mapping Hardware-page to that GPA. Unforatunately it is not
> > informing the ultravisor of that map.  I am trying to understand the
> > root cause. But since I am not sure what more issues I might run into
> > after chasing down that issue, I figured its better to disable xive
> > support in SVM in the interim.
> 
> Is it possible to call uv_share_page() from the hypervisor ? 

No, not allowed. If it were allowed, the hypervisor could easily attack the SVM.

> 
> >  BTW: I figured, I dont need this intermin patch to disable xive for
> > secure VM.  Just doing "svm=on xive=off" on the kernel command line is
> > sufficient for now. *
> 
> Yes. 
> 
> >> But, kernel_irqchip=off should work out of the box. It seems it doesn't. 
> >> Something to investigate.
> > 
> > Dont know why. 
> 
> We need to understand why. 
> 
> You still need the patch to share the event queue page allocated by the 
> guest OS because QEMU will enqueue events. But you should not need anything
> else.

ok. that is assuring.

> 
> > Does this option, disable the chip from interrupting the
> > guest directly; instead mediates the interrupt through the hypervisor?
> 
> Yes. The KVM backend is unused, the XIVE interrupt controller is deactivated
> for the guest and QEMU notifies the vCPUs directly.  
> 
> The TIMA and ESB pages belong the QEMU process and the guest OS will do 
> some load and store operations onto them for interrupt management. Is that 
> OK from a UV perspective ?  

Yes. These GPA ranges are, by design, meant to be read/writable from
qemu/KVM and the SVM. It is just that the implementation, in its current
form, needs some fixing.

RP



RE: [RFC PATCH v1] powerpc/prom_init: disable XIVE in Secure VM.

2020-03-03 Thread Ram Pai
On Tue, Mar 03, 2020 at 06:45:20PM +0100, Greg Kurz wrote:
> On Tue, 3 Mar 2020 09:02:05 -0800
> Ram Pai  wrote:
> 
> > On Tue, Mar 03, 2020 at 07:50:08AM +0100, Cédric Le Goater wrote:
> > > On 3/3/20 12:32 AM, David Gibson wrote:
> > > > On Fri, Feb 28, 2020 at 11:54:04PM -0800, Ram Pai wrote:
> > > >> XIVE is not correctly enabled for Secure VM in the KVM Hypervisor yet.
> > > >>
> > > >> Hence Secure VM, must always default to XICS interrupt controller.
> > > >>
> > > >> If XIVE is requested through kernel command line option "xive=on",
> > > >> override and turn it off.
> > > >>
> > > >> If XIVE is the only supported platform interrupt controller; specified
> > > >> through qemu option "ic-mode=xive", simply abort. Otherwise default to
> > > >> XICS.
> > > > 
> > > > Uh... the discussion thread here seems to have gotten oddly off
> > > > track.  
> > > 
> > > There seem to be multiple issues. It is difficult to have a clear status.
> > > 
> > > > So, to try to clean up some misunderstandings on both sides:
> > > > 
> > > >   1) The guest is the main thing that knows that it will be in secure
> > > >  mode, so it's reasonable for it to conditionally use XIVE based
> > > >  on that
> > > 
> > > FW support is required AFAIUI.
> > > >   2) The mechanism by which we do it here isn't quite right.  Here the
> > > >  guest is checking itself that the host only allows XIVE, but we
> > > >  can't do XIVE and is panic()ing.  Instead, in the SVM case we
> > > >  should force support->xive to false, and send that in the CAS
> > > >  request to the host.  We expect the host to just terminate
> > > >  us because of the mismatch, but this will interact better with
> > > >  host side options setting policy for panic states and the like.
> > > >  Essentially an SVM kernel should behave like an old kernel with
> > > >  no XIVE support at all, at least w.r.t. the CAS irq mode flags.
> > > 
> > > Yes. XIVE shouldn't be requested by the guest.
> > 
> > Ok.
> > 
> > > This is the last option 
> > > I proposed but I thought there was some negotiation with the hypervisor
> > > which is not the case. 
> > > 
> > > >   3) Although there are means by which the hypervisor can kind of know
> > > >  a guest is in secure mode, there's not really an "svm=on" option
> > > >  on the host side.  For the most part secure mode is based on
> > > >  discussion directly between the guest and the ultravisor with
> > > >  almost no hypervisor intervention.
> > > 
> > > Is there a negotiation with the ultravisor ? 
> > 
> > The VM has no negotiation with the ultravisor w.r.t CAS.
> > 
> > > 
> > > >   4) I'm guessing the problem with XIVE in SVM mode is that XIVE needs
> > > >  to write to event queues in guest memory, which would have to be
> > > >  explicitly shared for secure mode.  That's true whether it's KVM
> > > >  or qemu accessing the guest memory, so kernel_irqchip=on/off is
> > > >  entirely irrelevant.
> > > 
> > > This problem should be already fixed.
> > > The XIVE event queues are shared 
> > 
> > Yes i have a patch for the guest kernel that shares the event 
> > queue page with the hypervisor. This is done using the
> > UV_SHARE_PAGE ultracall. This patch is not sent out to any any mailing
> > lists yet.
> 
> Why ?

At this point I am not sure if this is the only change I need in the
guest kernel.  I also need changes to KVM and to the ultravisor. It is a
bit premature to send the patch without having figured out everything
needed to get xive working on a Secure VM.

> 
> > However the patch by itself does not solve the xive problem
> > for secure VM.
> > 
> 
> This patch would allow at least to answer Cedric's question about
> kernel_irqchip=off, since this looks like the only thing needed
> to make it work.

hmm.. I am not sure. Are you saying
(a) patch the guest kernel to share the event queue page
(b) run the qemu with "kernel_irqchip=off"
(c) and the guest kernel with "svm=on"

and it should all work?

RP



RE: [RFC PATCH v1] powerpc/prom_init: disable XIVE in Secure VM.

2020-03-03 Thread Ram Pai
On Tue, Mar 03, 2020 at 07:50:08AM +0100, Cédric Le Goater wrote:
> On 3/3/20 12:32 AM, David Gibson wrote:
> > On Fri, Feb 28, 2020 at 11:54:04PM -0800, Ram Pai wrote:
> >> XIVE is not correctly enabled for Secure VM in the KVM Hypervisor yet.
> >>
> >> Hence Secure VM, must always default to XICS interrupt controller.
> >>
> >> If XIVE is requested through kernel command line option "xive=on",
> >> override and turn it off.
> >>
> >> If XIVE is the only supported platform interrupt controller; specified
> >> through qemu option "ic-mode=xive", simply abort. Otherwise default to
> >> XICS.
> > 
> > Uh... the discussion thread here seems to have gotten oddly off
> > track.  
> 
> There seem to be multiple issues. It is difficult to have a clear status.
> 
> > So, to try to clean up some misunderstandings on both sides:
> > 
> >   1) The guest is the main thing that knows that it will be in secure
> >  mode, so it's reasonable for it to conditionally use XIVE based
> >  on that
> 
> FW support is required AFAIUI.
> >   2) The mechanism by which we do it here isn't quite right.  Here the
> >  guest is checking itself that the host only allows XIVE, but we
> >  can't do XIVE and is panic()ing.  Instead, in the SVM case we
> >  should force support->xive to false, and send that in the CAS
> >  request to the host.  We expect the host to just terminate
> >  us because of the mismatch, but this will interact better with
> >  host side options setting policy for panic states and the like.
> >  Essentially an SVM kernel should behave like an old kernel with
> >  no XIVE support at all, at least w.r.t. the CAS irq mode flags.
> 
> Yes. XIVE shouldn't be requested by the guest.

Ok.

> This is the last option 
> I proposed but I thought there was some negotiation with the hypervisor
> which is not the case. 
> 
> >   3) Although there are means by which the hypervisor can kind of know
> >  a guest is in secure mode, there's not really an "svm=on" option
> >  on the host side.  For the most part secure mode is based on
> >  discussion directly between the guest and the ultravisor with
> >  almost no hypervisor intervention.
> 
> Is there a negotiation with the ultravisor ? 

The VM has no negotiation with the ultravisor w.r.t CAS.

> 
> >   4) I'm guessing the problem with XIVE in SVM mode is that XIVE needs
> >  to write to event queues in guest memory, which would have to be
> >  explicitly shared for secure mode.  That's true whether it's KVM
> >  or qemu accessing the guest memory, so kernel_irqchip=on/off is
> >  entirely irrelevant.
> 
> This problem should be already fixed.
> The XIVE event queues are shared 

Yes, I have a patch for the guest kernel that shares the event
queue page with the hypervisor. This is done using the
UV_SHARE_PAGE ultracall. This patch is not sent out to any mailing
lists yet. However, the patch by itself does not solve the xive problem
for a secure VM.

> and the remaining problem with XIVE is the KVM page fault handler 
> populating the TIMA and ESB pages. Ultravisor doesn't seem to support
> this feature and this breaks interrupt management in the guest. 

Yes. This is the bigger issue that needs to be fixed. When the secure guest
accesses the page associated with the xive memslot, a page fault is
generated, which the ultravisor reflects to the hypervisor. The Hypervisor
seems to be mapping a hardware page to that GPA. Unfortunately it is not
informing the ultravisor of that map.  I am trying to understand the
root cause. But since I am not sure what more issues I might run into
after chasing down that issue, I figured it is better to disable xive
support in SVM in the interim.

BTW: I figured I don't need this interim patch to disable xive for a
secure VM.  Just doing "svm=on xive=off" on the kernel command line is
sufficient for now.


> 
> But, kernel_irqchip=off should work out of the box. It seems it doesn't. 
> Something to investigate.

Don't know why.

Does this option disable the chip from interrupting the
guest directly, and instead mediate the interrupt through the hypervisor?

> 
> > 
> >   5) All the above said, having to use XICS is pretty crappy.  You
> >  should really get working on XIVE support for secure VMs.
> 
> Yes. 

and yes too.


Summary:  I am dropping this patch for now.

> 
> Thanks,
> 
> C.

-- 
Ram Pai



RE: [RFC PATCH v1] powerpc/prom_init: disable XIVE in Secure VM.

2020-02-29 Thread Ram Pai
On Sat, Feb 29, 2020 at 09:27:54AM +0100, Cédric Le Goater wrote:
> On 2/29/20 8:54 AM, Ram Pai wrote:
> > XIVE is not correctly enabled for Secure VM in the KVM Hypervisor yet.
> > 
> > Hence Secure VM, must always default to XICS interrupt controller.
> 
> have you tried XIVE emulation 'kernel-irqchip=off' ? 

Yes, and it hangs. I think that option continues to enable some variant
of XIVE in the VM.  There are some known deficiencies in the negotiation
between KVM and the ultravisor, resulting in a hang in the SVM.

> 
> > If XIVE is requested through kernel command line option "xive=on",
> > override and turn it off.
> 
> This is incorrect. It is negotiated through CAS depending on the FW
> capabilities and the KVM capabilities.

Yes, I understand: qemu/KVM have predetermined a set of capabilities that
they can offer to the VM.  The kernel within the VM has a list of
capabilities it needs to operate correctly.  So both negotiate and
determine something mutually amicable.

Here I am talking about the list of capabilities that the kernel
determines it needs to operate correctly.  "xive=on" is one of
those capabilities the VM administrator tells the kernel to enable.
Unfortunately, if the VM administrator blindly requests to enable it, the
kernel must override it if it knows that it will be switching the VM into
an SVM soon. There is no point negotiating a capability with Qemu through
CAS if it knows it cannot handle that capability.

> 
> > If XIVE is the only supported platform interrupt controller; specified
> > through qemu option "ic-mode=xive", simply abort. Otherwise default to
> > XICS.
> 
> 
> I don't think it is a good approach to downgrade the guest kernel 
> capabilities this way. 
> 
> PAPR has specified the CAS negotiation process for this purpose. It 
> comes in two parts under KVM. First the KVM hypervisor advertises or 
> not a capability to QEMU. The second is the CAS negotiation process 
> between QEMU and the guest OS.

Unfortunately, this is not viable.  At the time the hypervisor
advertises its capabilities to qemu, the hypervisor has no idea whether
that VM will switch into an SVM or not.  The decision to switch into an
SVM is taken by the kernel running in the VM. This happens much later,
after the hypervisor has already conveyed its capabilities to qemu, and
qemu has then instantiated the VM.

As a result, CAS in prom_init is the only place where this negotiation
can take place.

> 
> The SVM specifications might not be complete yet and if some features 
> are incompatible, I think we should modify the capabilities advertised 
> by the hypervisor : no XIVE in case of SVM. QEMU will automatically 
> use the fallback path and emulate the XIVE device, same as setting 
> 'kernel-irqchip=off'. 

As mentioned above, this would be an excellent approach if the
hypervisor were aware of the VM's intent to switch into an SVM.  But
neither the hypervisor nor qemu knows; only the kernel running within
the VM does.


Do you still think my approach is wrong?
RP



[RFC PATCH v1] powerpc/prom_init: disable XIVE in Secure VM.

2020-02-28 Thread Ram Pai
XIVE is not correctly enabled for Secure VM in the KVM Hypervisor yet.

Hence Secure VM, must always default to XICS interrupt controller.

If XIVE is requested through kernel command line option "xive=on",
override and turn it off.

If XIVE is the only supported platform interrupt controller; specified
through qemu option "ic-mode=xive", simply abort. Otherwise default to
XICS.

Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman 
Cc: Thiago Jung Bauermann 
Cc: Michael Anderson 
Cc: Sukadev Bhattiprolu 
Cc: Alexey Kardashevskiy 
Cc: Paul Mackerras 
Cc: Greg Kurz 
Cc: Cedric Le Goater 
Cc: David Gibson 
Signed-off-by: Ram Pai 
---
 arch/powerpc/kernel/prom_init.c | 43 -
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 5773453..dd96c82 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -805,6 +805,18 @@ static void __init early_cmdline_parse(void)
 #endif
}
 
+#ifdef CONFIG_PPC_SVM
+   opt = prom_strstr(prom_cmd_line, "svm=");
+   if (opt) {
+   bool val;
+
+   opt += sizeof("svm=") - 1;
+   if (!prom_strtobool(opt, &val))
+   prom_svm_enable = val;
+   prom_printf("svm =%d\n", prom_svm_enable);
+   }
+#endif /* CONFIG_PPC_SVM */
+
 #ifdef CONFIG_PPC_PSERIES
prom_radix_disable = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT);
opt = prom_strstr(prom_cmd_line, "disable_radix");
@@ -823,23 +835,22 @@ static void __init early_cmdline_parse(void)
if (prom_radix_disable)
prom_debug("Radix disabled from cmdline\n");
 
-   opt = prom_strstr(prom_cmd_line, "xive=off");
-   if (opt) {
+#ifdef CONFIG_PPC_SVM
+   if (prom_svm_enable) {
prom_xive_disable = true;
-   prom_debug("XIVE disabled from cmdline\n");
+   prom_debug("XIVE disabled in Secure VM\n");
}
-#endif /* CONFIG_PPC_PSERIES */
-
-#ifdef CONFIG_PPC_SVM
-   opt = prom_strstr(prom_cmd_line, "svm=");
-   if (opt) {
-   bool val;
+#endif /* CONFIG_PPC_SVM */
 
-   opt += sizeof("svm=") - 1;
-   if (!prom_strtobool(opt, &val))
-   prom_svm_enable = val;
+   if (!prom_xive_disable) {
+   opt = prom_strstr(prom_cmd_line, "xive=off");
+   if (opt) {
+   prom_xive_disable = true;
+   prom_debug("XIVE disabled from cmdline\n");
+   }
}
-#endif /* CONFIG_PPC_SVM */
+
+#endif /* CONFIG_PPC_PSERIES */
 }
 
 #ifdef CONFIG_PPC_PSERIES
@@ -1251,6 +1262,12 @@ static void __init prom_parse_xive_model(u8 val,
break;
case OV5_FEAT(OV5_XIVE_EXPLOIT): /* Only Exploitation mode */
prom_debug("XIVE - exploitation mode supported\n");
+
+#ifdef CONFIG_PPC_SVM
+   if (prom_svm_enable)
+   prom_panic("WARNING: xive unsupported in Secure VM\n");
+#endif /* CONFIG_PPC_SVM */
+
if (prom_xive_disable) {
/*
 * If we __have__ to do XIVE, we're better off ignoring
-- 
1.8.3.1



Re: [RESEND PATCH v2 3/3] powerpc/powernv: Parse device tree, population of SPR support

2020-01-12 Thread Ram Pai
On Mon, Jan 13, 2020 at 09:15:09AM +0530, Pratik Rajesh Sampat wrote:
> Parse the device tree for nodes self-save, self-restore and populate
> support for the preferred SPRs based what was advertised by the device
> tree.
> 
> Signed-off-by: Pratik Rajesh Sampat 
> ---
>  arch/powerpc/platforms/powernv/idle.c | 104 ++
>  1 file changed, 104 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index d67d4d0b169b..e910ff40b7e6 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -1429,6 +1429,107 @@ static void __init pnv_probe_idle_states(void)
>   supported_cpuidle_states |= pnv_idle_states[i].flags;
>  }
> 
> +/*
> + * Extracts and populates the self save or restore capabilities
> + * passed from the device tree node
> + */
> +static int extract_save_restore_state_dt(struct device_node *np, int type)
> +{
> + int nr_sprns = 0, i, bitmask_index;
> + int rc = 0;
> + u64 *temp_u64;
> + const char *state_prop;
> + u64 bit_pos;
> +
> + state_prop = of_get_property(np, "status", NULL);
> + if (!state_prop) {
> + pr_warn("opal: failed to find the active value for self 
> save/restore node");
> + return -EINVAL;
> + }
> + if (strncmp(state_prop, "disabled", 8) == 0) {
> + /*
> +  * if the feature is not active, strip the preferred_sprs from
> +  * that capability.
> +  */
> + if (type == SELF_RESTORE_TYPE) {
> + for (i = 0; i < nr_preferred_sprs; i++) {
> + preferred_sprs[i].supported_mode &=
> + ~SELF_RESTORE_STRICT;
> + }
> + } else {
> + for (i = 0; i < nr_preferred_sprs; i++) {
> + preferred_sprs[i].supported_mode &=
> + ~SELF_SAVE_STRICT;
> + }
> + }
> + return 0;
> + }
> + nr_sprns = of_property_count_u64_elems(np, "sprn-bitmask");
> + if (nr_sprns <= 0)
> + return rc;
> + temp_u64 = kcalloc(nr_sprns, sizeof(u64), GFP_KERNEL);
> + if (of_property_read_u64_array(np, "sprn-bitmask",
> +temp_u64, nr_sprns)) {
> + pr_warn("cpuidle-powernv: failed to find registers in DT\n");
> + kfree(temp_u64);
> + return -EINVAL;
> + }
> + /*
> +  * Populate acknowledgment of support for the sprs in the global vector
> +  * gotten by the registers supplied by the firmware.
> +  * The registers are in a bitmask, bit index within
> +  * that specifies the SPR
> +  */
> + for (i = 0; i < nr_preferred_sprs; i++) {
> + bitmask_index = preferred_sprs[i].spr / 64;
> + bit_pos = preferred_sprs[i].spr % 64;
> + if ((temp_u64[bitmask_index] & (1UL << bit_pos)) == 0) {
> + if (type == SELF_RESTORE_TYPE)
> + preferred_sprs[i].supported_mode &=
> + ~SELF_RESTORE_STRICT;
> + else
> + preferred_sprs[i].supported_mode &=
> + ~SELF_SAVE_STRICT;
> + continue;
> + }
> + if (type == SELF_RESTORE_TYPE) {
> + preferred_sprs[i].supported_mode |=
> + SELF_RESTORE_STRICT;
> + } else {
> + preferred_sprs[i].supported_mode |=
> + SELF_SAVE_STRICT;
> + }
> + }
> +
> + kfree(temp_u64);
> + return rc;
> +}
> +
> +static int pnv_parse_deepstate_dt(void)
> +{
> + struct device_node *np, *np1;
> + int rc = 0;
> +
> + /* Self restore register population */
> + np = of_find_node_by_path("/ibm,opal/power-mgt/self-restore");
> + if (!np) {
> +         pr_warn("opal: self restore Node not found");
> + } else {
> + rc = extract_save_restore_state_dt(np, SELF_RESTORE_TYPE);
> + if (rc != 0)
> + return rc;
> + }
> + /* Self save register population */
> + np1 = of_find_node_by_path("/ibm,opal/power-mgt/self-save");

'np' can be reused?  'np1' is not needed.
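
Something along these lines, perhaps (untested sketch of the self-save
leg, assuming the function simply returns 'rc' at the end):

	/* Self save register population */
	np = of_find_node_by_path("/ibm,opal/power-mgt/self-save");
	if (!np) {
		pr_warn("opal: self save Node not found");
	} else {
		rc = extract_save_restore_state_dt(np, SELF_SAVE_TYPE);
	}
	return rc;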


Otherwise looks good.

Reviewed-by: Ram Pai 

RP



Re: [RESEND PATCH v2 2/3] powerpc/powernv: Introduce Self save support

2020-01-12 Thread Ram Pai
On Mon, Jan 13, 2020 at 09:15:08AM +0530, Pratik Rajesh Sampat wrote:
> This commit introduces and leverages the Self save API which OPAL now
> supports.
> 
> Add the new Self Save OPAL API call in the list of OPAL calls.
> Implement the self saving of the SPRs based on the support populated
> while respecting it's preferences.
> 
> This implementation allows mixing of support for the SPRs, which
> means that a SPR can be self restored while another SPR be self saved if
> they support and prefer it to be so.
> 
> Signed-off-by: Pratik Rajesh Sampat 
> ---
>  arch/powerpc/include/asm/opal-api.h| 3 ++-
>  arch/powerpc/include/asm/opal.h| 1 +
>  arch/powerpc/platforms/powernv/idle.c  | 2 ++
>  arch/powerpc/platforms/powernv/opal-call.c | 1 +
>  4 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index c1f25a760eb1..1b6e1a68d431 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -214,7 +214,8 @@
>  #define OPAL_SECVAR_GET  176
>  #define OPAL_SECVAR_GET_NEXT 177
>  #define OPAL_SECVAR_ENQUEUE_UPDATE   178
> -#define OPAL_LAST178
> +#define OPAL_SLW_SELF_SAVE_REG   181
> +#define OPAL_LAST181
> 
>  #define QUIESCE_HOLD 1 /* Spin all calls at entry */
>  #define QUIESCE_REJECT   2 /* Fail all calls with 
> OPAL_BUSY */
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 9986ac34b8e2..389a85b63805 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -203,6 +203,7 @@ int64_t opal_handle_hmi(void);
>  int64_t opal_handle_hmi2(__be64 *out_flags);
>  int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
>  int64_t opal_unregister_dump_region(uint32_t id);
> +int64_t opal_slw_self_save_reg(uint64_t cpu_pir, uint64_t sprn);
>  int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
>  int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag);
>  int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t 
> pe_number);
> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index 2f328403b0dc..d67d4d0b169b 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -1172,6 +1172,8 @@ void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 
> lpcr_val)
>   if (!is_lpcr_self_save)
>   opal_slw_set_reg(pir, SPRN_LPCR,
>lpcr_val);
> + else
> +             opal_slw_self_save_reg(pir, SPRN_LPCR);

opal_slw_self_save_reg() was used in the prior patch too. How did it
compile, if the definition is in this patch?


Reviewed-by: Ram Pai 

RP



Re: [RESEND PATCH v2 1/3] powerpc/powernv: Interface to define support and preference for a SPR

2020-01-12 Thread Ram Pai
On Mon, Jan 13, 2020 at 09:15:07AM +0530, Pratik Rajesh Sampat wrote:
> Define a bitmask interface to determine support for the Self Restore,
> Self Save or both.
> 
> Also define an interface to determine the preference of that SPR to
> be strictly saved or restored or encapsulated with an order of preference.
> 
> The preference bitmask is shown as below:
> 
> |... | 2nd pref | 1st pref |
> 
> MSB LSB
> 
> The preference from higher to lower is from LSB to MSB with a shift of 8
> bits.
> Example:
> Prefer self save first, if not available then prefer self
> restore
> The preference mask for this scenario will be seen as below.
> ((SELF_RESTORE_STRICT << PREFERENCE_SHIFT) | SELF_SAVE_STRICT)
> -
> |... | Self restore | Self save |
> -
> MSB   LSB
> 
> Finally, declare a list of preferred SPRs which encapsulate the bitmaks
> for preferred and supported with defaults of both being set to support
> legacy firmware.
> 
> This commit also implements using the above interface and retains the
> legacy functionality of self restore.
> 
> Signed-off-by: Pratik Rajesh Sampat 
> ---
>  arch/powerpc/platforms/powernv/idle.c | 327 +-
>  1 file changed, 271 insertions(+), 56 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index 78599bca66c2..2f328403b0dc 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -32,9 +32,106 @@
>  #define P9_STOP_SPR_MSR 2000
>  #define P9_STOP_SPR_PSSCR  855
> 
> +/* Interface for the stop state supported and preference */
> +#define SELF_RESTORE_TYPE0
> +#define SELF_SAVE_TYPE   1
> +
> +#define NR_PREFERENCES2
> +#define PREFERENCE_SHIFT  4
> +#define PREFERENCE_MASK   0xf
> +
> +#define UNSUPPORTED 0x0
> +#define SELF_RESTORE_STRICT 0x1
> +#define SELF_SAVE_STRICT0x2
> +
> +/*
> + * Bitmask defining the kind of preferences available.
> + * Note : The higher to lower preference is from LSB to MSB, with a shift of
> + * 4 bits.
> + * 
> + * || 2nd pref | 1st pref |
> + * 
> + * MSB LSB
> + */
> +/* Prefer Restore if available, otherwise unsupported */
> +#define PREFER_SELF_RESTORE_ONLY SELF_RESTORE_STRICT
> +/* Prefer Save if available, otherwise unsupported */
> +#define PREFER_SELF_SAVE_ONLYSELF_SAVE_STRICT
> +/* Prefer Restore when available, otherwise prefer Save */
> +#define PREFER_RESTORE_SAVE  ((SELF_SAVE_STRICT << \
> +   PREFERENCE_SHIFT)\
> +   | SELF_RESTORE_STRICT)
> +/* Prefer Save when available, otherwise prefer Restore*/
> +#define PREFER_SAVE_RESTORE  ((SELF_RESTORE_STRICT <<\
> +   PREFERENCE_SHIFT)\
> +   | SELF_SAVE_STRICT)
>  static u32 supported_cpuidle_states;
>  struct pnv_idle_states_t *pnv_idle_states;
>  int nr_pnv_idle_states;
> +/* Caching the lpcr & ptcr support to use later */
> +static bool is_lpcr_self_save;
> +static bool is_ptcr_self_save;
> +
> +struct preferred_sprs {
> + u64 spr;
> + u32 preferred_mode;
> + u32 supported_mode;
> +};
> +
> +struct preferred_sprs preferred_sprs[] = {
> + {
> + .spr = SPRN_HSPRG0,
> + .preferred_mode = PREFER_RESTORE_SAVE,
> + .supported_mode = SELF_RESTORE_STRICT,
> + },
> + {
> + .spr = SPRN_LPCR,
> + .preferred_mode = PREFER_RESTORE_SAVE,
> + .supported_mode = SELF_RESTORE_STRICT,
> + },
> + {
> + .spr = SPRN_PTCR,
> + .preferred_mode = PREFER_SAVE_RESTORE,
> + .supported_mode = SELF_RESTORE_STRICT,
> + },

This confuses me.  It says SAVE takes precedence over RESTORE, and then
it says it is strictly 'RESTORE' only.

Maybe you should not initialize 'supported_mode'?  Or put a comment
here, saying this value will be overwritten during system
initialization?
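
For example (illustrative only):

	{
		.spr = SPRN_PTCR,
		.preferred_mode = PREFER_SAVE_RESTORE,
		/* default only; overwritten from the device tree at boot */
		.supported_mode = SELF_RESTORE_STRICT,
	},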


Otherwise the code looks correct.

Reviewed-by: Ram Pai 
RP



Re: [PATCH v4 2/2] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

2020-01-07 Thread Ram Pai
On Mon, Jan 06, 2020 at 06:02:37PM -0800, Sukadev Bhattiprolu wrote:
> Ram Pai [linux...@us.ibm.com] wrote:
> >
> > One small comment.. H_STATE is a better return code than H_UNSUPPORTED.
> > 
> 
> Here is the updated patch - we now return H_STATE if the abort call is
> made after the VM has gone secure.
> ---
> >From 73fe1fa5aff2829f2fae6a339169e56dc0bbae06 Mon Sep 17 00:00:00 2001
> From: Sukadev Bhattiprolu 
> Date: Fri, 27 Sep 2019 14:30:36 -0500
> Subject: [PATCH 2/2] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

This patch looks good.

Reviewed-by: Ram Pai 

Thanks,
RP


