Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU

2022-12-04 Thread Oliver Upton
Hi Mingwei,

On Mon, Dec 05, 2022 at 05:51:13AM +0000, Mingwei Zhang wrote:
> On Mon, Nov 14, 2022, Oliver Upton wrote:

[...]

> > As hyp stage-1 is protected by a spinlock there is no actual need for
> > RCU in that case. I'll post something later on today that addresses the
> > issue.
> > 
> 
> For each stage-2 page table walk, KVM will use
> kvm_mmu_topup_memory_cache() before taking the mmu lock. This ensures
> whoever holds the mmu lock won't sleep. The hyp walkers seem to
> miss this notion completely, which makes me puzzled. Using a spinlock
> only ensures functionality, but it seems quite inefficient if the one
> who holds the spinlock tries to allocate pages and sleeps...

You're probably confused by my mischaracterization in the above
paragraph. Hyp stage-1 walkers (outside of pKVM) are guarded with a
mutex and are perfectly able to sleep. The erroneous application of RCU
led to this path becoming non-sleepable, hence the bug.

pKVM's own hyp stage-1 walkers are guarded by a spinlock, but the memory
allocations come from its own allocator and there is no concept of a
scheduler at EL2.

> Why do we need an RCU lock here? Oh, is it for batching?

We definitely don't need RCU here, so the corrective measure was to
avoid RCU for exclusive table walks:

https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git/commit/?h=next&id=b7833bf202e3068abb77c642a0843f696e9c8d38
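
For reference, the fix is roughly of this shape: kvm_pgtable_walk_begin()
and kvm_pgtable_walk_end() take the walker and only enter an RCU
read-side critical section for shared walks (a sketch, not the verbatim
commit):

static inline void kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
{
	/*
	 * Exclusive walkers hold a lock that already serializes the walk
	 * (and may need to sleep, e.g. for hyp stage-1 table allocations),
	 * so only shared walks take the RCU read lock.
	 */
	if (walker->flags & KVM_PGTABLE_WALK_SHARED)
		rcu_read_lock();
}

static inline void kvm_pgtable_walk_end(struct kvm_pgtable_walker *walker)
{
	if (walker->flags & KVM_PGTABLE_WALK_SHARED)
		rcu_read_unlock();
}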

--
Thanks,
Oliver


Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU

2022-12-04 Thread Mingwei Zhang
On Mon, Nov 14, 2022, Oliver Upton wrote:
> Hi Marek,
> 
> On Mon, Nov 14, 2022 at 03:29:14PM +0100, Marek Szyprowski wrote:
> > This patch landed in today's linux-next (20221114) as commit 
> > c3119ae45dfb ("KVM: arm64: Protect stage-2 traversal with RCU"). 
> > Unfortunately it introduces the following warning:
> 
> Thanks for the bug report :) I had failed to test nVHE in the past few
> revisions of this series.
> 
> > --->8---
> > 
> > kvm [1]: IPA Size Limit: 40 bits
> > BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
> > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> > preempt_count: 0, expected: 0
> > RCU nest depth: 1, expected: 0
> > 2 locks held by swapper/0/1:
> >   #0: 8a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at: __create_hyp_mappings+0x80/0xc4
> >   #1: 8a927720 (rcu_read_lock){}-{1:2}, at: kvm_pgtable_walk+0x0/0x1f4
> > CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
> > Hardware name: Raspberry Pi 3 Model B (DT)
> > Call trace:
> >   dump_backtrace.part.0+0xe4/0xf0
> >   show_stack+0x18/0x40
> >   dump_stack_lvl+0x8c/0xb8
> >   dump_stack+0x18/0x34
> >   __might_resched+0x178/0x220
> >   __might_sleep+0x48/0xa0
> >   prepare_alloc_pages+0x178/0x1a0
> >   __alloc_pages+0x9c/0x109c
> >   alloc_page_interleave+0x1c/0xc4
> >   alloc_pages+0xec/0x160
> >   get_zeroed_page+0x1c/0x44
> >   kvm_hyp_zalloc_page+0x14/0x20
> >   hyp_map_walker+0xd4/0x134
> >   kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
> >   __kvm_pgtable_walk+0x1a4/0x220
> >   kvm_pgtable_walk+0x104/0x1f4
> >   kvm_pgtable_hyp_map+0x80/0xc4
> >   __create_hyp_mappings+0x9c/0xc4
> >   kvm_mmu_init+0x144/0x1cc
> >   kvm_arch_init+0xe4/0xef4
> >   kvm_init+0x3c/0x3d0
> >   arm_init+0x20/0x30
> >   do_one_initcall+0x74/0x400
> >   kernel_init_freeable+0x2e0/0x350
> >   kernel_init+0x24/0x130
> >   ret_from_fork+0x10/0x20
> > kvm [1]: Hyp mode initialized successfully
> > 
> > --->8
> > 
> > It looks like more changes in the KVM code are needed to use RCU for
> > that code.
> 
> Right, the specific issue is that while the stage-2 walkers preallocate
> any table memory they may need, the hyp walkers do not and allocate
> inline.
> 
> As hyp stage-1 is protected by a spinlock there is no actual need for
> RCU in that case. I'll post something later on today that addresses the
> issue.
> 

For each stage-2 page table walk, KVM will use
kvm_mmu_topup_memory_cache() before taking the mmu lock. This ensures
whoever holds the mmu lock won't sleep. The hyp walkers seem to
miss this notion completely, which makes me puzzled. Using a spinlock
only ensures functionality, but it seems quite inefficient if the one
who holds the spinlock tries to allocate pages and sleeps...
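
For reference, the stage-2 pattern described above looks roughly like
this (a sketch using the generic KVM memory-cache helpers; error paths
and the exact call sites are trimmed):

struct kvm_mmu_memory_cache cache = { .gfp_zero = __GFP_ZERO };
int ret;

/* Sleepable context: fill the cache before taking the mmu lock. */
ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
if (ret)
	return ret;

write_lock(&kvm->mmu_lock);
/*
 * The walker pulls table pages out of the pre-filled cache via
 * kvm_mmu_memory_cache_alloc(), so nothing below can sleep.
 */
ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, phys, prot, &cache);
write_unlock(&kvm->mmu_lock);

kvm_mmu_free_memory_cache(&cache);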

But that seems to be a separate problem for nVHE. Why do we need an RCU
lock here? Oh, is it for batching?


Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU

2022-11-15 Thread Oliver Upton
On Tue, Nov 15, 2022 at 10:47:37AM -0800, Ricardo Koller wrote:
> On Wed, Nov 09, 2022 at 11:55:31PM +0000, Oliver Upton wrote:
> > On Wed, Nov 09, 2022 at 09:53:45PM +0000, Sean Christopherson wrote:
> > > On Mon, Nov 07, 2022, Oliver Upton wrote:
> > > > Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> > > > release the RCU read lock when traversing the page tables. Defer the
> > > > freeing of table memory to an RCU callback. Indirect the calls into RCU
> > > > and provide stubs for hypervisor code, as RCU is not available in such a
> > > > context.
> > > > 
> > > > The RCU protection doesn't amount to much at the moment, as readers are
> > > > already protected by the read-write lock (all walkers that free table
> > > > memory take the write lock). Nonetheless, a subsequent change will
> > > > further relax the locking requirements around the stage-2 MMU, thereby
> > > > depending on RCU.
> > > 
> > > Two somewhat off-topic questions (because I'm curious):
> > 
> > Worth asking!
> > 
> > >  1. Are there plans to enable "fast" page faults on ARM?  E.g. to fixup access
> > > faults (handle_access_fault()) and/or write-protection faults without acquiring
> > > mmu_lock?
> > 
> > I don't have any plans personally.
> > 
> > OTOH, adding support for read-side access faults is trivial; I just
> > didn't give it much thought, as most large-scale implementations have
> > FEAT_HAFDBS (hardware access flag management).
> 
> WDYT of permission relaxation (write-protection faults) on the fast
> path?
> 
> The benefits won't be as good as in x86 due to the required TLBI, but
> may be worth it due to not dealing with the mmu lock and avoiding some
> of the context save/restore.  Note that unlike x86, in ARM the TLB entry
> related to a protection fault needs to be flushed.

Right, the only guarantee we have on arm64 is that the TLB will never
hold an entry that would produce an access fault.

I have no issues whatsoever with implementing a lock-free walker; we're
already most of the way there with the RCU implementation, modulo some
rules for atomic PTE updates. I don't believe lock acquisition is a
bounding issue for us quite yet, as break-before-make + lazy splitting
hurts.
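
To make the shape of such a fast path concrete, a hypothetical sketch
(the helper name is invented, KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W is the
stage-2 write-permission bit currently private to pgtable.c, and the
caller is assumed to hold the RCU read lock):

static bool stage2_try_relax_perms(struct kvm_s2_mmu *mmu, kvm_pte_t *ptep,
				   phys_addr_t ipa, int level)
{
	kvm_pte_t old = READ_ONCE(*ptep), new;

	if (!kvm_pte_valid(old))
		return false;

	new = old | KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
	if (cmpxchg(ptep, old, new) != old)
		return false;	/* lost the race; fall back to the slow path */

	/* Unlike x86, the stale read-only entry may still be in the TLB. */
	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, level);
	return true;
}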

--
Thanks,
Oliver


Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU

2022-11-15 Thread Ricardo Koller
On Wed, Nov 09, 2022 at 11:55:31PM +0000, Oliver Upton wrote:
> On Wed, Nov 09, 2022 at 09:53:45PM +0000, Sean Christopherson wrote:
> > On Mon, Nov 07, 2022, Oliver Upton wrote:
> > > Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> > > release the RCU read lock when traversing the page tables. Defer the
> > > freeing of table memory to an RCU callback. Indirect the calls into RCU
> > > and provide stubs for hypervisor code, as RCU is not available in such a
> > > context.
> > > 
> > > The RCU protection doesn't amount to much at the moment, as readers are
> > > already protected by the read-write lock (all walkers that free table
> > > memory take the write lock). Nonetheless, a subsequent change will
> > > further relax the locking requirements around the stage-2 MMU, thereby
> > > depending on RCU.
> > 
> > Two somewhat off-topic questions (because I'm curious):
> 
> Worth asking!
> 
> >  1. Are there plans to enable "fast" page faults on ARM?  E.g. to fixup access
> > faults (handle_access_fault()) and/or write-protection faults without acquiring
> > mmu_lock?
> 
> I don't have any plans personally.
> 
> OTOH, adding support for read-side access faults is trivial; I just
> didn't give it much thought, as most large-scale implementations have
> FEAT_HAFDBS (hardware access flag management).

WDYT of permission relaxation (write-protection faults) on the fast
path?

The benefits won't be as good as in x86 due to the required TLBI, but
may be worth it due to not dealing with the mmu lock and avoiding some
of the context save/restore.  Note that unlike x86, in ARM the TLB entry
related to a protection fault needs to be flushed.

> 
> >  2. If the answer to (1) is "yes!", what's the plan to protect the lockless walks
> > for the RCU-less hypervisor code?
> 
> If/when we are worried about fault serialization in the lowvisor I was
> thinking something along the lines of disabling interrupts and using
> IPIs as barriers before freeing removed table memory, crudely giving the
> same protection as RCU.
> 
> --
> Thanks,
> Oliver


Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU

2022-11-14 Thread Oliver Upton
Hi Marek,

On Mon, Nov 14, 2022 at 03:29:14PM +0100, Marek Szyprowski wrote:
> This patch landed in today's linux-next (20221114) as commit 
> c3119ae45dfb ("KVM: arm64: Protect stage-2 traversal with RCU"). 
> Unfortunately it introduces the following warning:

Thanks for the bug report :) I had failed to test nVHE in the past few
revisions of this series.

> --->8---
> 
> kvm [1]: IPA Size Limit: 40 bits
> BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 0, expected: 0
> RCU nest depth: 1, expected: 0
> 2 locks held by swapper/0/1:
>   #0: 8a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at: __create_hyp_mappings+0x80/0xc4
>   #1: 8a927720 (rcu_read_lock){}-{1:2}, at: kvm_pgtable_walk+0x0/0x1f4
> CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
> Hardware name: Raspberry Pi 3 Model B (DT)
> Call trace:
>   dump_backtrace.part.0+0xe4/0xf0
>   show_stack+0x18/0x40
>   dump_stack_lvl+0x8c/0xb8
>   dump_stack+0x18/0x34
>   __might_resched+0x178/0x220
>   __might_sleep+0x48/0xa0
>   prepare_alloc_pages+0x178/0x1a0
>   __alloc_pages+0x9c/0x109c
>   alloc_page_interleave+0x1c/0xc4
>   alloc_pages+0xec/0x160
>   get_zeroed_page+0x1c/0x44
>   kvm_hyp_zalloc_page+0x14/0x20
>   hyp_map_walker+0xd4/0x134
>   kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
>   __kvm_pgtable_walk+0x1a4/0x220
>   kvm_pgtable_walk+0x104/0x1f4
>   kvm_pgtable_hyp_map+0x80/0xc4
>   __create_hyp_mappings+0x9c/0xc4
>   kvm_mmu_init+0x144/0x1cc
>   kvm_arch_init+0xe4/0xef4
>   kvm_init+0x3c/0x3d0
>   arm_init+0x20/0x30
>   do_one_initcall+0x74/0x400
>   kernel_init_freeable+0x2e0/0x350
>   kernel_init+0x24/0x130
>   ret_from_fork+0x10/0x20
> kvm [1]: Hyp mode initialized successfully
> 
> --->8
> 
> It looks like more changes in the KVM code are needed to use RCU for
> that code.

Right, the specific issue is that while the stage-2 walkers preallocate
any table memory they may need, the hyp walkers do not and allocate
inline.

As hyp stage-1 is protected by a spinlock there is no actual need for
RCU in that case. I'll post something later on today that addresses the
issue.

--
Thanks,
Oliver


Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU

2022-11-14 Thread Marek Szyprowski
Hi Oliver,

On 07.11.2022 22:56, Oliver Upton wrote:
> Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> release the RCU read lock when traversing the page tables. Defer the
> freeing of table memory to an RCU callback. Indirect the calls into RCU
> and provide stubs for hypervisor code, as RCU is not available in such a
> context.
>
> The RCU protection doesn't amount to much at the moment, as readers are
> already protected by the read-write lock (all walkers that free table
> memory take the write lock). Nonetheless, a subsequent change will
> further relax the locking requirements around the stage-2 MMU, thereby
> depending on RCU.
>
> Signed-off-by: Oliver Upton 

This patch landed in today's linux-next (20221114) as commit 
c3119ae45dfb ("KVM: arm64: Protect stage-2 traversal with RCU"). 
Unfortunately it introduces the following warning:

--->8---

kvm [1]: IPA Size Limit: 40 bits
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by swapper/0/1:
  #0: 8a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at: __create_hyp_mappings+0x80/0xc4
  #1: 8a927720 (rcu_read_lock){}-{1:2}, at: kvm_pgtable_walk+0x0/0x1f4
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
Hardware name: Raspberry Pi 3 Model B (DT)
Call trace:
  dump_backtrace.part.0+0xe4/0xf0
  show_stack+0x18/0x40
  dump_stack_lvl+0x8c/0xb8
  dump_stack+0x18/0x34
  __might_resched+0x178/0x220
  __might_sleep+0x48/0xa0
  prepare_alloc_pages+0x178/0x1a0
  __alloc_pages+0x9c/0x109c
  alloc_page_interleave+0x1c/0xc4
  alloc_pages+0xec/0x160
  get_zeroed_page+0x1c/0x44
  kvm_hyp_zalloc_page+0x14/0x20
  hyp_map_walker+0xd4/0x134
  kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
  __kvm_pgtable_walk+0x1a4/0x220
  kvm_pgtable_walk+0x104/0x1f4
  kvm_pgtable_hyp_map+0x80/0xc4
  __create_hyp_mappings+0x9c/0xc4
  kvm_mmu_init+0x144/0x1cc
  kvm_arch_init+0xe4/0xef4
  kvm_init+0x3c/0x3d0
  arm_init+0x20/0x30
  do_one_initcall+0x74/0x400
  kernel_init_freeable+0x2e0/0x350
  kernel_init+0x24/0x130
  ret_from_fork+0x10/0x20
kvm [1]: Hyp mode initialized successfully

--->8

It looks like more changes in the KVM code are needed to use RCU for
that code.

> ---
>   arch/arm64/include/asm/kvm_pgtable.h | 49 
>   arch/arm64/kvm/hyp/pgtable.c | 10 +-
>   arch/arm64/kvm/mmu.c | 14 +++-
>   3 files changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index e70cf57b719e..7634b6964779 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
>   
>   typedef u64 kvm_pte_t;
>   
> +/*
> + * RCU cannot be used in a non-kernel context such as the hyp. As such, page
> + * table walkers used in hyp do not call into RCU and instead use other
> + * synchronization mechanisms (such as a spinlock).
> + */
> +#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
> +
>   typedef kvm_pte_t *kvm_pteref_t;
>   
>   static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> @@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>   return pteref;
>   }
>   
> +static inline void kvm_pgtable_walk_begin(void) {}
> +static inline void kvm_pgtable_walk_end(void) {}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> + return true;
> +}
> +
> +#else
> +
> +typedef kvm_pte_t __rcu *kvm_pteref_t;
> +
> +static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> +{
> + return rcu_dereference_check(pteref, !shared);
> +}
> +
> +static inline void kvm_pgtable_walk_begin(void)
> +{
> + rcu_read_lock();
> +}
> +
> +static inline void kvm_pgtable_walk_end(void)
> +{
> + rcu_read_unlock();
> +}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> + return rcu_read_lock_held();
> +}
> +
> +#endif
> +
>   #define KVM_PTE_VALID   BIT(0)
>   
>   #define KVM_PTE_ADDR_MASK   GENMASK(47, PAGE_SHIFT)
> @@ -202,11 +243,14 @@ struct kvm_pgtable {
>*  children.
>* @KVM_PGTABLE_WALK_TABLE_POST: Visit table entries after their
>*  children.
> + * @KVM_PGTABLE_WALK_SHARED: Indicates the page-tables may be shared
> + *   with other software walkers.
>*/
>   enum kvm_pgtable_walk_flags {
>   KVM_PGTABLE_WALK_LEAF   = BIT(0),
>   KVM_PGTABLE_WALK_TABLE_PRE  = BIT(1),
>   KVM_PGTABLE_WALK_TABLE_POST = BIT(2),
> + KVM_PGTABLE_WALK_SHARED = BIT(3),
>   };
>   
>   

Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU

2022-11-10 Thread Ben Gardon
On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton  wrote:
>
> Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> release the RCU read lock when traversing the page tables. Defer the
> freeing of table memory to an RCU callback. Indirect the calls into RCU
> and provide stubs for hypervisor code, as RCU is not available in such a
> context.
>
> The RCU protection doesn't amount to much at the moment, as readers are
> already protected by the read-write lock (all walkers that free table
> memory take the write lock). Nonetheless, a subsequent change will
> further relax the locking requirements around the stage-2 MMU, thereby
> depending on RCU.
>
> Signed-off-by: Oliver Upton 
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 49 
>  arch/arm64/kvm/hyp/pgtable.c | 10 +-
>  arch/arm64/kvm/mmu.c | 14 +++-
>  3 files changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index e70cf57b719e..7634b6964779 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
>
>  typedef u64 kvm_pte_t;
>
> +/*
> + * RCU cannot be used in a non-kernel context such as the hyp. As such, page
> + * table walkers used in hyp do not call into RCU and instead use other
> + * synchronization mechanisms (such as a spinlock).
> + */
> +#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
> +
>  typedef kvm_pte_t *kvm_pteref_t;
>
>  static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> @@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
> return pteref;
>  }
>
> +static inline void kvm_pgtable_walk_begin(void) {}
> +static inline void kvm_pgtable_walk_end(void) {}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> +   return true;

Forgive my ignorance, but does hyp not use an MMU lock at all? Seems
like this would be a good place to add a lockdep check.

> +}
> +
> +#else
> +
> +typedef kvm_pte_t __rcu *kvm_pteref_t;
> +
> +static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> +{
> +   return rcu_dereference_check(pteref, !shared);

Same here, could add a lockdep check depending on shared.
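
Hypothetically, assuming the helper had a kvm pointer in reach,
something like:

	return rcu_dereference_check(pteref,
				     !shared || lockdep_is_held(&kvm->mmu_lock));

would let lockdep vet the exclusive case as well.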

> +}
> +
> +static inline void kvm_pgtable_walk_begin(void)
> +{
> +   rcu_read_lock();
> +}
> +
> +static inline void kvm_pgtable_walk_end(void)
> +{
> +   rcu_read_unlock();
> +}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> +   return rcu_read_lock_held();

Likewise could do some lockdep here.

> +}
> +
> +#endif
> +
>  #define KVM_PTE_VALID  BIT(0)
>
>  #define KVM_PTE_ADDR_MASK  GENMASK(47, PAGE_SHIFT)
> @@ -202,11 +243,14 @@ struct kvm_pgtable {
>   * children.
>   * @KVM_PGTABLE_WALK_TABLE_POST:   Visit table entries after their
>   * children.
> + * @KVM_PGTABLE_WALK_SHARED:   Indicates the page-tables may be shared
> + * with other software walkers.
>   */
>  enum kvm_pgtable_walk_flags {
> KVM_PGTABLE_WALK_LEAF   = BIT(0),
> KVM_PGTABLE_WALK_TABLE_PRE  = BIT(1),
> KVM_PGTABLE_WALK_TABLE_POST = BIT(2),
> +   KVM_PGTABLE_WALK_SHARED = BIT(3),

Not sure if necessary, but it might pay to have 3 shared options:
exclusive, shared with the mmu lock held, and no mmu lock at all, if we
ever want lockless fast page faults.
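
A hypothetical shape for that three-way split, with exclusive walks
being the absence of any shared flag:

enum kvm_pgtable_walk_flags {
	KVM_PGTABLE_WALK_LEAF		= BIT(0),
	KVM_PGTABLE_WALK_TABLE_PRE	= BIT(1),
	KVM_PGTABLE_WALK_TABLE_POST	= BIT(2),
	KVM_PGTABLE_WALK_SHARED		= BIT(3),	/* shared, mmu_lock held for read */
	KVM_PGTABLE_WALK_LOCKLESS	= BIT(4),	/* shared, no mmu_lock at all */
};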


>  };
>
>  struct kvm_pgtable_visit_ctx {
> @@ -223,6 +267,11 @@ struct kvm_pgtable_visit_ctx {
>  typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
> enum kvm_pgtable_walk_flags visit);
>
> +static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *ctx)
> +{
> +   return ctx->flags & KVM_PGTABLE_WALK_SHARED;
> +}
> +
>  /**
>   * struct kvm_pgtable_walker - Hook into a page-table walk.
>   * @cb:Callback function to invoke during the walk.
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 7c9782347570..d8d963521d4e 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -171,6 +171,9 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>   enum kvm_pgtable_walk_flags visit)
>  {
> struct kvm_pgtable_walker *walker = data->walker;
> +
> +   /* Ensure the appropriate lock is held (e.g. RCU lock for stage-2 MMU) */
> +   WARN_ON_ONCE(kvm_pgtable_walk_shared(ctx) && !kvm_pgtable_walk_lock_held());
> return walker->cb(ctx, visit);
>  }
>
> @@ -281,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 

Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU

2022-11-10 Thread Marc Zyngier
On Wed, 09 Nov 2022 22:25:38 +0000,
Ben Gardon  wrote:
> 
> On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton  wrote:
> >
> > Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> > release the RCU read lock when traversing the page tables. Defer the
> > freeing of table memory to an RCU callback. Indirect the calls into RCU
> > and provide stubs for hypervisor code, as RCU is not available in such a
> > context.
> >
> > The RCU protection doesn't amount to much at the moment, as readers are
> > already protected by the read-write lock (all walkers that free table
> > memory take the write lock). Nonetheless, a subsequent change will
> > further relax the locking requirements around the stage-2 MMU, thereby
> > depending on RCU.
> >
> > Signed-off-by: Oliver Upton 
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h | 49 
> >  arch/arm64/kvm/hyp/pgtable.c | 10 +-
> >  arch/arm64/kvm/mmu.c | 14 +++-
> >  3 files changed, 71 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index e70cf57b719e..7634b6964779 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
> >
> >  typedef u64 kvm_pte_t;
> >
> > +/*
> > + * RCU cannot be used in a non-kernel context such as the hyp. As such, page
> > + * table walkers used in hyp do not call into RCU and instead use other
> > + * synchronization mechanisms (such as a spinlock).
> > + */
> > +#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
> > +
> >  typedef kvm_pte_t *kvm_pteref_t;
> >
> >  static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> > @@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
> > return pteref;
> >  }
> >
> > +static inline void kvm_pgtable_walk_begin(void) {}
> > +static inline void kvm_pgtable_walk_end(void) {}
> > +
> > +static inline bool kvm_pgtable_walk_lock_held(void)
> > +{
> > +   return true;
> 
> Forgive my ignorance, but does hyp not use an MMU lock at all? Seems
> like this would be a good place to add a lockdep check.

For normal KVM, we don't mess with the page tables in the HYP code *at
all*. That's just not the place. It is for pKVM that this is a bit
different, as EL2 is where the stuff happens.

Lockdep at EL2 is wishful thinking. However, we have the next best
thing, which is an assertion such as:

hyp_assert_lock_held(&host_kvm.lock);

though at the moment, this is a *global* lock that serialises
everyone, as a guest stage-2 operation usually affects the host
stage-2 as well (ownership change and such). Quentin should be able to
provide more details on that.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU

2022-11-09 Thread Oliver Upton
On Wed, Nov 09, 2022 at 09:53:45PM +0000, Sean Christopherson wrote:
> On Mon, Nov 07, 2022, Oliver Upton wrote:
> > Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> > release the RCU read lock when traversing the page tables. Defer the
> > freeing of table memory to an RCU callback. Indirect the calls into RCU
> > and provide stubs for hypervisor code, as RCU is not available in such a
> > context.
> > 
> > The RCU protection doesn't amount to much at the moment, as readers are
> > already protected by the read-write lock (all walkers that free table
> > memory take the write lock). Nonetheless, a subsequent change will
> > further relax the locking requirements around the stage-2 MMU, thereby
> > depending on RCU.
> 
> Two somewhat off-topic questions (because I'm curious):

Worth asking!

>  1. Are there plans to enable "fast" page faults on ARM?  E.g. to fixup access
> faults (handle_access_fault()) and/or write-protection faults without acquiring
> mmu_lock?

I don't have any plans personally.

OTOH, adding support for read-side access faults is trivial; I just
didn't give it much thought, as most large-scale implementations have
FEAT_HAFDBS (hardware access flag management).
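
A hypothetical sketch of that read-side fixup (invented helper;
KVM_PTE_LEAF_ATTR_LO_S2_AF is the stage-2 Access Flag bit currently
private to pgtable.c):

/*
 * Runs under rcu_read_lock(). No TLBI is needed: the architecture
 * guarantees the TLB never caches an entry that would generate an
 * access fault.
 */
static void stage2_fixup_access_fault(kvm_pte_t *ptep)
{
	kvm_pte_t pte = READ_ONCE(*ptep);

	if (kvm_pte_valid(pte))
		cmpxchg(ptep, pte, pte | KVM_PTE_LEAF_ATTR_LO_S2_AF);
}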

>  2. If the answer to (1) is "yes!", what's the plan to protect the lockless walks
> for the RCU-less hypervisor code?

If/when we are worried about fault serialization in the lowvisor I was
thinking something along the lines of disabling interrupts and using
IPIs as barriers before freeing removed table memory, crudely giving the
same protection as RCU.
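
Conceptually, in host-side terms (EL2 would need its own equivalents,
and every helper name below is invented):

unsigned long flags;

/* Reader: walk the tables with interrupts masked. */
local_irq_save(flags);
walk_pgtable(pgt, addr);
local_irq_restore(flags);

/*
 * Freer: unlink the table, then IPI every CPU before freeing it. The
 * IPIs cannot complete while any walker still has interrupts masked,
 * so their completion acts as a grace period, much like RCU.
 */
detach_table(ptep);
kick_all_cpus_sync();
free_detached_table(table);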

--
Thanks,
Oliver


Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU

2022-11-09 Thread Sean Christopherson
On Mon, Nov 07, 2022, Oliver Upton wrote:
> Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> release the RCU read lock when traversing the page tables. Defer the
> freeing of table memory to an RCU callback. Indirect the calls into RCU
> and provide stubs for hypervisor code, as RCU is not available in such a
> context.
> 
> The RCU protection doesn't amount to much at the moment, as readers are
> already protected by the read-write lock (all walkers that free table
> memory take the write lock). Nonetheless, a subsequent change will
> further relax the locking requirements around the stage-2 MMU, thereby
> depending on RCU.

Two somewhat off-topic questions (because I'm curious):

 1. Are there plans to enable "fast" page faults on ARM?  E.g. to fixup access
faults (handle_access_fault()) and/or write-protection faults without acquiring
mmu_lock?

 2. If the answer to (1) is "yes!", what's the plan to protect the lockless walks
for the RCU-less hypervisor code?


[PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU

2022-11-07 Thread Oliver Upton
Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
release the RCU read lock when traversing the page tables. Defer the
freeing of table memory to an RCU callback. Indirect the calls into RCU
and provide stubs for hypervisor code, as RCU is not available in such a
context.

The RCU protection doesn't amount to much at the moment, as readers are
already protected by the read-write lock (all walkers that free table
memory take the write lock). Nonetheless, a subsequent change will
further relax the locking requirements around the stage-2 MMU, thereby
depending on RCU.

Signed-off-by: Oliver Upton 
---
 arch/arm64/include/asm/kvm_pgtable.h | 49 
 arch/arm64/kvm/hyp/pgtable.c | 10 +-
 arch/arm64/kvm/mmu.c | 14 +++-
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index e70cf57b719e..7634b6964779 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
 
 typedef u64 kvm_pte_t;
 
+/*
+ * RCU cannot be used in a non-kernel context such as the hyp. As such, page
+ * table walkers used in hyp do not call into RCU and instead use other
+ * synchronization mechanisms (such as a spinlock).
+ */
+#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
+
 typedef kvm_pte_t *kvm_pteref_t;
 
 static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
@@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
return pteref;
 }
 
+static inline void kvm_pgtable_walk_begin(void) {}
+static inline void kvm_pgtable_walk_end(void) {}
+
+static inline bool kvm_pgtable_walk_lock_held(void)
+{
+   return true;
+}
+
+#else
+
+typedef kvm_pte_t __rcu *kvm_pteref_t;
+
+static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
+{
+   return rcu_dereference_check(pteref, !shared);
+}
+
+static inline void kvm_pgtable_walk_begin(void)
+{
+   rcu_read_lock();
+}
+
+static inline void kvm_pgtable_walk_end(void)
+{
+   rcu_read_unlock();
+}
+
+static inline bool kvm_pgtable_walk_lock_held(void)
+{
+   return rcu_read_lock_held();
+}
+
+#endif
+
 #define KVM_PTE_VALID  BIT(0)
 
 #define KVM_PTE_ADDR_MASK  GENMASK(47, PAGE_SHIFT)
@@ -202,11 +243,14 @@ struct kvm_pgtable {
  * children.
  * @KVM_PGTABLE_WALK_TABLE_POST:   Visit table entries after their
  * children.
+ * @KVM_PGTABLE_WALK_SHARED:   Indicates the page-tables may be shared
+ * with other software walkers.
  */
 enum kvm_pgtable_walk_flags {
KVM_PGTABLE_WALK_LEAF   = BIT(0),
KVM_PGTABLE_WALK_TABLE_PRE  = BIT(1),
KVM_PGTABLE_WALK_TABLE_POST = BIT(2),
+   KVM_PGTABLE_WALK_SHARED = BIT(3),
 };
 
 struct kvm_pgtable_visit_ctx {
@@ -223,6 +267,11 @@ struct kvm_pgtable_visit_ctx {
 typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
enum kvm_pgtable_walk_flags visit);
 
+static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *ctx)
+{
+   return ctx->flags & KVM_PGTABLE_WALK_SHARED;
+}
+
 /**
  * struct kvm_pgtable_walker - Hook into a page-table walk.
  * @cb:Callback function to invoke during the walk.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 7c9782347570..d8d963521d4e 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -171,6 +171,9 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
  enum kvm_pgtable_walk_flags visit)
 {
struct kvm_pgtable_walker *walker = data->walker;
+
+   /* Ensure the appropriate lock is held (e.g. RCU lock for stage-2 MMU) */
+   WARN_ON_ONCE(kvm_pgtable_walk_shared(ctx) && !kvm_pgtable_walk_lock_held());
return walker->cb(ctx, visit);
 }
 
@@ -281,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
.end= PAGE_ALIGN(walk_data.addr + size),
.walker = walker,
};
+   int r;
+
+   kvm_pgtable_walk_begin();
+   r = _kvm_pgtable_walk(pgt, &walk_data);
+   kvm_pgtable_walk_end();
 
-   return _kvm_pgtable_walk(pgt, &walk_data);
+   return r;
 }
 
 struct leaf_walk_data {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 73ae908eb5d9..52e042399ba5 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -130,9 +130,21 @@ static void kvm_s2_free_pages_exact(void *virt, size_t size)
 
 static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
 
+static void
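
[The archive truncates the remainder of the hunk here. The function
being introduced is the RCU callback described in the commit message:
detached table memory is handed to call_rcu() and only freed once a
grace period has elapsed. Per the upstream commit, roughly (details
may differ from this sketch):]

static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
{
	struct page *page = container_of(head, struct page, rcu_head);
	void *pgtable = page_to_virt(page);
	u32 level = page_private(page);

	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, pgtable, level);
}

static void stage2_free_removed_table(void *addr, u32 level)
{
	struct page *page = virt_to_page(addr);

	set_page_private(page, (unsigned long)level);
	call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
}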