Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure

2021-07-20 Thread Quentin Perret
On Tuesday 20 Jul 2021 at 14:15:56 (+0100), Marc Zyngier wrote:
> On Tue, 20 Jul 2021 12:13:20 +0100,
> Quentin Perret  wrote:
> > 
> > On Thursday 15 Jul 2021 at 17:31:47 (+0100), Marc Zyngier wrote:
> > > +struct s2_walk_data {
> > > + kvm_pte_t   pteval;
> > > + u32 level;
> > > +};
> > > +
> > > +static int s2_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > > +  enum kvm_pgtable_walk_flags flag, void * const arg)
> > > +{
> > > + struct s2_walk_data *data = arg;
> > > +
> > > + data->level = level;
> > > + data->pteval = *ptep;
> > > + return 0;
> > > +}
> > > +
> > > +/* Assumes mmu_lock taken */
> > > +static bool __check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> > > +{
> > > + struct s2_walk_data data;
> > > + struct kvm_pgtable_walker walker = {
> > > + .cb = s2_walker,
> > > + .flags  = KVM_PGTABLE_WALK_LEAF,
> > > + .arg= &data,
> > > + };
> > > +
> > > + kvm_pgtable_walk(vcpu->arch.hw_mmu->pgt, ALIGN_DOWN(ipa, PAGE_SIZE),
> > > +  PAGE_SIZE, &walker);
> > > +
> > > + /* Must be a PAGE_SIZE mapping with our annotation */
> > > + return (BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(data.level)) == PAGE_SIZE &&
> > > + data.pteval == MMIO_NOTE);
> > 
> > Nit: you could do this check in the walker directly and check the return
> > value of kvm_pgtable_walk() instead. That would allow us to get rid of
> > struct s2_walk_data.
> > 
> > Also, though the compiler might be able to optimize, maybe simplify the
> > level check to level == (KVM_PGTABLE_MAX_LEVELS - 1)?
> 
> Yup, all good points. I guess I could do the same in my other series
> that parses the userspace PT to extract the level.

Well, actually, let me take that back. I think something like what you have
would be useful, but in pgtable.c directly and re-usable for stage-1 and
stage-2 walks. Maybe something like the below (totally untested)?

I could use such a walker in several places as well in the memory
ownership series:

 - following the idea of [1], I could remove the
   kvm_pgtable_stage2_find_range() function entirely;

 - [2] defines 2 custom walkers that do nothing but walk host stage-2
   and hyp stage-1 page-tables to check permissions and such --  they
   could be removed/re-implemented easily as well.

And you seem to need something similar here, so clearly there is a need.
WDYT?

Thanks,
Quentin

[1] https://lore.kernel.org/kvmarm/20210719104735.3681732-3-qper...@google.com/
[2] https://lore.kernel.org/kvmarm/20210719104735.3681732-14-qper...@google.com/

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index e0ae57dca827..bd6d26f27e1a 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -357,6 +357,38 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
	return _kvm_pgtable_walk(&walk_data);
 }

+struct get_leaf_data {
+   kvm_pte_t *ptep;
+   u32 *level;
+};
+
+static int get_leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+  enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+   struct get_leaf_data *data = arg;
+
+   *(data->ptep) = *ptep;
+   *(data->level) = level;
+
+   return 0;
+}
+
+int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr, kvm_pte_t *ptep,
+u32 *level)
+{
+   struct get_leaf_data data = {
+   .ptep = ptep,
+   .level = level,
+   };
+   struct kvm_pgtable_walker __get_leaf_walker = {
+   .cb = get_leaf_walker,
+   .flags  = KVM_PGTABLE_WALK_LEAF,
+   .arg= &data,
+   };
+
+   return kvm_pgtable_walk(pgt, addr, PAGE_SIZE, &__get_leaf_walker);
+}
+
 struct hyp_map_data {
u64 phys;
kvm_pte_t   attr;
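
FWIW, with a helper like the above, the MMIO check from this patch could
then shrink to something like this (equally untested, and assuming the
helper keeps the signature sketched above):

static bool __check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
{
	kvm_pte_t pte = 0;
	u32 level = 0;

	/* Grab the leaf PTE and its level for this IPA */
	kvm_pgtable_get_leaf(vcpu->arch.hw_mmu->pgt,
			     ALIGN_DOWN(ipa, PAGE_SIZE), &pte, &level);

	/* Must be a PAGE_SIZE mapping with our annotation */
	return level == (KVM_PGTABLE_MAX_LEVELS - 1) && pte == MMIO_NOTE;
}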



Re: [PATCH 08/14] KVM: arm64: Add support for tagging shared pages in page-table

2021-07-20 Thread Quentin Perret
On Tuesday 20 Jul 2021 at 14:48:09 (+0100), Fuad Tabba wrote:
> This might tie in to Marc's comments for using enums, but
> consolidating the translation between prot and ignored/software bits
> in one place would be good: thinking about patch 10 as well, where you
> get the prot from the ignored bits. Even though you have documented
> it, I'm finding the part where a field can be borrowed and shared as
> opposed to being only shared not very intuitive, and I need to reread
> the comment here to remember the difference while going through the
> code.
> 
> You also mention lending as potentially reserved for the future, but I
> think that lending is the other side of borrowing (depends on who's
> doing the giving/taking). I wonder if in this case it would be clearer
> to describe it in terms of whether it's exclusively owned vs owned but
> shared (for the owner), and just shared for the sharer...

Argh so I actually found the encoding pretty neat :/
The idea is the following:

  - an entity that has a page mapped as SHARED in its PT means it
doesn't have exclusive access to the page;

  - an entity that has a page mapped as BORROWED in its PT means it has
access to a page it doesn't own;

From that we can build the states we need:

  - when an entity shares a page with another, the original owner gets a
SHARED mapping, and the recipient a SHARED+BORROWED mapping.

  - and in the future when/if we implement lending (which means an
entity gives exclusive access to a page to another entity, but
retains ownership) we can map the page in the recipient as
'BORROWED' only, but not 'SHARED'. And the original owner will have
an invalid mapping with a new state 'LENT', which is encoded with
both SW bits set.
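
To make the mapping between those two predicates and the two SW bits
explicit, it boils down to something like this (purely illustrative sketch,
hypothetical names):

	/* SW-bit encoding, from the page-table owner's point of view */
	enum page_share_state {
		PAGE_OWNED		= 0,			/* 00: exclusive access */
		PAGE_SHARED_OWNED	= BIT(0),		/* 01: owner side of a share */
		PAGE_SHARED_BORROWED	= BIT(1),		/* 10: borrower side of a share */
		PAGE_LENT		= BIT(0) | BIT(1),	/* 11: future use (lending) */
	};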

How does that sound? Did you have something else in mind?

Thanks,
Quentin


Re: [PATCH v3 05/15] KVM: arm64: Refactor sys_regs.h, c for nVHE reuse

2021-07-20 Thread Fuad Tabba
Hi,

On Tue, Jul 20, 2021 at 2:38 PM Andrew Jones  wrote:
>
> On Mon, Jul 19, 2021 at 05:03:36PM +0100, Fuad Tabba wrote:
> > Refactor sys_regs.h and sys_regs.c to make it easier to reuse
> > common code. It will be used in nVHE in a later patch.
> >
> > Note that the refactored code uses __inline_bsearch for find_reg
> > instead of bsearch to avoid copying the bsearch code for nVHE.
> >
> > No functional change intended.
> >
> > Signed-off-by: Fuad Tabba 
> > ---
> >  arch/arm64/include/asm/sysreg.h |  3 +++
> >  arch/arm64/kvm/sys_regs.c   | 30 +-
> >  arch/arm64/kvm/sys_regs.h   | 31 +++
> >  3 files changed, 35 insertions(+), 29 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/sysreg.h 
> > b/arch/arm64/include/asm/sysreg.h
> > index 7b9c3acba684..326f49e7bd42 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -1153,6 +1153,9 @@
> >  #define ICH_VTR_A3V_SHIFT21
> >  #define ICH_VTR_A3V_MASK (1 << ICH_VTR_A3V_SHIFT)
> >
> > +/* Extract the feature specified from the feature id register. */
> > +#define FEATURE(x)   (GENMASK_ULL(x##_SHIFT + 3, x##_SHIFT))
>
> I think the comment would be better as
>
>  Create a mask for the feature bits of the specified feature.

I agree. I'll use this instead.

> And, I think a more specific name than FEATURE would be better. Maybe
> FEATURE_MASK or even ARM64_FEATURE_MASK ?

I think so too. ARM64_FEATURE_MASK is more descriptive than just FEATURE.
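
That is, something like the below (same GENMASK_ULL-based definition, only
the name and comment changed):

/* Create a mask for the feature bits of the specified feature. */
#define ARM64_FEATURE_MASK(x)	(GENMASK_ULL(x##_SHIFT + 3, x##_SHIFT))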

Thanks,
/fuad

> > +
> >  #ifdef __ASSEMBLY__
> >
> >   .irp
> > num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 80a6e41cadad..1a939c464858 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -44,10 +44,6 @@
> >   * 64bit interface.
> >   */
> >
> > -#define reg_to_encoding(x)   \
> > - sys_reg((u32)(x)->Op0, (u32)(x)->Op1,   \
> > - (u32)(x)->CRn, (u32)(x)->CRm, (u32)(x)->Op2)
> > -
> >  static bool read_from_write_only(struct kvm_vcpu *vcpu,
> >struct sys_reg_params *params,
> >const struct sys_reg_desc *r)
> > @@ -1026,8 +1022,6 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
> >   return true;
> >  }
> >
> > -#define FEATURE(x)   (GENMASK_ULL(x##_SHIFT + 3, x##_SHIFT))
> > -
> >  /* Read a sanitised cpufeature ID register by sys_reg_desc */
> >  static u64 read_id_reg(const struct kvm_vcpu *vcpu,
> >   struct sys_reg_desc const *r, bool raz)
> > @@ -2106,23 +2100,6 @@ static int check_sysreg_table(const struct 
> > sys_reg_desc *table, unsigned int n,
> >   return 0;
> >  }
> >
> > -static int match_sys_reg(const void *key, const void *elt)
> > -{
> > - const unsigned long pval = (unsigned long)key;
> > - const struct sys_reg_desc *r = elt;
> > -
> > - return pval - reg_to_encoding(r);
> > -}
> > -
> > -static const struct sys_reg_desc *find_reg(const struct sys_reg_params 
> > *params,
> > -  const struct sys_reg_desc table[],
> > -  unsigned int num)
> > -{
> > - unsigned long pval = reg_to_encoding(params);
> > -
> > - return bsearch((void *)pval, table, num, sizeof(table[0]), 
> > match_sys_reg);
> > -}
> > -
> >  int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu)
> >  {
> >   kvm_inject_undefined(vcpu);
> > @@ -2365,13 +2342,8 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
> >
> >   trace_kvm_handle_sys_reg(esr);
> >
> > - params.Op0 = (esr >> 20) & 3;
> > - params.Op1 = (esr >> 14) & 0x7;
> > - params.CRn = (esr >> 10) & 0xf;
> > - params.CRm = (esr >> 1) & 0xf;
> > - params.Op2 = (esr >> 17) & 0x7;
> > + params = esr_sys64_to_params(esr);
> >   params.regval = vcpu_get_reg(vcpu, Rt);
> > - params.is_write = !(esr & 1);
> >
> > >   ret = emulate_sys_reg(vcpu, &params);
> >
> > diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
> > index 9d0621417c2a..cc0cc95a0280 100644
> > --- a/arch/arm64/kvm/sys_regs.h
> > +++ b/arch/arm64/kvm/sys_regs.h
> > @@ -11,6 +11,12 @@
> >  #ifndef __ARM64_KVM_SYS_REGS_LOCAL_H__
> >  #define __ARM64_KVM_SYS_REGS_LOCAL_H__
> >
> > +#include 
> > +
> > +#define reg_to_encoding(x)   \
> > + sys_reg((u32)(x)->Op0, (u32)(x)->Op1,   \
> > + (u32)(x)->CRn, (u32)(x)->CRm, (u32)(x)->Op2)
> > +
> >  struct sys_reg_params {
> >   u8  Op0;
> >   u8  Op1;
> > @@ -21,6 +27,14 @@ struct sys_reg_params {
> >   boolis_write;
> >  };
> >
> > +#define esr_sys64_to_params(esr)   
> > \
> > + ((struct sys_reg_params){ .Op0 = ((esr) >> 20) & 3,   

Re: [PATCH 01/14] KVM: arm64: Provide the host_stage2_try() helper macro

2021-07-20 Thread Fuad Tabba
Hi Quentin,

On Mon, Jul 19, 2021 at 11:47 AM Quentin Perret  wrote:
>
> We currently unmap all MMIO mappings from the host stage-2 to recycle
> the pages whenever we run out. In order to make this pattern easy to
> re-use from other places, factor the logic out into a dedicated macro.
> While at it, apply the macro for the kvm_pgtable_stage2_set_owner()
> calls. They're currently only called early on and are guaranteed to
> succeed, but making them robust to the -ENOMEM case doesn't hurt and
> will avoid painful debugging sessions later on.
>
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c | 38 ++-
>  1 file changed, 20 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
> b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index d938ce95d3bd..56f2117c877b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -208,6 +208,23 @@ static inline int __host_stage2_idmap(u64 start, u64 end,
> >   prot, &host_s2_pool);
>  }
>
> +/*
> + * The pool has been provided with enough pages to cover all of memory with
> + * page granularity, but it is difficult to know how much of the MMIO range
> + * we will need to cover upfront, so we may need to 'recycle' the pages if we
> + * run out.
> + */
> +#define host_stage2_try(fn, ...)   \
> +   ({  \
> +   int __ret = fn(__VA_ARGS__);\
> +   if (__ret == -ENOMEM) { \
> +   __ret = host_stage2_unmap_dev_all();\
> +   if (!__ret) \
> +   __ret = fn(__VA_ARGS__);\
> +   }   \
> +   __ret;  \
> +})

I think it might be good to document that this macro expects the
host_kvm.lock to be held.
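
Even just an assertion at the top of the macro would do, e.g. (sketch --
assuming an EL2 lock-debugging helper of this shape exists or gets added):

	/* Both attempts of fn must be covered by the host stage-2 lock */
	hyp_assert_lock_held(&host_kvm.lock);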

Thanks,
/fuad

>  static int host_stage2_idmap(u64 addr)
>  {
> enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> @@ -223,22 +240,7 @@ static int host_stage2_idmap(u64 addr)
> if (ret)
> goto unlock;
>
> -   ret = __host_stage2_idmap(range.start, range.end, prot);
> -   if (ret != -ENOMEM)
> -   goto unlock;
> -
> -   /*
> -* The pool has been provided with enough pages to cover all of memory
> -* with page granularity, but it is difficult to know how much of the
> -* MMIO range we will need to cover upfront, so we may need to 
> 'recycle'
> -* the pages if we run out.
> -*/
> -   ret = host_stage2_unmap_dev_all();
> -   if (ret)
> -   goto unlock;
> -
> -   ret = __host_stage2_idmap(range.start, range.end, prot);
> -
> +   ret = host_stage2_try(__host_stage2_idmap, range.start, range.end, prot);
>  unlock:
> hyp_spin_unlock(&host_kvm.lock);
>
> @@ -257,8 +259,8 @@ int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end)
> return -EINVAL;
>
> hyp_spin_lock(&host_kvm.lock);
> -   ret = kvm_pgtable_stage2_set_owner(&host_kvm.pgt, start, end - start,
> -  &host_s2_pool, pkvm_hyp_id);
> +   ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
> + start, end - start, &host_s2_pool, pkvm_hyp_id);
> hyp_spin_unlock(&host_kvm.lock);
>
> return ret != -EAGAIN ? ret : 0;
> --
> 2.32.0.402.g57bb445576-goog
>


Re: [PATCH 08/14] KVM: arm64: Add support for tagging shared pages in page-table

2021-07-20 Thread Fuad Tabba
Hi Quentin,

On Mon, Jul 19, 2021 at 11:47 AM Quentin Perret  wrote:
>
> The hypervisor will soon be in charge of tracking ownership of all
> memory pages in the system. The current page-tracking infrastructure at
> EL2 only allows binary states: a page is either owned or not by an
> entity. But a number of use-cases will require more complex states for
> pages that are shared between two entities (host, hypervisor, or guests).
>
> In preparation for supporting these use-cases, introduce in the KVM
> page-table library some infrastructure for tagging shared pages
> using ignored bits (a.k.a. software bits) in PTEs.
>
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/include/asm/kvm_pgtable.h |  5 +
>  arch/arm64/kvm/hyp/pgtable.c | 25 +
>  2 files changed, 30 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h 
> b/arch/arm64/include/asm/kvm_pgtable.h
> index dd72653314c7..f6d3d5c8910d 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -81,6 +81,8 @@ enum kvm_pgtable_stage2_flags {
>   * @KVM_PGTABLE_PROT_W:Write permission.
>   * @KVM_PGTABLE_PROT_R:Read permission.
>   * @KVM_PGTABLE_PROT_DEVICE:   Device attributes.
> + * @KVM_PGTABLE_STATE_SHARED:  Page shared with another entity.
> + * @KVM_PGTABLE_STATE_BORROWED:Page borrowed from another entity.
>   */
>  enum kvm_pgtable_prot {
> KVM_PGTABLE_PROT_X  = BIT(0),
> @@ -88,6 +90,9 @@ enum kvm_pgtable_prot {
> KVM_PGTABLE_PROT_R  = BIT(2),
>
> KVM_PGTABLE_PROT_DEVICE = BIT(3),
> +
> +   KVM_PGTABLE_STATE_SHARED= BIT(4),
> +   KVM_PGTABLE_STATE_BORROWED  = BIT(5),
>  };
>
>  #define KVM_PGTABLE_PROT_RW(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 5bdbe7a31551..51598b79dafc 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -211,6 +211,29 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
> return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
>  }
>
> +static kvm_pte_t pte_ignored_bit_prot(enum kvm_pgtable_prot prot)
> +{
> +   kvm_pte_t ignored_bits = 0;
> +
> +   /*
> +* Ignored bits 0 and 1 are reserved to track the memory ownership
> +* state of each page:
> +*   00: The page is owned solely by the page-table owner.
> +*   01: The page is owned by the page-table owner, but is shared
> +*   with another entity.
> +*   10: The page is shared with, but not owned by the page-table 
> owner.
> +*   11: Reserved for future use (lending).
> +*/
> +   if (prot & KVM_PGTABLE_STATE_SHARED) {
> +   if (prot & KVM_PGTABLE_STATE_BORROWED)
> +   ignored_bits |= BIT(1);
> +   else
> +   ignored_bits |= BIT(0);
> +   }

This might tie in to Marc's comments for using enums, but
consolidating the translation between prot and ignored/software bits
in one place would be good: thinking about patch 10 as well, where you
get the prot from the ignored bits. Even though you have documented
it, I'm finding the part where a field can be borrowed and shared as
opposed to being only shared not very intuitive, and I need to reread
the comment here to remember the difference while going through the
code.

You also mention lending as potentially reserved for the future, but I
think that lending is the other side of borrowing (depends on who's
doing the giving/taking). I wonder if in this case it would be clearer
to describe it in terms of whether it's exclusively owned vs owned but
shared (for the owner), and just shared for the sharer...

Thanks,
/fuad


> +   return FIELD_PREP(KVM_PTE_LEAF_ATTR_IGNORED, ignored_bits);
> +}
> +
>  static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 
> addr,
>   u32 level, kvm_pte_t *ptep,
>   enum kvm_pgtable_walk_flags flag)
> @@ -357,6 +380,7 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, 
> kvm_pte_t *ptep)
> attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
> attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
> attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
> +   attr |= pte_ignored_bit_prot(prot);
> *ptep = attr;
>
> return 0;
> @@ -558,6 +582,7 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, 
> enum kvm_pgtable_prot p
>
> attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
> attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
> +   attr |= pte_ignored_bit_prot(prot);
> *ptep = attr;
>
> return 0;
> --
> 2.32.0.402.g57bb445576-goog
>

Re: [PATCH 1/5] KVM: arm64: Walk userspace page tables to compute the THP mapping size

2021-07-20 Thread Alexandru Elisei
Hi Marc,

I just can't figure out why holding the mmap lock is not needed to walk the
userspace page tables. Any hints? Or am I not seeing where it's taken?

On 7/17/21 10:55 AM, Marc Zyngier wrote:
> We currently rely on the kvm_is_transparent_hugepage() helper to
> discover whether a given page has the potential to be mapped as
> a block mapping.
>
> However, this API doesn't really give us everything we want:
> - we don't get the size: this is not crucial today as we only
>   support PMD-sized THPs, but we'd like to have larger sizes
>   in the future
> - we're the only user left of the API, and there is a will
>   to remove it altogether
>
> To address the above, implement a simple walker using the existing
> page table infrastructure, and plumb it into transparent_hugepage_adjust().
> No new page sizes are supported in the process.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/mmu.c | 46 
>  1 file changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 3155c9e778f0..db6314b93e99 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -433,6 +433,44 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, 
> size_t size,
>   return 0;
>  }
>  
> +static struct kvm_pgtable_mm_ops kvm_user_mm_ops = {
> + /* We shouldn't need any other callback to walk the PT */
> + .phys_to_virt   = kvm_host_va,
> +};
> +
> +struct user_walk_data {
> + u32 level;
> +};
> +
> +static int user_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +enum kvm_pgtable_walk_flags flag, void * const arg)
> +{
> + struct user_walk_data *data = arg;
> +
> + data->level = level;
> + return 0;
> +}
> +
> +static int get_user_mapping_size(struct kvm *kvm, u64 addr)
> +{
> + struct user_walk_data data;
> + struct kvm_pgtable pgt = {
> + .pgd= (kvm_pte_t *)kvm->mm->pgd,
> + .ia_bits= VA_BITS,
> + .start_level= 4 - CONFIG_PGTABLE_LEVELS,
> + .mm_ops = &kvm_user_mm_ops,
> + };
> + struct kvm_pgtable_walker walker = {
> + .cb = user_walker,
> + .flags  = KVM_PGTABLE_WALK_LEAF,
> + .arg= &data,
> + };
> +
> + kvm_pgtable_walk(&pgt, ALIGN_DOWN(addr, PAGE_SIZE), PAGE_SIZE, &walker);

I take it that it is guaranteed that kvm_pgtable_walk() will never fail? For
example, I can see it failing if someone messes with KVM_PGTABLE_MAX_LEVELS.
To be honest, I would rather have a check here instead of potentially feeding
a bogus value to ARM64_HW_PGTABLE_LEVEL_SHIFT. It could be a VM_WARN_ON, so
there's no runtime overhead unless CONFIG_DEBUG_VM.
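Something along these lines is what I had in mind (sketch):

	int ret;

	ret = kvm_pgtable_walk(&pgt, ALIGN_DOWN(addr, PAGE_SIZE),
			       PAGE_SIZE, &walker);
	/* Compiles out unless CONFIG_DEBUG_VM */
	VM_WARN_ON(ret);

	return BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(data.level));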

The patch looks good to me so far, but I want to give it another look (or two)
after I figure out why the mmap semaphone is not needed.

Thanks,

Alex

> +
> + return BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(data.level));
> +}
> +
>  static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
>   .zalloc_page= stage2_memcache_zalloc_page,
>   .zalloc_pages_exact = kvm_host_zalloc_pages_exact,
> @@ -780,7 +818,7 @@ static bool fault_supports_stage2_huge_mapping(struct 
> kvm_memory_slot *memslot,
>   * Returns the size of the mapping.
>   */
>  static unsigned long
> -transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
> +transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
>   unsigned long hva, kvm_pfn_t *pfnp,
>   phys_addr_t *ipap)
>  {
> @@ -791,8 +829,8 @@ transparent_hugepage_adjust(struct kvm_memory_slot 
> *memslot,
>* sure that the HVA and IPA are sufficiently aligned and that the
>* block map is contained within the memslot.
>*/
> - if (kvm_is_transparent_hugepage(pfn) &&
> - fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
> + if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) &&
> + get_user_mapping_size(kvm, hva) >= PMD_SIZE) {
>   /*
>* The address we faulted on is backed by a transparent huge
>* page.  However, because we map the compound huge page and
> @@ -1051,7 +1089,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>* backed by a THP and thus use block mapping if possible.
>*/
>   if (vma_pagesize == PAGE_SIZE && !(force_pte || device))
> - vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> + vma_pagesize = transparent_hugepage_adjust(kvm, memslot, hva,
> +  &pfn, &fault_ipa);
>  
>   if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {



Re: [PATCH v2 4/4] KVM: arm64: Remove PMSWINC_EL0 shadow register

2021-07-20 Thread Alexandru Elisei
Hi Marc,

On 7/19/21 5:56 PM, Marc Zyngier wrote:
> Hi Alex,
>
> On 2021-07-19 17:35, Alexandru Elisei wrote:
>> Hi Marc,
>>
>> On 7/19/21 1:39 PM, Marc Zyngier wrote:
>>> We keep an entry for the PMSWINC_EL0 register in the vcpu structure,
>>> while *never* writing anything there outside of reset.
>>>
>>> Given that the register is defined as write-only, that we always
>>> trap when this register is accessed, there is little point in saving
>>> anything anyway.
>>>
>>> Get rid of the entry, and save a mighty 8 bytes per vcpu structure.
>>>
>>> We still need to keep it exposed to userspace in order to preserve
>>> backward compatibility with previously saved VMs. Since userspace
>>> cannot expect any effect of writing to PMSWINC_EL0, treat the
>>> register as RAZ/WI for the purpose of userspace access.
>>>
>>> Signed-off-by: Marc Zyngier 
>>> ---
>>>  arch/arm64/include/asm/kvm_host.h |  1 -
>>>  arch/arm64/kvm/sys_regs.c | 21 -
>>>  2 files changed, 20 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h
>>> b/arch/arm64/include/asm/kvm_host.h
>>> index 41911585ae0c..afc169630884 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -185,7 +185,6 @@ enum vcpu_sysreg {
>>>  PMCNTENSET_EL0,    /* Count Enable Set Register */
>>>  PMINTENSET_EL1,    /* Interrupt Enable Set Register */
>>>  PMOVSSET_EL0,    /* Overflow Flag Status Set Register */
>>> -    PMSWINC_EL0,    /* Software Increment Register */
>>>  PMUSERENR_EL0,    /* User Enable Register */
>>>
>>>  /* Pointer Authentication Registers in a strict increasing order. */
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index f22139658e48..a1f5101f49a3 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -1286,6 +1286,20 @@ static int set_raz_id_reg(struct kvm_vcpu *vcpu, 
>>> const
>>> struct sys_reg_desc *rd,
>>>  return __set_id_reg(vcpu, rd, uaddr, true);
>>>  }
>>>
>>> +static int set_wi_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
>>> +  const struct kvm_one_reg *reg, void __user *uaddr)
>>> +{
>>> +    int err;
>>> +    u64 val;
>>> +
>>> +    /* Perform the access even if we are going to ignore the value */
>>> +    err = reg_from_user(, uaddr, sys_reg_to_index(rd));
>>
>> I don't understand why the read still happens if the value is ignored.
>> Just so KVM preserves the previous behaviour and tells userspace there was an error?
>
> If userspace has given us a duff pointer, it needs to know about it.

Makes sense, thanks.

>
>>> +    if (err)
>>> +    return err;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>  static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>>>     const struct sys_reg_desc *r)
>>>  {
>>> @@ -1629,8 +1643,13 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>>    .access = access_pmcnten, .reg = PMCNTENSET_EL0 },
>>>  { PMU_SYS_REG(SYS_PMOVSCLR_EL0),
>>>    .access = access_pmovs, .reg = PMOVSSET_EL0 },
>>> +    /*
> >>> + * PMSWINC_EL0 is exposed to userspace as RAZ/WI, as it was
>>> + * previously (and pointlessly) advertised in the past...
>>> + */
>>>  { PMU_SYS_REG(SYS_PMSWINC_EL0),
>>> -  .access = access_pmswinc, .reg = PMSWINC_EL0 },
>>> +  .get_user = get_raz_id_reg, .set_user = set_wi_reg,
>>
>> In my opinion, the call chain to return 0 looks pretty confusing to me, as
>> the functions seem made for ID register accesses, and the leaf function,
>> read_id_reg(), tries to match this register with a list of ID registers.
>> Since we have already added a new function just for PMSWINC_EL0, I was
>> wondering if adding another function, something like get_raz_reg(), would
>> make more sense.
>
> In that case, I'd rather just kill get_raz_id_reg() and replace it with
> this get_raz_reg(). If we treat something as RAZ, who cares whether it is
> an idreg or not?

I agree, the Arm ARM doesn't make the distinction between ID registers and
other system registers in the definition of RAZ, and I don't think KVM should
either. And the way read_id_reg() is written allows returning a value
different from 0 even if raz is true, which in my opinion could only happen
because of a bug in KVM.

I can have a go at writing the patch(es) on top of this series, if you want.
At the moment I'm rewriting the KVM SPE series, so it will be a few weeks
until I get around to doing it though.
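
For reference, I'd expect get_raz_reg() to be as simple as the below
(untested sketch, reusing the existing reg_to_user() helper):

static int get_raz_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
		       const struct kvm_one_reg *reg, void __user *uaddr)
{
	const u64 val = 0;

	return reg_to_user(uaddr, &val, sys_reg_to_index(rd));
}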

Thanks,

Alex



Re: [PATCH 03/14] KVM: arm64: Continue stage-2 map when re-creating mappings

2021-07-20 Thread Quentin Perret
On Tuesday 20 Jul 2021 at 09:26:10 (+0100), Marc Zyngier wrote:
> Right, but this is on a different path, right? Guests can never fault
> multiple mappings at once, and it takes you a host hypercall to
> perform this "multiple leaves at once".

Right.

> Is there any way we can restrict this to the hypercall? Or even
> better, keep the hypercall as a "one page at a time" thing? I can't
> imagine it being performance critical (it is a once-off, and only used
> over a rather small region of memory). Then, the called doesn't have
> to worry about the page already being mapped or not. This would also
> match the behaviour of what I do on the MMIO side.
> 
> Or do you anticipate much gain from this being able to use block
> mappings?

Not really no, especially given that mappings of shared pages will be
forced to page granularity thanks to the other patch we discussed in
this series. I was just hoping to reduce the overhead a bit by reducing
the number of hypercalls. But as you said, this probably doesn't matter
all that much, so happy to rework that. I'll look into making the hcall
use one page at a time, and hopefully that'll simplify a few other
things in the check_host_share_hyp() path near the end of this series.

Thanks,
Quentin


Re: [PATCH 08/14] KVM: arm64: Add support for tagging shared pages in page-table

2021-07-20 Thread Quentin Perret
On Tuesday 20 Jul 2021 at 11:13:31 (+0100), Marc Zyngier wrote:
> On Mon, 19 Jul 2021 16:49:13 +0100,
> Quentin Perret  wrote:
> > 
> > On Monday 19 Jul 2021 at 15:43:34 (+0100), Marc Zyngier wrote:
> > > On Mon, 19 Jul 2021 11:47:29 +0100,
> > > Quentin Perret  wrote:
> > > > 
> > > > The hypervisor will soon be in charge of tracking ownership of all
> > > > memory pages in the system. The current page-tracking infrastructure at
> > > > EL2 only allows binary states: a page is either owned or not by an
> > > > entity. But a number of use-cases will require more complex states for
> > > > pages that are shared between two entities (host, hypervisor, or 
> > > > guests).
> > > > 
> > > > In preparation for supporting these use-cases, introduce in the KVM
> > > > page-table library some infrastructure for tagging shared pages
> > > > using ignored bits (a.k.a. software bits) in PTEs.
> > > > 
> > > > Signed-off-by: Quentin Perret 
> > > > ---
> > > >  arch/arm64/include/asm/kvm_pgtable.h |  5 +
> > > >  arch/arm64/kvm/hyp/pgtable.c | 25 +
> > > >  2 files changed, 30 insertions(+)
> > > > 
> > > > diff --git a/arch/arm64/include/asm/kvm_pgtable.h 
> > > > b/arch/arm64/include/asm/kvm_pgtable.h
> > > > index dd72653314c7..f6d3d5c8910d 100644
> > > > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > > > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > > > @@ -81,6 +81,8 @@ enum kvm_pgtable_stage2_flags {
> > > >   * @KVM_PGTABLE_PROT_W:Write permission.
> > > >   * @KVM_PGTABLE_PROT_R:Read permission.
> > > >   * @KVM_PGTABLE_PROT_DEVICE:   Device attributes.
> > > > + * @KVM_PGTABLE_STATE_SHARED:  Page shared with another entity.
> > > > + * @KVM_PGTABLE_STATE_BORROWED:Page borrowed from another 
> > > > entity.
> > > >   */
> > > >  enum kvm_pgtable_prot {
> > > > KVM_PGTABLE_PROT_X  = BIT(0),
> > > > @@ -88,6 +90,9 @@ enum kvm_pgtable_prot {
> > > > KVM_PGTABLE_PROT_R  = BIT(2),
> > > >  
> > > > KVM_PGTABLE_PROT_DEVICE = BIT(3),
> > > > +
> > > > +   KVM_PGTABLE_STATE_SHARED= BIT(4),
> > > > +   KVM_PGTABLE_STATE_BORROWED  = BIT(5),
> > > 
> > > I'd rather have some indirection here, as we have other potential
> > > users for the SW bits outside of pKVM (see the NV series, which uses
> > > some of these SW bits as the backend for TTL-based TLB invalidation).
> > > 
> > > Can we instead only describe the SW bit states in this enum, and let
> > > the users map the semantic they require onto that state? See [1] for
> > > what I carry in the NV branch.
> > 
> > Works for me -- I just wanted to make sure we don't have users in
> > different places that use the same bits without knowing, but no strong
> > opinions, so happy to change.
> > 
> > > >  };
> > > >  
> > > >  #define KVM_PGTABLE_PROT_RW(KVM_PGTABLE_PROT_R | 
> > > > KVM_PGTABLE_PROT_W)
> > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > > index 5bdbe7a31551..51598b79dafc 100644
> > > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > > @@ -211,6 +211,29 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 
> > > > owner_id)
> > > > return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
> > > >  }
> > > >  
> > > > +static kvm_pte_t pte_ignored_bit_prot(enum kvm_pgtable_prot prot)
> > > 
> > > Can we call these sw rather than ignored?
> > 
> > Sure.
> > 
> > > > +{
> > > > +   kvm_pte_t ignored_bits = 0;
> > > > +
> > > > +   /*
> > > > +* Ignored bits 0 and 1 are reserved to track the memory 
> > > > ownership
> > > > +* state of each page:
> > > > +*   00: The page is owned solely by the page-table owner.
> > > > +*   01: The page is owned by the page-table owner, but is 
> > > > shared
> > > > +*   with another entity.
> > > > +*   10: The page is shared with, but not owned by the 
> > > > page-table owner.
> > > > +*   11: Reserved for future use (lending).
> > > > +*/
> > > > +   if (prot & KVM_PGTABLE_STATE_SHARED) {
> > > > +   if (prot & KVM_PGTABLE_STATE_BORROWED)
> > > > +   ignored_bits |= BIT(1);
> > > > +   else
> > > > +   ignored_bits |= BIT(0);
> > > > +   }
> > > > +
> > > > +   return FIELD_PREP(KVM_PTE_LEAF_ATTR_IGNORED, ignored_bits);
> > > > +}
> > > > +
> > > >  static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, 
> > > > u64 addr,
> > > >   u32 level, kvm_pte_t *ptep,
> > > >   enum kvm_pgtable_walk_flags flag)
> > > > @@ -357,6 +380,7 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot 
> > > > prot, kvm_pte_t *ptep)
> > > > attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
> > > > attr |= 

Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate

2021-07-20 Thread Quentin Perret
On Tuesday 20 Jul 2021 at 12:20:58 (+0100), Marc Zyngier wrote:
> On Tue, 20 Jul 2021 11:38:17 +0100,
> Quentin Perret  wrote:
> > 
> > On Tuesday 20 Jul 2021 at 11:21:17 (+0100), Marc Zyngier wrote:
> > > On Tue, 20 Jul 2021 11:09:21 +0100,
> > > Quentin Perret  wrote:
> > > > 
> > > > On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> > > > > @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct 
> > > > > kvm_pgtable *pgt, u64 addr, u64 size,
> > > > >   .arg= &map_data,
> > > > >   };
> > > > >  
> > > > > - if (owner_id > KVM_MAX_OWNER_ID)
> > > > > + if (!annotation || (annotation & PTE_VALID))
> > > > >   return -EINVAL;
> > > > 
> > > > Why do you consider annotation==0 invalid? The assumption so far has
> > > > been that the owner_id for the host is 0, so annotating a range with 0s
> > > > should be a valid operation -- this will be required when e.g.
> > > > transferring ownership of a page back to the host.
> > > 
> > > How do you then distinguish it from an empty entry that doesn't map to
> > > anything at all?
> > 
> > You don't, but that's the beauty of it :)
> > 
> > The host starts with a PGD full of zeroes, which in terms of ownership
> > means that it owns the entire (I)PA space. And it loses ownership of a
> > page only when we explicitly annotate it with an owner id != 0.
> 
> Right. But this scheme doesn't apply to the guests, does it?

Right, the meaning of a NULL PTE in guests will clearly be something
different, but I guess the interpretation of what invalid mappings mean
is up to the caller.

> Don't we
> need something that is non-null to preserve the table refcounting?

Sure, but do we care? If the table entry gets zeroed we're then
basically using an 'invalid block' mapping to annotate the entire block
range with '0', whatever that means. For guests it won't mean much, but
for the host that would mean sole ownership of the entire range.


Re: [PATCH 15/16] arm64: Add a helper to retrieve the PTE of a fixmap

2021-07-20 Thread Quentin Perret
On Thursday 15 Jul 2021 at 17:31:58 (+0100), Marc Zyngier wrote:
> In order to transfer the early mapping state into KVM's MMIO
> guard infrastucture, provide a small helper that will retrieve
> the associated PTE.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/fixmap.h |  2 ++
>  arch/arm64/mm/mmu.c | 15 +++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
> index 4335800201c9..1aae625b944f 100644
> --- a/arch/arm64/include/asm/fixmap.h
> +++ b/arch/arm64/include/asm/fixmap.h
> @@ -105,6 +105,8 @@ void __init early_fixmap_init(void);
>  
>  extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, 
> pgprot_t prot);
>  
> +extern pte_t *__get_fixmap_pte(enum fixed_addresses idx);
> +
>  #include 
>  
>  #endif /* !__ASSEMBLY__ */
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index d74586508448..f1b7abd04025 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1286,6 +1286,21 @@ void __set_fixmap(enum fixed_addresses idx,
>   }
>  }
>  
> +pte_t *__get_fixmap_pte(enum fixed_addresses idx)
> +{
> + unsigned long   addr = __fix_to_virt(idx);

Nit: odd spacing here.

> + pte_t *ptep;
> +
> + BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);
> +
> + ptep = fixmap_pte(addr);
> +
> + if (!pte_valid(*ptep))
> + return NULL;
> +
> + return ptep;
> +}
> +
>  void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
>  {
>   const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
> -- 
> 2.30.2
> 


Re: [PATCH 06/14] KVM: arm64: Tolerate re-creating hyp mappings to set ignored bits

2021-07-20 Thread Quentin Perret
On Tuesday 20 Jul 2021 at 11:59:27 (+0100), Fuad Tabba wrote:
> Thanks for the clarification. It makes sense to preserve the existing
> behavior, but I was wondering if a comment would be good, describing
> what merits a "needs update"?

Sure thing, I'll add something for v2.

Cheers,
Quentin


Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure

2021-07-20 Thread Quentin Perret
On Thursday 15 Jul 2021 at 17:31:47 (+0100), Marc Zyngier wrote:
> +struct s2_walk_data {
> + kvm_pte_t   pteval;
> + u32 level;
> +};
> +
> +static int s2_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +  enum kvm_pgtable_walk_flags flag, void * const arg)
> +{
> + struct s2_walk_data *data = arg;
> +
> + data->level = level;
> + data->pteval = *ptep;
> + return 0;
> +}
> +
> +/* Assumes mmu_lock taken */
> +static bool __check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> +{
> + struct s2_walk_data data;
> + struct kvm_pgtable_walker walker = {
> + .cb = s2_walker,
> + .flags  = KVM_PGTABLE_WALK_LEAF,
> + .arg= &data,
> + };
> +
> + kvm_pgtable_walk(vcpu->arch.hw_mmu->pgt, ALIGN_DOWN(ipa, PAGE_SIZE),
> +  PAGE_SIZE, &walker);
> +
> + /* Must be a PAGE_SIZE mapping with our annotation */
> + return (BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(data.level)) == PAGE_SIZE &&
> + data.pteval == MMIO_NOTE);

Nit: you could do this check in the walker directly and check the return
value of kvm_pgtable_walk() instead. That would allow us to get rid of
struct s2_walk_data.

Also, though the compiler might be able to optimize, maybe simplify the
level check to level == (KVM_PGTABLE_MAX_LEVELS - 1)?

Thanks,
Quentin

> +}


Re: [PATCH 06/14] KVM: arm64: Tolerate re-creating hyp mappings to set ignored bits

2021-07-20 Thread Fuad Tabba
Hi Quentin,

On Tue, Jul 20, 2021 at 11:30 AM 'Quentin Perret' via kernel-team
 wrote:
>
> Hi Fuad,
>
> On Tuesday 20 Jul 2021 at 11:17:03 (+0100), Fuad Tabba wrote:
> > Hi Quentin,
> >
> >
> > On Mon, Jul 19, 2021 at 11:47 AM Quentin Perret  wrote:
> > >
> > > The current hypervisor stage-1 mapping code doesn't allow changing an
> > > existing valid mapping. Relax this condition by allowing changes that
> > > only target ignored bits, as that will soon be needed to annotate shared
> > > pages.
> > >
> > > Signed-off-by: Quentin Perret 
> > > ---
> > >  arch/arm64/kvm/hyp/pgtable.c | 18 --
> > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index a0ac8c2bc174..34cf67997a82 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -362,6 +362,17 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot 
> > > prot, kvm_pte_t *ptep)
> > > return 0;
> > >  }
> > >
> > > +static bool hyp_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
> > > +{
> > > +   if (old == new)
> > > +   return false;
> > > +
> > > +   if (!kvm_pte_valid(old))
> > > +   return true;
> > > +
> > > +   return !WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_IGNORED);
> >
> > Wouldn't this return false if both ignored and non-ignored bits were
> > different, or is that not possible (judging by the WARN_ON)?
>
> Correct, but that is intentional, see below ;)
>
> > If it is, then it would need an update, wouldn't it?
>
> Maybe, but if you look at what the existing code does, we do skip the
> update if the old mapping is valid and not equal to new. So I kept the
> behaviour as close as possible to this -- if you change any bits outside
> of SW bits you get a WARN and we skip the update, as we already do
> today. But if you touch only SW bits and nothing else, then I let the
> update go through.
>
> That said, I don't think warning and then proceeding to update would be
> terribly wrong, it's just that a change of behaviour felt a bit
> unnecessary for this particular patch.

Thanks for the clarification. It makes sense to preserve the existing
behavior, but I was wondering if a comment would be good, describing
what merits a "needs update"?

Cheers,
/fuad

> Thanks,
> Quentin
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to kernel-team+unsubscr...@android.com.
>


Re: [PATCH v3 06/15] KVM: arm64: Restore mdcr_el2 from vcpu

2021-07-20 Thread Andrew Jones
On Mon, Jul 19, 2021 at 05:03:37PM +0100, Fuad Tabba wrote:
> On deactivating traps, restore the value of mdcr_el2 from the
> newly created and preserved host value vcpu context, rather than
> directly reading the hardware register.
> 
> Up until and including this patch the two values are the same,
> i.e., the hardware register and the vcpu one. A future patch will
> be changing the value of mdcr_el2 on activating traps, and this
> ensures that its value will be restored.
> 
> No functional change intended.

I'm probably missing something, but I can't convince myself that the host
will end up with the same mdcr_el2 value after deactivating traps after
this patch as before. We clearly now restore whatever we had when
activating traps (presumably whatever we configured at init_el2_state
time), but is that equivalent to what we had before with the masking and
ORing that this patch drops?
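
To spell the concern out: on the nVHE path, the old code left the host with

	mdcr_el2 = read_sysreg(mdcr_el2);
	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
	mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
	mdcr_el2 |= MDCR_EL2_E2TB_MASK << MDCR_EL2_E2TB_SHIFT;
	write_sysreg(mdcr_el2, mdcr_el2);

whereas the new code restores whatever was snapshotted at guest entry:

	write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);

and it's not obvious to me that those two values always coincide.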

Thanks,
drew

> 
> Signed-off-by: Fuad Tabba 
> ---
>  arch/arm64/include/asm/kvm_host.h   |  5 -
>  arch/arm64/include/asm/kvm_hyp.h|  2 +-
>  arch/arm64/kvm/hyp/include/hyp/switch.h |  6 +-
>  arch/arm64/kvm/hyp/nvhe/switch.c| 11 ++-
>  arch/arm64/kvm/hyp/vhe/switch.c | 12 ++--
>  arch/arm64/kvm/hyp/vhe/sysreg-sr.c  |  2 +-
>  6 files changed, 15 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 4d2d974c1522..76462c6a91ee 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -287,10 +287,13 @@ struct kvm_vcpu_arch {
>   /* Stage 2 paging state used by the hardware on next switch */
>   struct kvm_s2_mmu *hw_mmu;
>  
> - /* HYP configuration */
> + /* Values of trap registers for the guest. */
>   u64 hcr_el2;
>   u64 mdcr_el2;
>  
> + /* Values of trap registers for the host before guest entry. */
> + u64 mdcr_el2_host;
> +
>   /* Exception Information */
>   struct kvm_vcpu_fault_info fault;
>  
> diff --git a/arch/arm64/include/asm/kvm_hyp.h 
> b/arch/arm64/include/asm/kvm_hyp.h
> index 9d60b3006efc..657d0c94cf82 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -95,7 +95,7 @@ void __sve_restore_state(void *sve_pffr, u32 *fpsr);
>  
>  #ifndef __KVM_NVHE_HYPERVISOR__
>  void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
> -void deactivate_traps_vhe_put(void);
> +void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu);
>  #endif
>  
>  u64 __guest_enter(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
> b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index e4a2f295a394..a0e78a6027be 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -92,11 +92,15 @@ static inline void __activate_traps_common(struct 
> kvm_vcpu *vcpu)
>   write_sysreg(0, pmselr_el0);
>   write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
>   }
> +
> + vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
>   write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
>  }
>  
> -static inline void __deactivate_traps_common(void)
> +static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
>  {
> + write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);
> +
>   write_sysreg(0, hstr_el2);
>   if (kvm_arm_support_pmu_v3())
>   write_sysreg(0, pmuserenr_el0);
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c 
> b/arch/arm64/kvm/hyp/nvhe/switch.c
> index f7af9688c1f7..1778593a08a9 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -69,12 +69,10 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  static void __deactivate_traps(struct kvm_vcpu *vcpu)
>  {
>   extern char __kvm_hyp_host_vector[];
> - u64 mdcr_el2, cptr;
> + u64 cptr;
>  
>   ___deactivate_traps(vcpu);
>  
> - mdcr_el2 = read_sysreg(mdcr_el2);
> -
>   if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
>   u64 val;
>  
> @@ -92,13 +90,8 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
>   isb();
>   }
>  
> - __deactivate_traps_common();
> -
> - mdcr_el2 &= MDCR_EL2_HPMN_MASK;
> - mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
> - mdcr_el2 |= MDCR_EL2_E2TB_MASK << MDCR_EL2_E2TB_SHIFT;
> + __deactivate_traps_common(vcpu);
>  
> - write_sysreg(mdcr_el2, mdcr_el2);
> + write_sysreg(this_cpu_ptr(&kvm_init_params)->hcr_el2, hcr_el2);
>  
>   cptr = CPTR_EL2_DEFAULT;
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index b3229924d243..0d0c9550fb08 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -91,17 +91,9 @@ void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
>   __activate_traps_common(vcpu);
>  }
>  
> -void deactivate_traps_vhe_put(void)
> +void deactivate_traps_vhe_put(struct 

Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate

2021-07-20 Thread Quentin Perret
On Tuesday 20 Jul 2021 at 11:21:17 (+0100), Marc Zyngier wrote:
> On Tue, 20 Jul 2021 11:09:21 +0100,
> Quentin Perret  wrote:
> > 
> > On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> > > @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable 
> > > *pgt, u64 addr, u64 size,
> > > >   .arg= &map_data,
> > >   };
> > >  
> > > - if (owner_id > KVM_MAX_OWNER_ID)
> > > + if (!annotation || (annotation & PTE_VALID))
> > >   return -EINVAL;
> > 
> > Why do you consider annotation==0 invalid? The assumption so far has
> > been that the owner_id for the host is 0, so annotating a range with 0s
> > should be a valid operation -- this will be required when e.g.
> > transferring ownership of a page back to the host.
> 
> How do you then distinguish it from an empty entry that doesn't map to
> anything at all?

> You don't, but that's the beauty of it :)

The host starts with a PGD full of zeroes, which in terms of ownership
means that it owns the entire (I)PA space. And it loses ownership of a
page only when we explicitly annotate it with an owner id != 0.


Re: [PATCH 06/14] KVM: arm64: Tolerate re-creating hyp mappings to set ignored bits

2021-07-20 Thread Quentin Perret
Hi Fuad,

On Tuesday 20 Jul 2021 at 11:17:03 (+0100), Fuad Tabba wrote:
> Hi Quentin,
> 
> 
> On Mon, Jul 19, 2021 at 11:47 AM Quentin Perret  wrote:
> >
> > The current hypervisor stage-1 mapping code doesn't allow changing an
> > existing valid mapping. Relax this condition by allowing changes that
> > only target ignored bits, as that will soon be needed to annotate shared
> > pages.
> >
> > Signed-off-by: Quentin Perret 
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 18 --
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index a0ac8c2bc174..34cf67997a82 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -362,6 +362,17 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot 
> > prot, kvm_pte_t *ptep)
> > return 0;
> >  }
> >
> > +static bool hyp_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
> > +{
> > +   if (old == new)
> > +   return false;
> > +
> > +   if (!kvm_pte_valid(old))
> > +   return true;
> > +
> > +   return !WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_IGNORED);
> 
> Wouldn't this return false if both ignored and non-ignored bits were
> different, or is that not possible (judging by the WARN_ON)?

Correct, but that is intentional, see below ;)

> If it is, then it would need an update, wouldn't it?

Maybe, but if you look at what the existing code does, we do skip the
update if the old mapping is valid and not equal to new. So I kept the
behaviour as close as possible to this -- if you change any bits outside
of SW bits you get a WARN and we skip the update, as we already do
today. But if you touch only SW bits and nothing else, then I let the
update go through.

That said, I don't think warning and then proceeding to update would be
terribly wrong, it's just that a change of behaviour felt a bit
unnecessary for this particular patch.
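
Concretely, for v2 I'll probably fold in something like this (same logic as
above, just with the policy spelled out):

static bool hyp_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
{
	/* Re-creating the exact same mapping is always tolerated */
	if (old == new)
		return false;

	/* Installing a mapping where there was none is a real update */
	if (!kvm_pte_valid(old))
		return true;

	/*
	 * Only SW-bit changes to a valid mapping are allowed; warn on, and
	 * skip, anything else -- which is what the code already did.
	 */
	return !WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_IGNORED);
}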

Thanks,
Quentin


Re: [PATCH 06/14] KVM: arm64: Tolerate re-creating hyp mappings to set ignored bits

2021-07-20 Thread Fuad Tabba
Hi Quentin,


On Mon, Jul 19, 2021 at 11:47 AM Quentin Perret  wrote:
>
> The current hypervisor stage-1 mapping code doesn't allow changing an
> existing valid mapping. Relax this condition by allowing changes that
> only target ignored bits, as that will soon be needed to annotate shared
> pages.
>
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index a0ac8c2bc174..34cf67997a82 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -362,6 +362,17 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, 
> kvm_pte_t *ptep)
> return 0;
>  }
>
> +static bool hyp_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
> +{
> +   if (old == new)
> +   return false;
> +
> +   if (!kvm_pte_valid(old))
> +   return true;
> +
> +   return !WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_IGNORED);

Wouldn't this return false if both ignored and non-ignored bits were
different, or is that not possible (judging by the WARN_ON)? If it is,
then it would need an update, wouldn't it?

Thanks,
/fuad


> +}
> +
>  static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> kvm_pte_t *ptep, struct hyp_map_data 
> *data)
>  {
> @@ -371,9 +382,12 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, 
> u32 level,
> if (!kvm_block_mapping_supported(addr, end, phys, level))
> return false;
>
> -   /* Tolerate KVM recreating the exact same mapping */
> +   /*
> +* Tolerate KVM recreating the exact same mapping, or changing ignored
> +* bits.
> +*/
> new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> -   if (old != new && !WARN_ON(kvm_pte_valid(old)))
> +   if (hyp_pte_needs_update(old, new))
> smp_store_release(ptep, new);
>
> data->phys += granule;
> --
> 2.32.0.402.g57bb445576-goog
>


Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate

2021-07-20 Thread Quentin Perret
On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, 
> u64 addr, u64 size,
> >   .arg= &map_data,
>   };
>  
> - if (owner_id > KVM_MAX_OWNER_ID)
> + if (!annotation || (annotation & PTE_VALID))
>   return -EINVAL;

Why do you consider annotation==0 invalid? The assumption so far has
been that the owner_id for the host is 0, so annotating a range with 0s
should be a valid operation -- this will be required when e.g.
transferring ownership of a page back to the host.

>  
> >   ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> -- 
> 2.30.2
> 


Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure

2021-07-20 Thread Marc Zyngier
On Tue, 20 Jul 2021 12:13:20 +0100,
Quentin Perret  wrote:
> 
> On Thursday 15 Jul 2021 at 17:31:47 (+0100), Marc Zyngier wrote:
> > +struct s2_walk_data {
> > +   kvm_pte_t   pteval;
> > +   u32 level;
> > +};
> > +
> > +static int s2_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > +enum kvm_pgtable_walk_flags flag, void * const arg)
> > +{
> > +   struct s2_walk_data *data = arg;
> > +
> > +   data->level = level;
> > +   data->pteval = *ptep;
> > +   return 0;
> > +}
> > +
> > +/* Assumes mmu_lock taken */
> > +static bool __check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> > +{
> > +   struct s2_walk_data data;
> > +   struct kvm_pgtable_walker walker = {
> > +   .cb = s2_walker,
> > +   .flags  = KVM_PGTABLE_WALK_LEAF,
> > > +   .arg= &data,
> > +   };
> > +
> > +   kvm_pgtable_walk(vcpu->arch.hw_mmu->pgt, ALIGN_DOWN(ipa, PAGE_SIZE),
> > > +PAGE_SIZE, &walker);
> > +
> > +   /* Must be a PAGE_SIZE mapping with our annotation */
> > +   return (BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(data.level)) == PAGE_SIZE &&
> > +   data.pteval == MMIO_NOTE);
> 
> Nit: you could do this check in the walker directly and check the return
> value of kvm_pgtable_walk() instead. That would allow us to get rid of
> struct s2_walk_data.
> 
> Also, though the compiler might be able to optimize, maybe simplify the
> level check to level == (KVM_PGTABLE_MAX_LEVELS - 1)?

Yup, all good points. I guess I could do the same in my other series
that parses the userspace PT to extract the level.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate

2021-07-20 Thread Marc Zyngier
On Tue, 20 Jul 2021 12:36:21 +0100,
Quentin Perret  wrote:
> 
> On Tuesday 20 Jul 2021 at 12:20:58 (+0100), Marc Zyngier wrote:
> > On Tue, 20 Jul 2021 11:38:17 +0100,
> > Quentin Perret  wrote:
> > > 
> > > On Tuesday 20 Jul 2021 at 11:21:17 (+0100), Marc Zyngier wrote:
> > > > On Tue, 20 Jul 2021 11:09:21 +0100,
> > > > Quentin Perret  wrote:
> > > > > 
> > > > > On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> > > > > > @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct 
> > > > > > kvm_pgtable *pgt, u64 addr, u64 size,
> > > > > > .arg= &map_data,
> > > > > > };
> > > > > >  
> > > > > > -   if (owner_id > KVM_MAX_OWNER_ID)
> > > > > > +   if (!annotation || (annotation & PTE_VALID))
> > > > > > return -EINVAL;
> > > > > 
> > > > > Why do you consider annotation==0 invalid? The assumption so far has
> > > > > been that the owner_id for the host is 0, so annotating a range with 
> > > > > 0s
> > > > > should be a valid operation -- this will be required when e.g.
> > > > > transferring ownership of a page back to the host.
> > > > 
> > > > How do you then distinguish it from an empty entry that doesn't map to
> > > > anything at all?
> > > 
> > > You don't, but that's the beauty of it :)
> > > 
> > > The host starts with a PGD full of zeroes, which in terms of ownership
> > > means that it owns the entire (I)PA space. And it loses ownership of a
> > > page only when we explicitly annotate it with an owner id != 0.
> > 
> > Right. But this scheme doesn't apply to the guests, does it?
> 
> Right, the meaning of a NULL PTE in guests will clearly be something
> different, but I guess the interpretation of what invalid mappings mean
> is up to the caller.
> 
> > Don't we
> > need something that is non-null to preserve the table refcounting?
> 
> Sure, but do we care? If the table entry gets zeroed we're then
> basically using an 'invalid block' mapping to annotate the entire block
> range with '0', whatever that means. For guests it won't mean much, but
> for the host that would mean sole ownership of the entire range.

I see. You let the refcount drop to 0, unmap the table, and let the 0
annotation transfer one level up, covering the whole block.

I guess I'll revert to allowing 0, but I'd like to make sure we
don't do that for guests unless we actually tear down the address
space (checking for KVM_PGTABLE_S2_IDMAP should work).
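
Something like the below is what I have in mind (untested, and
assuming the stage-2 flags are reachable from the annotate path):

	if (annotation & PTE_VALID)
		return -EINVAL;

	/* Only the host idmap can go back to an all-zeroes annotation */
	if (!annotation && !(pgt->flags & KVM_PGTABLE_S2_IDMAP))
		return -EINVAL;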

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate

2021-07-20 Thread Marc Zyngier
On Tue, 20 Jul 2021 11:38:17 +0100,
Quentin Perret  wrote:
> 
> On Tuesday 20 Jul 2021 at 11:21:17 (+0100), Marc Zyngier wrote:
> > On Tue, 20 Jul 2021 11:09:21 +0100,
> > Quentin Perret  wrote:
> > > 
> > > On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> > > > @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > > > .arg = &map_data,
> > > > };
> > > >  
> > > > -   if (owner_id > KVM_MAX_OWNER_ID)
> > > > +   if (!annotation || (annotation & PTE_VALID))
> > > > return -EINVAL;
> > > 
> > > Why do you consider annotation==0 invalid? The assumption so far has
> > > been that the owner_id for the host is 0, so annotating a range with 0s
> > > should be a valid operation -- this will be required when e.g.
> > > transferring ownership of a page back to the host.
> > 
> > How do you then distinguish it from an empty entry that doesn't map to
> > anything at all?
> 
> You don't, but that's the beauty of it :)
> 
> The host starts with a PGD full of zeroes, which in terms of ownership
> means that it owns the entire (I)PA space. And it loses ownership of a
> page only when we explicitly annotate it with an owner id != 0.

Right. But this scheme doesn't apply to the guests, does it? Don't we
need something that is non-null to preserve the table refcounting?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate

2021-07-20 Thread Marc Zyngier
On Tue, 20 Jul 2021 11:09:21 +0100,
Quentin Perret  wrote:
> 
> On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> > @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > .arg = &map_data,
> > };
> >  
> > -   if (owner_id > KVM_MAX_OWNER_ID)
> > +   if (!annotation || (annotation & PTE_VALID))
> > return -EINVAL;
> 
> Why do you consider annotation==0 invalid? The assumption so far has
> been that the owner_id for the host is 0, so annotating a range with 0s
> should be a valid operation -- this will be required when e.g.
> transferring ownership of a page back to the host.

How do you then distinguish it from an empty entry that doesn't map to
anything at all?

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 08/14] KVM: arm64: Add support for tagging shared pages in page-table

2021-07-20 Thread Marc Zyngier
On Mon, 19 Jul 2021 16:49:13 +0100,
Quentin Perret  wrote:
> 
> On Monday 19 Jul 2021 at 15:43:34 (+0100), Marc Zyngier wrote:
> > On Mon, 19 Jul 2021 11:47:29 +0100,
> > Quentin Perret  wrote:
> > > 
> > > The hypervisor will soon be in charge of tracking ownership of all
> > > memory pages in the system. The current page-tracking infrastructure at
> > > EL2 only allows binary states: a page is either owned or not by an
> > > entity. But a number of use-cases will require more complex states for
> > > pages that are shared between two entities (host, hypervisor, or guests).
> > > 
> > > In preparation for supporting these use-cases, introduce in the KVM
> > > page-table library some infrastructure for tagging shared pages
> > > using ignored bits (a.k.a. software bits) in PTEs.
> > > 
> > > Signed-off-by: Quentin Perret 
> > > ---
> > >  arch/arm64/include/asm/kvm_pgtable.h |  5 +
> > >  arch/arm64/kvm/hyp/pgtable.c | 25 +
> > >  2 files changed, 30 insertions(+)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > > index dd72653314c7..f6d3d5c8910d 100644
> > > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > > @@ -81,6 +81,8 @@ enum kvm_pgtable_stage2_flags {
> > >   * @KVM_PGTABLE_PROT_W:  Write permission.
> > >   * @KVM_PGTABLE_PROT_R:  Read permission.
> > >   * @KVM_PGTABLE_PROT_DEVICE: Device attributes.
> > > + * @KVM_PGTABLE_STATE_SHARED:   Page shared with another entity.
> > > + * @KVM_PGTABLE_STATE_BORROWED: Page borrowed from another entity.
> > >   */
> > >  enum kvm_pgtable_prot {
> > >   KVM_PGTABLE_PROT_X  = BIT(0),
> > > @@ -88,6 +90,9 @@ enum kvm_pgtable_prot {
> > >   KVM_PGTABLE_PROT_R  = BIT(2),
> > >  
> > >   KVM_PGTABLE_PROT_DEVICE = BIT(3),
> > > +
> > > + KVM_PGTABLE_STATE_SHARED= BIT(4),
> > > + KVM_PGTABLE_STATE_BORROWED  = BIT(5),
> > 
> > I'd rather have some indirection here, as we have other potential
> > users for the SW bits outside of pKVM (see the NV series, which uses
> > some of these SW bits as the backend for TTL-based TLB invalidation).
> > 
> > Can we instead only describe the SW bit states in this enum, and let
> > the users map the semantics they require onto that state? See [1] for
> > what I carry in the NV branch.
> 
> Works for me -- I just wanted to make sure we don't have users in
> different places that use the same bits without knowing, but no strong
> opinions, so happy to change.
> 
> > >  };
> > >  
> > >  #define KVM_PGTABLE_PROT_RW  (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index 5bdbe7a31551..51598b79dafc 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -211,6 +211,29 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
> > >   return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
> > >  }
> > >  
> > > +static kvm_pte_t pte_ignored_bit_prot(enum kvm_pgtable_prot prot)
> > 
> > Can we call these sw rather than ignored?
> 
> Sure.
> 
> > > +{
> > > + kvm_pte_t ignored_bits = 0;
> > > +
> > > + /*
> > > +  * Ignored bits 0 and 1 are reserved to track the memory ownership
> > > +  * state of each page:
> > > +  *   00: The page is owned solely by the page-table owner.
> > > +  *   01: The page is owned by the page-table owner, but is shared
> > > +  *   with another entity.
> > > +  *   10: The page is shared with, but not owned by the page-table owner.
> > > +  *   11: Reserved for future use (lending).
> > > +  */
> > > + if (prot & KVM_PGTABLE_STATE_SHARED) {
> > > + if (prot & KVM_PGTABLE_STATE_BORROWED)
> > > + ignored_bits |= BIT(1);
> > > + else
> > > + ignored_bits |= BIT(0);
> > > + }
> > > +
> > > + return FIELD_PREP(KVM_PTE_LEAF_ATTR_IGNORED, ignored_bits);
> > > +}
> > > +
> > >  static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> > > u32 level, kvm_pte_t *ptep,
> > > enum kvm_pgtable_walk_flags flag)
> > > @@ -357,6 +380,7 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> > >   attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
> > >   attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
> > >   attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
> > > + attr |= pte_ignored_bit_prot(prot);
> > >   *ptep = attr;
> > >  
> > >   return 0;
> > > @@ -558,6 +582,7 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
> > >  
> > >   attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
> > >   attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
> > > + attr |= pte_ignored_bit_prot(prot);
> > >   *ptep = attr;
> > >  
> > >   return 0;
> > 
> > How about 

Re: [PATCH 05/14] KVM: arm64: Don't overwrite ignored bits with owner id

2021-07-20 Thread Marc Zyngier
On Mon, 19 Jul 2021 14:39:05 +0100,
Quentin Perret  wrote:
> 
> On Monday 19 Jul 2021 at 13:55:29 (+0100), Marc Zyngier wrote:
> > On Mon, 19 Jul 2021 11:47:26 +0100,
> > Quentin Perret  wrote:
> > > 
> > > The nVHE protected mode uses invalid mappings in the host stage-2
> > > page-table to track the owner of each page in the system. In order to
> > > allow the usage of ignored bits (a.k.a. software bits) in these
> > > mappings, move the owner encoding away from the top bits.
> > 
> > But that's exactly what the current situation is allowing: the use of
> > the SW bits. I am guessing that what you really mean is that you want
> > to *preserve* the SW bits from an existing mapping and use other bits
> > when the mapping is invalid?
> 
> Yes, this is really just forward looking, but I think it might be useful
> to allow annotating invalid mappings with both an owner id _and_
> additional flags for e.g. shared pages and such. And using bits [58:55]
> to store those flags just like we do for valid mappings should make
> things easier, but no big deal.

Right, so maybe worth calling that out.

> I see how this is going to conflict with kvm_pgtable_stage2_annotate()
> from your series though, so maybe I should just drop this patch and
> leave the encoding 'issue' to the caller -- the rest of the series
> doesn't depend on this anyway, this was just a small cleanup.

I'm not too worried about that for now. We can always rewrite one in
terms of the other, but I wanted to understand exactly what this
change was about.
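
For the record, what I'd expect is for the SW bits to keep the same
position for both valid and invalid PTEs, with the owner id moved out
of their way. Something like this (sketch only, the exact owner mask
is up for grabs):

	/* SW bits, identical for valid and invalid PTEs */
	#define KVM_PTE_LEAF_ATTR_IGNORED	GENMASK(58, 55)

	/* Hypothetical: owner id in the low bits of invalid PTEs */
	#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)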

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 03/14] KVM: arm64: Continue stage-2 map when re-creating mappings

2021-07-20 Thread Marc Zyngier
On Mon, 19 Jul 2021 14:32:10 +0100,
Quentin Perret  wrote:
> 
> On Monday 19 Jul 2021 at 13:14:48 (+0100), Marc Zyngier wrote:
> > On Mon, 19 Jul 2021 11:47:24 +0100,
> > Quentin Perret  wrote:
> > > 
> > > The stage-2 map walkers currently return -EAGAIN when re-creating
> > > identical mappings or only changing access permissions. This allows us to
> > > optimize mapping pages for concurrent (v)CPUs faulting on the same
> > > page.
> > > 
> > > While this works as expected when touching one page-table leaf at a
> > > time, this can lead to difficult situations when mapping larger ranges.
> > > Indeed, a large map operation can fail in the middle if an existing
> > > mapping is found in the range, even if it has compatible attributes,
> > > hence leaving only half of the range mapped.
> > 
> > I'm curious of when this can happen. We normally map a single leaf at
> > a time, and we don't have a way to map multiple leaves at once: we
> > either use the VMA base size or try to upgrade it to a THP, but the
> > result is always a single leaf entry. What changed?
> 
> Nothing _yet_ :-)
> 
> The 'share' hypercall introduced near the end of the series allows us to
> share multiple physically contiguous pages in one go -- this is mostly
> to allow sharing data-structures that are larger than a page.
> 
> So if one of the pages happens to be already mapped by the time the
> hypercall is issued, mapping the range with the right SW bits becomes
> difficult as kvm_pgtable_stage2_map() will fail halfway through, which
> is tricky to handle.
> 
> This patch shouldn't change anything for existing users that only map
> things that are nicely aligned at block/page granularity, but should
> make the life of new users easier, so that seemed like a win.

Right, but this is on a different path, right? Guests can never fault
multiple mappings at once, and it takes you a host hypercall to
perform this "multiple leaves at once".

Is there any way we can restrict this to the hypercall? Or even
better, keep the hypercall as a "one page at a time" thing? I can't
imagine it being performance critical (it is a once-off, and only used
over a rather small region of memory). Then, the caller doesn't have
to worry about the page already being mapped or not. This would also
match the behaviour of what I do on the MMIO side.

Or do you anticipate much gain from this being able to use block
mappings?
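
To illustrate, with a single-page hypercall (assuming something like
__pkvm_host_share_hyp(pfn) -- name TBD), the caller would simply
iterate (untested):

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp, pfn);
		if (ret)
			return ret;
	}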

> 
> > > To avoid having to deal with such failures in the caller, don't
> > > interrupt the map operation when hitting existing PTEs, but make sure to
> > > still return -EAGAIN so that user_mem_abort() can mark the page dirty
> > > when needed.
> > 
> > I don't follow you here: if you return -EAGAIN for a writable mapping,
> > we don't account for the page to be dirty on the assumption that
> > nothing has been mapped. But if there is a way to map more than a
> > single entry and to get -EAGAIN at the same time, then we're bound to
> > lose data on page eviction.
> > 
> > Can you shed some light on this?
> 
> Sure. For guests, hitting the -EAGAIN case means we've lost the race
> with another vCPU that faulted the same page. In this case the other
> vCPU either mapped the page RO, which means that our vCPU will then get
> a permission fault next time we run it which will lead to the page being
> marked dirty, or the other vCPU mapped the page RW in which case it
> already marked the page dirty for us and we can safely re-enter the
> guest without doing anything else.
> 
> So what I meant by "still return -EAGAIN so that user_mem_abort() can
> mark the page dirty when needed" is "make sure to mark the page dirty
> only when necessary: if winning the race and marking the page RW, or
> in the permission fault path". That is, by keeping the -EAGAIN I want to
> make sure we don't mark the page dirty twice. (This might be fine, but this
> would be new behaviour, and it was not clear that would scale well to
> many vCPUs faulting the same page).
> 
> I see how this wording can be highly confusing though, I'll re-word
> for the next version.

I indeed found it pretty confusing. A reword would be much appreciated.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 02/16] KVM: arm64: Don't issue CMOs when the physical address is invalid

2021-07-20 Thread Marc Zyngier
On Mon, 19 Jul 2021 18:18:02 +0100,
Quentin Perret  wrote:
> 
> On Thursday 15 Jul 2021 at 17:31:45 (+0100), Marc Zyngier wrote:
> > Make sure we don't issue CMOs when mapping something that
> > is not a memory address in the S2 page tables.
> > 
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 16 ++--
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 05321f4165e3..a5874ebd0354 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -619,12 +619,16 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > }
> >  
> > /* Perform CMOs before installation of the guest stage-2 PTE */
> > -   if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
> > -   mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
> > -   granule);
> > -
> > -   if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> > -   mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
> > +   if (kvm_phys_is_valid(phys)) {
> > +   if (mm_ops->dcache_clean_inval_poc &&
> > +   stage2_pte_cacheable(pgt, new))
> > +   mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new,
> > + mm_ops),
> > +  granule);
> > +   if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> > +   mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops),
> > +granule);
> > +   }
> 
> Hrmpf so this makes me realize we have a problem here, not really caused
> by your patch though.
> 
> Specifically, calling kvm_pgtable_stage2_set_owner() can lead to
> overriding valid mappings with invalid mappings, which is effectively an
> unmap operation. In this case we should issue CMOs when unmapping a
> cacheable page to ensure it is clean to the PoC, like
> kvm_pgtable_stage2_unmap() does.

You only need that if you have a non-cacheable mapping to the same
page. Otherwise, you will be fine.

For pages that are moved from host-EL1 to EL2, we're good (I don't
think we ever have a non-cachable mapping at EL2). However, once we
start moving pages to guests, we'll need something.
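
If we have to, something along these lines in the map walker is what
I'd expect (completely untested, and assuming the old/new PTE values
are at hand as in stage2_map_walker_try_leaf()):

	/*
	 * Hypothetical: when replacing a valid cacheable mapping with
	 * an invalid annotation (an implicit unmap), clean+invalidate
	 * to the PoC first, as kvm_pgtable_stage2_unmap() does.
	 */
	if (!kvm_pte_valid(new) && kvm_pte_valid(old) &&
	    stage2_pte_cacheable(pgt, old) && mm_ops->dcache_clean_inval_poc)
		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(old, mm_ops),
					       granule);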

> Note that your patch is already an improvement over the current state of
> things, because calling stage2_pte_cacheable(pgt, new),
> kvm_pte_follow(new, mm_ops) and friends is bogus when 'new' is invalid

Yeah, I had it mentally earmarked as a potential stable candidate. We
may have to do a bit better in the light of the above though.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.