Re: [PATCH v3 00/66] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support

2021-02-23 Thread Haibo Xu
On Thu, 18 Feb 2021 at 06:10, Marc Zyngier  wrote:
>
> On Thu, 04 Feb 2021 07:51:37 +0000,
> Haibo Xu  wrote:
> >
> > Kindly ping!
> >
> > On Thu, 21 Jan 2021 at 11:03, Haibo Xu  wrote:
> > >
> > > Re-send in case the previous email was blocked for the inlined hyper-link.
> > >
> > > Hi Marc,
> > >
> > > I have tried to enable the NV support in Qemu, and now I can
> > > successfully boot a L2 guest
> > > in Qemu KVM mode.
> > >
> > > This patch series looks good from the Qemu side except for two minor
> > > requirements:
> > > (1) Qemu will check whether a feature was supported by the KVM cap
> > > when the user tries to enable it in the command line, so a new
> > > capability was prefered for the NV(KVM_CAP_ARM_NV?).
>
> I have added KVM_CAP_ARM_EL2 (rather than NV) to that effect.
>
> > > (2) According to the Documentation/virt/kvm/api.rst, userspace can
> > > call KVM_ARM_VCPU_INIT multiple times for a given vcpu, but the
> > > kvm_vcpu_init_nested() do have some issue when called multiple
> > > times(please refer to the detailed comments in patch 63)
>
> This is now fixed, I believe.
>
> I have pushed out a branch [1] that addresses all the reported
> issues, though it currently lack some testing. Please let me know if
> it works for you.
>

Hi Marc,

I have verified the fix, and it works well with Qemu.

thanks,
Haibo

> Thanks,
>
> M.
>
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-5.12-WIP
>
> --
> Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 00/66] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support

2021-02-03 Thread Haibo Xu
Kindly ping!

On Thu, 21 Jan 2021 at 11:03, Haibo Xu  wrote:
>
> Re-send in case the previous email was blocked for the inlined hyper-link.
>
> Hi Marc,
>
> I have tried to enable the NV support in Qemu, and now I can
> successfully boot a L2 guest
> in Qemu KVM mode.
>
> This patch series looks good from the Qemu side except for two minor
> requirements:
> (1) Qemu will check whether a feature was supported by the KVM cap
> when the user tries
>  to enable it in the command line, so a new capability was
> prefered for the NV(KVM_CAP_ARM_NV?).
> (2) According to the Documentation/virt/kvm/api.rst, userspace can
> call KVM_ARM_VCPU_INIT
>  multiple times for a given vcpu, but the kvm_vcpu_init_nested()
> do have some issue when
>  called multiple times(please refer to the detailed comments in patch 63)
>
> Regards,
> Haibo
>
> On Fri, 11 Dec 2020 at 00:00, Marc Zyngier  wrote:
> >
> > This is a rework of the NV series that I posted 10 months ago[1], as a
> > lot of the KVM code has changed since, and the series apply anymore
> > (not that anybody really cares as the the HW is, as usual, made of
> > unobtainium...).
> >
> > From the previous version:
> >
> > - Integration with the new page-table code
> > - New exception injection code
> > - No more messing with the nVHE code
> > - No AArch32
> > - Rebased on v5.10-rc4 + kvmarm/next for 5.11
> >
> > From a functionality perspective, you can expect a L2 guest to work,
> > but don't even think of L3, as we only partially emulate the
> > ARMv8.{3,4}-NV extensions themselves. Same thing for vgic, debug, PMU,
> > as well as anything that would require a Stage-1 PTW. What we want to
> > achieve is that with NV disabled, there is no performance overhead and
> > no regression.
> >
> > The series is roughly divided in 5 parts: exception handling, memory
> > virtualization, interrupts and timers for ARMv8.3, followed by the
> > ARMv8.4 support. There are of course some dependencies, but you'll
> > hopefully get the gist of it.
> >
> > For the most courageous of you, I've put out a branch[2]. Of course,
> > you'll need some userspace. Andre maintains a hacked version of
> > kvmtool[3] that takes a --nested option, allowing the guest to be
> > started at EL2. You can run the whole stack in the Foundation
> > model. Don't be in a hurry ;-).
> >
> > And to be clear: although Jintack and Christoffer have written tons of
> > the stuff originaly, I'm the one responsible for breaking it!
> >
> > [1] https://lore.kernel.org/r/20200211174938.27809-1-...@kernel.org
> > [2] git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git 
> > kvm-arm64/nv-5.11.-WIP
> > [3] git://linux-arm.org/kvmtool.git nv/nv-wip-5.2-rc5
> >
> > Andre Przywara (1):
> >   KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
> >
> > Christoffer Dall (15):
> >   KVM: arm64: nv: Introduce nested virtualization VCPU feature
> >   KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
> >   KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
> >   KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
> >   KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values
> >   KVM: arm64: nv: Handle trapped ERET from virtual EL2
> >   KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
> >   KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
> >   KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2
> > changes
> >   KVM: arm64: nv: Implement nested Stage-2 page table walk logic
> >   KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
> >   KVM: arm64: nv: arch_timer: Support hyp timer emulation
> >   KVM: arm64: nv: vgic: Emulate the HW bit in software
> >   KVM: arm64: nv: Add nested GICv3 tracepoints
> >   KVM: arm64: nv: Sync nested timer state with ARMv8.4
> >
> > Jintack Lim (19):
> >   arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
> >   KVM: arm64: nv: Handle HCR_EL2.NV system register traps
> >   KVM: arm64: nv: Support virtual EL2 exceptions
> >   KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
> >   KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
> >   KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
> >   KVM: arm64: nv: Handle PSCI call via smc from the guest
> >   KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
> >   KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
> >   KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
> >   KVM: arm64

Re: [PATCH v3 00/66] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support

2021-01-20 Thread Haibo Xu
Re-send in case the previous email was blocked for the inlined hyper-link.

Hi Marc,

I have tried to enable the NV support in Qemu, and now I can
successfully boot a L2 guest
in Qemu KVM mode.

This patch series looks good from the Qemu side except for two minor
requirements:
(1) Qemu will check whether a feature was supported by the KVM cap
when the user tries
 to enable it in the command line, so a new capability was
prefered for the NV(KVM_CAP_ARM_NV?).
(2) According to the Documentation/virt/kvm/api.rst, userspace can
call KVM_ARM_VCPU_INIT
 multiple times for a given vcpu, but the kvm_vcpu_init_nested()
do have some issue when
 called multiple times(please refer to the detailed comments in patch 63)

Regards,
Haibo

On Fri, 11 Dec 2020 at 00:00, Marc Zyngier  wrote:
>
> This is a rework of the NV series that I posted 10 months ago[1], as a
> lot of the KVM code has changed since, and the series apply anymore
> (not that anybody really cares as the the HW is, as usual, made of
> unobtainium...).
>
> From the previous version:
>
> - Integration with the new page-table code
> - New exception injection code
> - No more messing with the nVHE code
> - No AArch32
> - Rebased on v5.10-rc4 + kvmarm/next for 5.11
>
> From a functionality perspective, you can expect a L2 guest to work,
> but don't even think of L3, as we only partially emulate the
> ARMv8.{3,4}-NV extensions themselves. Same thing for vgic, debug, PMU,
> as well as anything that would require a Stage-1 PTW. What we want to
> achieve is that with NV disabled, there is no performance overhead and
> no regression.
>
> The series is roughly divided in 5 parts: exception handling, memory
> virtualization, interrupts and timers for ARMv8.3, followed by the
> ARMv8.4 support. There are of course some dependencies, but you'll
> hopefully get the gist of it.
>
> For the most courageous of you, I've put out a branch[2]. Of course,
> you'll need some userspace. Andre maintains a hacked version of
> kvmtool[3] that takes a --nested option, allowing the guest to be
> started at EL2. You can run the whole stack in the Foundation
> model. Don't be in a hurry ;-).
>
> And to be clear: although Jintack and Christoffer have written tons of
> the stuff originaly, I'm the one responsible for breaking it!
>
> [1] https://lore.kernel.org/r/20200211174938.27809-1-...@kernel.org
> [2] git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git 
> kvm-arm64/nv-5.11.-WIP
> [3] git://linux-arm.org/kvmtool.git nv/nv-wip-5.2-rc5
>
> Andre Przywara (1):
>   KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
>
> Christoffer Dall (15):
>   KVM: arm64: nv: Introduce nested virtualization VCPU feature
>   KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
>   KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
>   KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
>   KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values
>   KVM: arm64: nv: Handle trapped ERET from virtual EL2
>   KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
>   KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
>   KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2
> changes
>   KVM: arm64: nv: Implement nested Stage-2 page table walk logic
>   KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
>   KVM: arm64: nv: arch_timer: Support hyp timer emulation
>   KVM: arm64: nv: vgic: Emulate the HW bit in software
>   KVM: arm64: nv: Add nested GICv3 tracepoints
>   KVM: arm64: nv: Sync nested timer state with ARMv8.4
>
> Jintack Lim (19):
>   arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
>   KVM: arm64: nv: Handle HCR_EL2.NV system register traps
>   KVM: arm64: nv: Support virtual EL2 exceptions
>   KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
>   KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
>   KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
>   KVM: arm64: nv: Handle PSCI call via smc from the guest
>   KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
>   KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
>   KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
>   KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
>   KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
>   KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
>   KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
>   KVM: arm64: nv: Introduce sys_reg_desc.forward_trap
>   KVM: arm64: nv: Set a handler for the system instruction traps
>   KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
>   KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
>   KVM: arm64: nv: Nested GICv3 Support
>
> Marc Zyngier (31):
>   KVM: arm64: nv: Add EL2 system registers to vcpu context
>   KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
>   KVM: arm64: nv: Handle virtual EL2 registers in
> 

Re: [PATCH v3 33/66] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures

2021-01-20 Thread Haibo Xu
On Fri, 11 Dec 2020 at 00:04, Marc Zyngier  wrote:
>
> Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
> We don't yet populate shadow Stage-2 page tables, but we now have a
> framework for getting to a shadow Stage-2 pgd.
>
> We allocate twice the number of vcpus as Stage-2 mmu structures because
> that's sufficient for each vcpu running two translation regimes without
> having to flush the Stage-2 page tables.
>
> Co-developed-by: Christoffer Dall 
> Signed-off-by: Christoffer Dall 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/kvm_host.h   |  29 +
>  arch/arm64/include/asm/kvm_mmu.h|   8 ++
>  arch/arm64/include/asm/kvm_nested.h |   7 ++
>  arch/arm64/kvm/arm.c|  16 ++-
>  arch/arm64/kvm/mmu.c|  18 ++-
>  arch/arm64/kvm/nested.c | 183 
>  6 files changed, 250 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index d731cf7a56cb..d99e51e7cbee 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -95,14 +95,43 @@ struct kvm_s2_mmu {
> int __percpu *last_vcpu_ran;
>
> struct kvm *kvm;
> +
> +   /*
> +* For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
> +* hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
> +* contains no valid information.
> +*/
> +   u64 vttbr;
> +
> +   /* true when this represents a nested context where virtual 
> HCR_EL2.VM == 1 */
> +   boolnested_stage2_enabled;
> +
> +   /*
> +*  0: Nobody is currently using this, check vttbr for validity
> +* >0: Somebody is actively using this.
> +*/
> +   atomic_t refcnt;
>  };
>
> +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> +{
> +   return !(mmu->vttbr & 1);
> +}
> +
>  struct kvm_arch_memory_slot {
>  };
>
>  struct kvm_arch {
> struct kvm_s2_mmu mmu;
>
> +   /*
> +* Stage 2 paging stage for VMs with nested virtual using a virtual
> +* VMID.
> +*/
> +   struct kvm_s2_mmu *nested_mmus;
> +   size_t nested_mmus_size;
> +   int nested_mmus_next;
> +
> /* VTCR_EL2 value for this VM */
> u64vtcr;
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> b/arch/arm64/include/asm/kvm_mmu.h
> index 76a8a0ca45b8..ec39015bb2a6 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -126,6 +126,7 @@ alternative_cb_end
>  #include 
>  #include 
>  #include 
> +#include 
>
>  void kvm_update_va_mask(struct alt_instr *alt,
> __le32 *origptr, __le32 *updptr, int nr_inst);
> @@ -184,6 +185,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, 
> size_t size,
>  void **haddr);
>  void free_hyp_pgds(void);
>
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 
> size);
>  void stage2_unmap_vm(struct kvm *kvm);
>  int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> @@ -306,5 +308,11 @@ static __always_inline void __load_guest_stage2(struct 
> kvm_s2_mmu *mmu)
> asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
>  }
>
> +static inline u64 get_vmid(u64 vttbr)
> +{
> +   return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
> +   VTTBR_VMID_SHIFT;
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h 
> b/arch/arm64/include/asm/kvm_nested.h
> index 026ddaad972c..473ecd1d60d0 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -61,6 +61,13 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 
> cnthctl)
> (cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | 
> CNTHCTL_EVNTEN)));
>  }
>
> +extern void kvm_init_nested(struct kvm *kvm);
> +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
> +extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
> +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
> +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
> +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
> +
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
> u64 control_bit);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 6e637d2b4cfb..1656dd80bbc4 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>
> @@ -142,6 +143,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> if (ret)
> return ret;
>
> +   kvm_init_nested(kvm);
> +
>  

Re: [PATCH v3 63/66] KVM: arm64: nv: Allocate VNCR page when required

2021-01-20 Thread Haibo Xu
On Fri, 11 Dec 2020 at 00:04, Marc Zyngier  wrote:
>
> If running a NV guest on an ARMv8.4-NV capable system, let's
> allocate an additional page that will be used by the hypervisor
> to fulfill system register accesses.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/kvm_host.h | 3 ++-
>  arch/arm64/kvm/nested.c   | 8 
>  arch/arm64/kvm/reset.c| 1 +
>  3 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 78630bd5124d..dada0678c28e 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -523,7 +523,8 @@ struct kvm_vcpu_arch {
>   */
>  static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
>  {
> -   if (unlikely(r >= __VNCR_START__ && ctxt->vncr_array))
> +   if (unlikely(cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT) &&
> +r >= __VNCR_START__ && ctxt->vncr_array))
> return >vncr_array[r - __VNCR_START__];
>
> return (u64 *)>sys_regs[r];
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index eef8f9873814..88147ec99755 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -47,6 +47,12 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
> return -EINVAL;
>
> +   if (cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT)) {
> +   vcpu->arch.ctxt.vncr_array = (u64 
> *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> +   if (!vcpu->arch.ctxt.vncr_array)
> +   return -ENOMEM;
> +   }
> +

If KVM_ARM_VCPU_INIT was called multiple times, the above codes would
try to allocate a new page
without free-ing the previous one. Besides that, the following
kvm_free_stage2_pgd() call would fail in the
second call with the error message "kvm_arch already initialized?".
I think a possible fix is to add a new flag to indicate whether the NV
related meta data have been initialized,
and only initialize them for the first call.

> mutex_lock(>lock);
>
> /*
> @@ -64,6 +70,8 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> kvm_init_stage2_mmu(kvm, [num_mmus - 2])) {
> kvm_free_stage2_pgd([num_mmus - 1]);
> kvm_free_stage2_pgd([num_mmus - 2]);
> +   free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
> +   vcpu->arch.ctxt.vncr_array = NULL;
> } else {
> kvm->arch.nested_mmus_size = num_mmus;
> ret = 0;
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 2d2c780e6c69..d281eb39036f 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -150,6 +150,7 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
>  void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
>  {
> kfree(vcpu->arch.sve_state);
> +   free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
>  }
>
>  static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
> --
> 2.29.2
>
> ___
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 00/66] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support

2021-01-20 Thread Haibo Xu
On Fri, 11 Dec 2020 at 00:00, Marc Zyngier  wrote:
>
> This is a rework of the NV series that I posted 10 months ago[1], as a
> lot of the KVM code has changed since, and the series apply anymore
> (not that anybody really cares as the the HW is, as usual, made of
> unobtainium...).
>
> From the previous version:
>
> - Integration with the new page-table code
> - New exception injection code
> - No more messing with the nVHE code
> - No AArch32
> - Rebased on v5.10-rc4 + kvmarm/next for 5.11
>
> From a functionality perspective, you can expect a L2 guest to work,
> but don't even think of L3, as we only partially emulate the
> ARMv8.{3,4}-NV extensions themselves. Same thing for vgic, debug, PMU,
> as well as anything that would require a Stage-1 PTW. What we want to
> achieve is that with NV disabled, there is no performance overhead and
> no regression.
>
> The series is roughly divided in 5 parts: exception handling, memory
> virtualization, interrupts and timers for ARMv8.3, followed by the
> ARMv8.4 support. There are of course some dependencies, but you'll
> hopefully get the gist of it.
>
> For the most courageous of you, I've put out a branch[2]. Of course,
> you'll need some userspace. Andre maintains a hacked version of
> kvmtool[3] that takes a --nested option, allowing the guest to be
> started at EL2. You can run the whole stack in the Foundation
> model. Don't be in a hurry ;-).
>
> And to be clear: although Jintack and Christoffer have written tons of
> the stuff originaly, I'm the one responsible for breaking it!
>
> [1] https://lore.kernel.org/r/20200211174938.27809-1-...@kernel.org
> [2] git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git
kvm-arm64/nv-5.11.-WIP
> [3] git://linux-arm.org/kvmtool.git nv/nv-wip-5.2-rc5

Hi Marc,

I have tried to enable the NV support in Qemu, and now I can successfully
boot a L2 guest
in Qemu KVM mode.

This patch series looks good from the Qemu side except for two minor
requirements:
(1) Qemu will check whether a feature was supported by the KVM cap when the
user tries
 to enable it in the command line, so a new capability was prefered for
the NV(KVM_CAP_ARM_NV?).
(2) According to the Documentation/virt/kvm/api.rst
,
userspace can call KVM_ARM_VCPU_INIT
 multiple times for a given vcpu, but the kvm_vcpu_init_nested() do
have some issue when
 called multiple times(please refer to the detailed comments in patch
63)

Regards,
Haibo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 00/66] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support

2021-01-12 Thread Haibo Xu
On Mon, 11 Jan 2021 at 16:59, Marc Zyngier  wrote:
>
> Hi Haibo,
>
> On 2021-01-11 07:20, Haibo Xu wrote:
> > On Fri, 11 Dec 2020 at 00:00, Marc Zyngier  wrote:
> >>
> >> This is a rework of the NV series that I posted 10 months ago[1], as a
> >> lot of the KVM code has changed since, and the series apply anymore
> >> (not that anybody really cares as the the HW is, as usual, made of
> >> unobtainium...).
> >>
> >> From the previous version:
> >>
> >> - Integration with the new page-table code
> >> - New exception injection code
> >> - No more messing with the nVHE code
> >> - No AArch32
> >> - Rebased on v5.10-rc4 + kvmarm/next for 5.11
> >>
> >> From a functionality perspective, you can expect a L2 guest to work,
> >> but don't even think of L3, as we only partially emulate the
> >> ARMv8.{3,4}-NV extensions themselves. Same thing for vgic, debug, PMU,
> >> as well as anything that would require a Stage-1 PTW. What we want to
> >> achieve is that with NV disabled, there is no performance overhead and
> >> no regression.
> >>
> >> The series is roughly divided in 5 parts: exception handling, memory
> >> virtualization, interrupts and timers for ARMv8.3, followed by the
> >> ARMv8.4 support. There are of course some dependencies, but you'll
> >> hopefully get the gist of it.
> >>
> >> For the most courageous of you, I've put out a branch[2]. Of course,
> >> you'll need some userspace. Andre maintains a hacked version of
> >> kvmtool[3] that takes a --nested option, allowing the guest to be
> >> started at EL2. You can run the whole stack in the Foundation
> >> model. Don't be in a hurry ;-).
> >>
> >
> > Hi Marc,
> >
> > I got a kernel BUG message when booting the L2 guest kernel with the
> > kvmtool on a FVP setup.
> > Could you help have a look about the BUG message as well as my
> > environment configuration?
> > I think It probably caused by some local configurations of the FVP
> > setup.
>
> No, this is likely a bug in your L1 guest, which was fixed in -rc3:
>
> 2a5f1b67ec57 ("KVM: arm64: Don't access PMCR_EL0 when no PMU is
> available")
>
> and was found in the exact same circumstances. Alternatively, and if
> you don't want to change your L1 guest, you can just pass the --pmu
> option to kvmtool when starting the L1 guest.

After passing --pmu when starting a L1 guest, I can successfully run a
L2 guest now!
Thanks so much for the help!

Haibo

>
> Hope this helps,
>
>  M.
> --
> Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v5 0/2] MTE support for KVM guest

2020-12-16 Thread Haibo Xu
On Wed, 16 Dec 2020 at 18:23, Steven Price  wrote:
>
> On 16/12/2020 07:31, Haibo Xu wrote:
> [...]
> > Hi Steve,
>
> Hi Haibo
>
> > I have finished verifying the POC on a FVP setup, and the MTE test case can
> > be migrated from one VM to another successfully. Since the test case is very
> > simple which just maps one page with MTE enabled and does some memory
> > access, so I can't say it's OK for other cases.
>
> That's great progress.
>
> >
> > BTW, I noticed that you have sent out patch set v6 which mentions that 
> > mapping
> > all the guest memory with PROT_MTE was not feasible. So what's the plan for 
> > the
> > next step? Will new KVM APIs which can facilitate the tag store and recover 
> > be
> > available?
>
> I'm currently rebasing on top of the KASAN MTE patch series. My plan for
> now is to switch back to not requiring the VMM to supply PROT_MTE (so
> KVM 'upgrades' the pages as necessary) and I'll add an RFC patch on the
> end of the series to add an KVM API for doing bulk read/write of tags.
> That way the VMM can map guest memory without PROT_MTE (so device 'DMA'
> accesses will be unchecked), and use the new API for migration.
>

Great! Will have a try with the new API in my POC!

> Thanks,
>
> Steve
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v5 0/2] MTE support for KVM guest

2020-12-15 Thread Haibo Xu
On Mon, 7 Dec 2020 at 22:48, Steven Price  wrote:
>
> On 04/12/2020 08:25, Haibo Xu wrote:
> > On Fri, 20 Nov 2020 at 17:51, Steven Price  wrote:
> >>
> >> On 19/11/2020 19:11, Marc Zyngier wrote:
> >>> On 2020-11-19 18:42, Andrew Jones wrote:
> >>>> On Thu, Nov 19, 2020 at 03:45:40PM +, Peter Maydell wrote:
> >>>>> On Thu, 19 Nov 2020 at 15:39, Steven Price  wrote:
> >>>>>> This series adds support for Arm's Memory Tagging Extension (MTE) to
> >>>>>> KVM, allowing KVM guests to make use of it. This builds on the
> >>>>> existing
> >>>>>> user space support already in v5.10-rc1, see [1] for an overview.
> >>>>>
> >>>>>> The change to require the VMM to map all guest memory PROT_MTE is
> >>>>>> significant as it means that the VMM has to deal with the MTE tags
> >>>>> even
> >>>>>> if it doesn't care about them (e.g. for virtual devices or if the VMM
> >>>>>> doesn't support migration). Also unfortunately because the VMM can
> >>>>>> change the memory layout at any time the check for PROT_MTE/VM_MTE has
> >>>>>> to be done very late (at the point of faulting pages into stage 2).
> >>>>>
> >>>>> I'm a bit dubious about requring the VMM to map the guest memory
> >>>>> PROT_MTE unless somebody's done at least a sketch of the design
> >>>>> for how this would work on the QEMU side. Currently QEMU just
> >>>>> assumes the guest memory is guest memory and it can access it
> >>>>> without special precautions...
> >>>>>
> >>>>
> >>>> There are two statements being made here:
> >>>>
> >>>> 1) Requiring the use of PROT_MTE when mapping guest memory may not fit
> >>>> QEMU well.
> >>>>
> >>>> 2) New KVM features should be accompanied with supporting QEMU code in
> >>>> order to prove that the APIs make sense.
> >>>>
> >>>> I strongly agree with (2). While kvmtool supports some quick testing, it
> >>>> doesn't support migration. We must test all new features with a migration
> >>>> supporting VMM.
> >>>>
> >>>> I'm not sure about (1). I don't feel like it should be a major problem,
> >>>> but (2).
> >>
> >> (1) seems to be contentious whichever way we go. Either PROT_MTE isn't
> >> required in which case it's easy to accidentally screw up migration, or
> >> it is required in which case it's difficult to handle normal guest
> >> memory from the VMM. I get the impression that probably I should go back
> >> to the previous approach - sorry for the distraction with this change.
> >>
> >> (2) isn't something I'm trying to skip, but I'm limited in what I can do
> >> myself so would appreciate help here. Haibo is looking into this.
> >>
> >
> > Hi Steven,
> >
> > Sorry for the later reply!
> >
> > I have finished the POC for the MTE migration support with the assumption
> > that all the memory is mapped with PROT_MTE. But I got stuck in the test
> > with a FVP setup. Previously, I successfully compiled a test case to verify
> > the basic function of MTE in a guest. But these days, the re-compiled test
> > can't be executed by the guest(very weird). The short plan to verify
> > the migration
> > is to set the MTE tags on one page in the guest, and try to dump the 
> > migrated
> > memory contents.
>
> Hi Haibo,
>
> Sounds like you are making good progress - thanks for the update. Have
> you thought about how the PROT_MTE mappings might work if QEMU itself
> were to use MTE? My worry is that we end up with MTE in a guest
> preventing QEMU from using MTE itself (because of the PROT_MTE
> mappings). I'm hoping QEMU can wrap its use of guest memory in a
> sequence which disables tag checking (something similar will be needed
> for the "protected VM" use case anyway), but this isn't something I've
> looked into.
>
> > I will update the status later next week!
>
> Great, I look forward to hearing how it goes.

Hi Steve,

I have finished verifying the POC on a FVP setup, and the MTE test case can
be migrated from one VM to another successfully. Since the test case is very
simple which just maps one page with MTE enabled and does some memory
access, so I can't say it's OK for other cases.

BTW, I noticed that you have sent out patch set v6 which mentions that mapping
all the guest memory with PROT_MTE was not feasible. So what's the plan for the
next step? Will new KVM APIs which can facilitate the tag store and recover be
available?

Regards,
Haibo

>
> Thanks,
>
> Steve
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v5 0/2] MTE support for KVM guest

2020-12-08 Thread Haibo Xu
On Tue, 8 Dec 2020 at 18:01, Marc Zyngier  wrote:
>
> On 2020-12-08 09:51, Haibo Xu wrote:
> > On Mon, 7 Dec 2020 at 22:48, Steven Price  wrote:
> >>
>
> [...]
>
> >> Sounds like you are making good progress - thanks for the update. Have
> >> you thought about how the PROT_MTE mappings might work if QEMU itself
> >> were to use MTE? My worry is that we end up with MTE in a guest
> >> preventing QEMU from using MTE itself (because of the PROT_MTE
> >> mappings). I'm hoping QEMU can wrap its use of guest memory in a
> >> sequence which disables tag checking (something similar will be needed
> >> for the "protected VM" use case anyway), but this isn't something I've
> >> looked into.
> >
> > As far as I can see, to map all the guest memory with PROT_MTE in VMM
> > is a little weird, and lots of APIs have to be changed to include this
> > flag.
> > IMHO, it would be better if the KVM can provide new APIs to load/store
> > the
> > guest memory tag which may make it easier to enable the Qemu migration
> > support.
>
> On what granularity? To what storage? How do you plan to synchronise
> this
> with the dirty-log interface?

The Qemu would migrate page by page, and if one page has been migrated but
becomes dirty again, the migration process would re-send this dirty page.
The current MTE migration POC codes would try to send the page tags just after
the page data, if one page becomes dirty again, the page data and the tags would
be re-sent.

>
> Thanks,
>
>  M.
> --
> Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v5 0/2] MTE support for KVM guest

2020-12-08 Thread Haibo Xu
On Tue, 8 Dec 2020 at 00:44, Dr. David Alan Gilbert  wrote:
>
> * Steven Price (steven.pr...@arm.com) wrote:
> > On 07/12/2020 15:27, Peter Maydell wrote:
> > > On Mon, 7 Dec 2020 at 14:48, Steven Price  wrote:
> > > > Sounds like you are making good progress - thanks for the update. Have
> > > > you thought about how the PROT_MTE mappings might work if QEMU itself
> > > > were to use MTE? My worry is that we end up with MTE in a guest
> > > > preventing QEMU from using MTE itself (because of the PROT_MTE
> > > > mappings). I'm hoping QEMU can wrap its use of guest memory in a
> > > > sequence which disables tag checking (something similar will be needed
> > > > for the "protected VM" use case anyway), but this isn't something I've
> > > > looked into.
> > >
> > > It's not entirely the same as the "protected VM" case. For that
> > > the patches currently on list basically special case "this is a
> > > debug access (eg from gdbstub/monitor)" which then either gets
> > > to go via "decrypt guest RAM for debug" or gets failed depending
> > > on whether the VM has a debug-is-ok flag enabled. For an MTE
> > > guest the common case will be guests doing standard DMA operations
> > > to or from guest memory. The ideal API for that from QEMU's
> > > point of view would be "accesses to guest RAM don't do tag
> > > checks, even if tag checks are enabled for accesses QEMU does to
> > > memory it has allocated itself as a normal userspace program".
> >
> > Sorry, I know I simplified it rather by saying it's similar to protected VM.
> > Basically as I see it there are three types of memory access:
> >
> > 1) Debug case - has to go via a special case for decryption or ignoring the
> > MTE tag value. Hopefully this can be abstracted in the same way.
> >
> > 2) Migration - for a protected VM there's likely to be a special method to
> > allow the VMM access to the encrypted memory (AFAIK memory is usually kept
> > inaccessible to the VMM). For MTE this again has to be special cased as we
> > actually want both the data and the tag values.
> >
> > 3) Device DMA - for a protected VM it's usual to unencrypt a small area of
> > memory (with the permission of the guest) and use that as a bounce buffer.
> > This is possible with MTE: have an area the VMM purposefully maps with
> > PROT_MTE. The issue is that this has a performance overhead and we can do
> > better with MTE because it's trivial for the VMM to disable the protection
> > for any memory.
>
> Those all sound very similar to the AMD SEV world;  there's the special
> case for Debug that Peter mentioned; migration is ...complicated and
> needs special case that's still being figured out, and as I understand
> Device DMA also uses a bounce buffer (and swiotlb in the guest to make
> that happen).
>
>
> I'm not sure about the stories for the IBM hardware equivalents.

Like s390-skeys(storage keys) support in Qemu?

I have read the migration support for the s390-skeys in Qemu and found
that the logic is very similar to that of MTE, except the difference that the
s390-skeys were migrated separately from that of the guest memory data
while for MTE, I think the guest memory tags should go with the  memory data.

>
> Dave
>
> > The part I'm unsure on is how easy it is for QEMU to deal with (3) without
> > the overhead of bounce buffers. Ideally there'd already be a wrapper for
> > guest memory accesses and that could just be wrapped with setting TCO during
> > the access. I suspect the actual situation is more complex though, and I'm
> > hoping Haibo's investigations will help us understand this.
> >
> > Thanks,
> >
> > Steve
> >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v5 0/2] MTE support for KVM guest

2020-12-08 Thread Haibo Xu
On Mon, 7 Dec 2020 at 22:48, Steven Price  wrote:
>
> On 04/12/2020 08:25, Haibo Xu wrote:
> > On Fri, 20 Nov 2020 at 17:51, Steven Price  wrote:
> >>
> >> On 19/11/2020 19:11, Marc Zyngier wrote:
> >>> On 2020-11-19 18:42, Andrew Jones wrote:
> >>>> On Thu, Nov 19, 2020 at 03:45:40PM +, Peter Maydell wrote:
> >>>>> On Thu, 19 Nov 2020 at 15:39, Steven Price  wrote:
> >>>>>> This series adds support for Arm's Memory Tagging Extension (MTE) to
> >>>>>> KVM, allowing KVM guests to make use of it. This builds on the
> >>>>> existing
> >>>>>> user space support already in v5.10-rc1, see [1] for an overview.
> >>>>>
> >>>>>> The change to require the VMM to map all guest memory PROT_MTE is
> >>>>>> significant as it means that the VMM has to deal with the MTE tags
> >>>>> even
> >>>>>> if it doesn't care about them (e.g. for virtual devices or if the VMM
> >>>>>> doesn't support migration). Also unfortunately because the VMM can
> >>>>>> change the memory layout at any time the check for PROT_MTE/VM_MTE has
> >>>>>> to be done very late (at the point of faulting pages into stage 2).
> >>>>>
> >>>>> I'm a bit dubious about requring the VMM to map the guest memory
> >>>>> PROT_MTE unless somebody's done at least a sketch of the design
> >>>>> for how this would work on the QEMU side. Currently QEMU just
> >>>>> assumes the guest memory is guest memory and it can access it
> >>>>> without special precautions...
> >>>>>
> >>>>
> >>>> There are two statements being made here:
> >>>>
> >>>> 1) Requiring the use of PROT_MTE when mapping guest memory may not fit
> >>>> QEMU well.
> >>>>
> >>>> 2) New KVM features should be accompanied with supporting QEMU code in
> >>>> order to prove that the APIs make sense.
> >>>>
> >>>> I strongly agree with (2). While kvmtool supports some quick testing, it
> >>>> doesn't support migration. We must test all new features with a migration
> >>>> supporting VMM.
> >>>>
> >>>> I'm not sure about (1). I don't feel like it should be a major problem,
> >>>> but (2).
> >>
> >> (1) seems to be contentious whichever way we go. Either PROT_MTE isn't
> >> required in which case it's easy to accidentally screw up migration, or
> >> it is required in which case it's difficult to handle normal guest
> >> memory from the VMM. I get the impression that probably I should go back
> >> to the previous approach - sorry for the distraction with this change.
> >>
> >> (2) isn't something I'm trying to skip, but I'm limited in what I can do
> >> myself so would appreciate help here. Haibo is looking into this.
> >>
> >
> > Hi Steven,
> >
> > Sorry for the later reply!
> >
> > I have finished the POC for the MTE migration support with the assumption
> > that all the memory is mapped with PROT_MTE. But I got stuck in the test
> > with a FVP setup. Previously, I successfully compiled a test case to verify
> > the basic function of MTE in a guest. But these days, the re-compiled test
> > can't be executed by the guest(very weird). The short plan to verify
> > the migration
> > is to set the MTE tags on one page in the guest, and try to dump the 
> > migrated
> > memory contents.
>
> Hi Haibo,
>
> Sounds like you are making good progress - thanks for the update. Have
> you thought about how the PROT_MTE mappings might work if QEMU itself
> were to use MTE? My worry is that we end up with MTE in a guest
> preventing QEMU from using MTE itself (because of the PROT_MTE
> mappings). I'm hoping QEMU can wrap its use of guest memory in a
> sequence which disables tag checking (something similar will be needed
> for the "protected VM" use case anyway), but this isn't something I've
> looked into.

As far as I can see, to map all the guest memory with PROT_MTE in VMM
is a little weird, and lots of APIs have to be changed to include this flag.
IMHO, it would be better if the KVM can provide new APIs to load/store the
guest memory tag which may make it easier to enable the Qemu migration
support.

>
> > I will update the status later next week!
>
> Great, I look forward to hearing how it goes.
>
> Thanks,
>
> Steve
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v5 0/2] MTE support for KVM guest

2020-12-04 Thread Haibo Xu
On Fri, 20 Nov 2020 at 17:51, Steven Price  wrote:
>
> On 19/11/2020 19:11, Marc Zyngier wrote:
> > On 2020-11-19 18:42, Andrew Jones wrote:
> >> On Thu, Nov 19, 2020 at 03:45:40PM +, Peter Maydell wrote:
> >>> On Thu, 19 Nov 2020 at 15:39, Steven Price  wrote:
> >>> > This series adds support for Arm's Memory Tagging Extension (MTE) to
> >>> > KVM, allowing KVM guests to make use of it. This builds on the
> >>> existing
> >>> > user space support already in v5.10-rc1, see [1] for an overview.
> >>>
> >>> > The change to require the VMM to map all guest memory PROT_MTE is
> >>> > significant as it means that the VMM has to deal with the MTE tags
> >>> even
> >>> > if it doesn't care about them (e.g. for virtual devices or if the VMM
> >>> > doesn't support migration). Also unfortunately because the VMM can
> >>> > change the memory layout at any time the check for PROT_MTE/VM_MTE has
> >>> > to be done very late (at the point of faulting pages into stage 2).
> >>>
> >>> I'm a bit dubious about requring the VMM to map the guest memory
> >>> PROT_MTE unless somebody's done at least a sketch of the design
> >>> for how this would work on the QEMU side. Currently QEMU just
> >>> assumes the guest memory is guest memory and it can access it
> >>> without special precautions...
> >>>
> >>
> >> There are two statements being made here:
> >>
> >> 1) Requiring the use of PROT_MTE when mapping guest memory may not fit
> >>QEMU well.
> >>
> >> 2) New KVM features should be accompanied with supporting QEMU code in
> >>order to prove that the APIs make sense.
> >>
> >> I strongly agree with (2). While kvmtool supports some quick testing, it
> >> doesn't support migration. We must test all new features with a migration
> >> supporting VMM.
> >>
> >> I'm not sure about (1). I don't feel like it should be a major problem,
> >> but (2).
>
> (1) seems to be contentious whichever way we go. Either PROT_MTE isn't
> required in which case it's easy to accidentally screw up migration, or
> it is required in which case it's difficult to handle normal guest
> memory from the VMM. I get the impression that probably I should go back
> to the previous approach - sorry for the distraction with this change.
>
> (2) isn't something I'm trying to skip, but I'm limited in what I can do
> myself so would appreciate help here. Haibo is looking into this.
>

Hi Steven,

Sorry for the later reply!

I have finished the POC for the MTE migration support with the assumption
that all the memory is mapped with PROT_MTE. But I got stuck in the test
with a FVP setup. Previously, I successfully compiled a test case to verify
the basic function of MTE in a guest. But these days, the re-compiled test
can't be executed by the guest(very weird). The short plan to verify
the migration
is to set the MTE tags on one page in the guest, and try to dump the migrated
memory contents.

I will update the status later next week!

Regards,
Haibo

> >>
> >> I'd be happy to help with the QEMU prototype, but preferably when there's
> >> hardware available. Has all the current MTE testing just been done on
> >> simulators? And, if so, are there regression tests regularly running on
> >> the simulators too? And can they test migration? If hardware doesn't
> >> show up quickly and simulators aren't used for regression tests, then
> >> all this code will start rotting from day one.
>
> As Marc says, hardware isn't available. Testing is either via the Arm
> FVP model (that I've been using for most of my testing) or QEMU full
> system emulation.
>
> >
> > While I agree with the sentiment, the reality is pretty bleak.
> >
> > I'm pretty sure nobody will ever run a migration on emulation. I also doubt
> > there is much overlap between MTE users and migration users, unfortunately.
> >
> > No HW is available today, and when it becomes available, it will be in
> > the form of a closed system on which QEMU doesn't run, either because
> > we are locked out of EL2 (as usual), or because migration is not part of
> > the use case (like KVM on Android, for example).
> >
> > So we can wait another two (five?) years until general purpose HW becomes
> > available, or we start merging what we can test today. I'm inclined to
> > do the latter.
> >
> > And I think it is absolutely fine for QEMU to say "no MTE support with KVM"
> > (we can remove all userspace visibility, except for the capability).
>
> What I'm trying to achieve is a situation where KVM+MTE without
> migration works and we leave ourselves a clear path where migration can
> be added. With hindsight I think this version of the series was a wrong
> turn - if we return to not requiring PROT_MTE then we have the following
> two potential options to explore for migration in the future:
>
>   * The VMM can choose to enable PROT_MTE if it needs to, and if desired
> we can add a flag to enforce this in the kernel.
>
>   * If needed a new kernel interface can be provided to fetch/set tags
> from guest memory which isn't 

Re: [RFC PATCH v3 10/16] KVM: arm64: Add a new VM device control group for SPE

2020-11-05 Thread Haibo Xu
On Wed, 28 Oct 2020 at 01:26, Alexandru Elisei  wrote:
>
> Stage 2 faults triggered by the profiling buffer attempting to write to
> memory are reported by the SPE hardware by asserting a buffer management
> event interrupt. Interrupts are by their nature asynchronous, which means
> that the guest might have changed its stage 1 translation tables since the
> attempted write. SPE reports the guest virtual address that caused the data
> abort, but not the IPA, which means that KVM would have to walk the guest's
> stage 1 tables to find the IPA; using the AT instruction to walk the
> guest's tables in hardware is not an option because it doesn't report the
> IPA in the case of a stage 2 fault on a stage 1 table walk.
>
> Fix both problems by pre-mapping the guest's memory at stage 2 with write
> permissions to avoid any faults. Userspace calls mlock() on the VMAs that
> back the guest's memory, pinning the pages in memory, then tells KVM to map
> the memory at stage 2 by using the VM control group KVM_ARM_VM_SPE_CTRL
> with the attribute KVM_ARM_VM_SPE_FINALIZE. KVM will map all writable VMAs
> which have the VM_LOCKED flag set. Hugetlb VMAs are practically pinned in
> memory after they are faulted in and mlock() doesn't set the VM_LOCKED
> flag, and just faults the pages in; KVM will treat hugetlb VMAs like they
> have the VM_LOCKED flag and will also map them, faulting them in if
> necessary, when handling the ioctl.
>
> VM live migration relies on a bitmap of dirty pages. This bitmap is created
> by write-protecting a memslot and updating it as KVM handles stage 2 write
> faults. Because KVM cannot handle stage 2 faults reported by the profiling
> buffer, it will not pre-map a logging memslot. This effectively means that
> profiling is not available when the VM is configured for live migration.
>
> Signed-off-by: Alexandru Elisei 
> ---
>  Documentation/virt/kvm/devices/vm.rst |  28 +
>  arch/arm64/include/asm/kvm_host.h |   5 +
>  arch/arm64/include/asm/kvm_mmu.h  |   2 +
>  arch/arm64/include/uapi/asm/kvm.h |   3 +
>  arch/arm64/kvm/arm.c  |  78 +++-
>  arch/arm64/kvm/guest.c|  48 
>  arch/arm64/kvm/mmu.c  | 169 ++
>  arch/arm64/kvm/spe.c  |  81 
>  include/kvm/arm_spe.h |  36 ++
>  9 files changed, 448 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/devices/vm.rst 
> b/Documentation/virt/kvm/devices/vm.rst
> index 0aa5b1cfd700..b70798a72d8a 100644
> --- a/Documentation/virt/kvm/devices/vm.rst
> +++ b/Documentation/virt/kvm/devices/vm.rst
> @@ -314,3 +314,31 @@ Allows userspace to query the status of migration mode.
>  if it is enabled
>  :Returns:   -EFAULT if the given address is not accessible from kernel space;
> 0 in case of success.
> +
> +6. GROUP: KVM_ARM_VM_SPE_CTRL
> +===
> +
> +:Architectures: arm64
> +
> +6.1. ATTRIBUTE: KVM_ARM_VM_SPE_FINALIZE
> +-
> +
> +Finalizes the creation of the SPE feature by mapping the guest memory in the
> +stage 2 table. Guest memory must be readable, writable and pinned in RAM, 
> which
> +is achieved with an mlock() system call; the memory can be backed by a 
> hugetlbfs
> +file. Memory regions from read-only or dirty page logging enabled memslots 
> will
> +be ignored. After the call, no changes to the guest memory, including to its
> +contents, are permitted.
> +
> +Subsequent KVM_ARM_VCPU_INIT calls will cause the memory to become unmapped 
> and
> +the feature must be finalized again before any VCPU can run.
> +
> +If any VCPUs are run before finalizing the feature, KVM_RUN will return 
> -EPERM.
> +
> +:Parameters: none
> +:Returns:   -EAGAIN if guest memory has been modified while the call was
> +executing
> +-EBUSY if the feature is already initialized
> +-EFAULT if an address backing the guest memory is invalid
> +-ENXIO if SPE is not supported or not properly configured
> +0 in case of success
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 5b68c06930c6..27f581750c6e 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -92,6 +92,7 @@ struct kvm_s2_mmu {
>
>  struct kvm_arch {
> struct kvm_s2_mmu mmu;
> +   struct kvm_spe spe;
>
> /* VTCR_EL2 value for this VM */
> u64vtcr;
> @@ -612,6 +613,10 @@ void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> +int kvm_arm_vm_arch_set_attr(struct kvm *kvm, struct kvm_device_attr *attr);
> +int kvm_arm_vm_arch_get_attr(struct kvm *kvm, struct kvm_device_attr *attr);
> +int kvm_arm_vm_arch_has_attr(struct kvm *kvm, 

Re: [RFC PATCH v3 09/16] KVM: arm64: Use separate function for the mapping size in user_mem_abort()

2020-11-05 Thread Haibo Xu
On Wed, 28 Oct 2020 at 01:26, Alexandru Elisei  wrote:
>
> user_mem_abort() is already a long and complex function, let's make it
> slightly easier to understand by abstracting the algorithm for choosing the
> stage 2 IPA entry size into its own function.
>
> This also makes it possible to reuse the code when guest SPE support will
> be added.
>

Better to mention that there is "No functional change"!

> Signed-off-by: Alexandru Elisei 
> ---
>  arch/arm64/kvm/mmu.c | 55 ++--
>  1 file changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 19aacc7d64de..c3c43555490d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -738,12 +738,43 @@ transparent_hugepage_adjust(struct kvm_memory_slot 
> *memslot,
> return PAGE_SIZE;
>  }
>
> +static short stage2_max_pageshift(struct kvm_memory_slot *memslot,
> + struct vm_area_struct *vma, hva_t hva,
> + bool *force_pte)
> +{
> +   short pageshift;
> +
> +   *force_pte = false;
> +
> +   if (is_vm_hugetlb_page(vma))
> +   pageshift = huge_page_shift(hstate_vma(vma));
> +   else
> +   pageshift = PAGE_SHIFT;
> +
> +   if (memslot_is_logging(memslot) || (vma->vm_flags & VM_PFNMAP)) {
> +   *force_pte = true;
> +   pageshift = PAGE_SHIFT;
> +   }
> +
> +   if (pageshift == PUD_SHIFT &&
> +   !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> +   pageshift = PMD_SHIFT;
> +
> +   if (pageshift == PMD_SHIFT &&
> +   !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
> +   *force_pte = true;
> +   pageshift = PAGE_SHIFT;
> +   }
> +
> +   return pageshift;
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   struct kvm_memory_slot *memslot, unsigned long hva,
>   unsigned long fault_status)
>  {
> int ret = 0;
> -   bool write_fault, writable, force_pte = false;
> +   bool write_fault, writable, force_pte;
> bool exec_fault;
> bool device = false;
> unsigned long mmu_seq;
> @@ -776,27 +807,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
> return -EFAULT;
> }
>
> -   if (is_vm_hugetlb_page(vma))
> -   vma_shift = huge_page_shift(hstate_vma(vma));
> -   else
> -   vma_shift = PAGE_SHIFT;
> -
> -   if (logging_active ||
> -   (vma->vm_flags & VM_PFNMAP)) {
> -   force_pte = true;
> -   vma_shift = PAGE_SHIFT;
> -   }
> -
> -   if (vma_shift == PUD_SHIFT &&
> -   !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> -  vma_shift = PMD_SHIFT;
> -
> -   if (vma_shift == PMD_SHIFT &&
> -   !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
> -   force_pte = true;
> -   vma_shift = PAGE_SHIFT;
> -   }
> -
> +   vma_shift = stage2_max_pageshift(memslot, vma, hva, _pte);
> vma_pagesize = 1UL << vma_shift;
> if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
> fault_ipa &= ~(vma_pagesize - 1);
> --
> 2.29.1
>
> ___
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH v3 08/16] KVM: arm64: Add a new VCPU device control group for SPE

2020-11-05 Thread Haibo Xu
On Wed, 28 Oct 2020 at 01:26, Alexandru Elisei  wrote:
>
> From: Sudeep Holla 
>
> To configure the virtual SPE buffer management interrupt number, we use a
> VCPU kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_IRQ attribute
> within the KVM_ARM_VCPU_SPE_CTRL group.
>
> After configuring the SPE, userspace is required to call the VCPU ioctl
> with the attribute KVM_ARM_VCPU_SPE_INIT to initialize SPE on the VCPU.
>
> [Alexandru E: Fixed compilation errors, don't allow userspace to set the
> VCPU feature, removed unused functions, fixed mismatched
> descriptions, comments and error codes, reworked logic, rebased on
> top of v5.10-rc1]
>
> Signed-off-by: Sudeep Holla 
> Signed-off-by: Alexandru Elisei 
> ---
>  Documentation/virt/kvm/devices/vcpu.rst |  40 
>  arch/arm64/include/uapi/asm/kvm.h   |   3 +
>  arch/arm64/kvm/Makefile |   1 +
>  arch/arm64/kvm/guest.c  |   9 ++
>  arch/arm64/kvm/reset.c  |  23 +
>  arch/arm64/kvm/spe.c| 129 
>  include/kvm/arm_spe.h   |  27 +
>  include/uapi/linux/kvm.h|   1 +
>  8 files changed, 233 insertions(+)
>  create mode 100644 arch/arm64/kvm/spe.c
>
> diff --git a/Documentation/virt/kvm/devices/vcpu.rst 
> b/Documentation/virt/kvm/devices/vcpu.rst
> index 2acec3b9ef65..6135b9827fbe 100644
> --- a/Documentation/virt/kvm/devices/vcpu.rst
> +++ b/Documentation/virt/kvm/devices/vcpu.rst
> @@ -161,3 +161,43 @@ Specifies the base address of the stolen time structure 
> for this VCPU. The
>  base address must be 64 byte aligned and exist within a valid guest memory
>  region. See Documentation/virt/kvm/arm/pvtime.rst for more information
>  including the layout of the stolen time structure.
> +
> +4. GROUP: KVM_ARM_VCPU_SPE_CTRL
> +===
> +
> +:Architectures: ARM64
> +
> +4.1 ATTRIBUTE: KVM_ARM_VCPU_SPE_IRQ
> +---
> +
> +:Parameters: in kvm_device_attr.addr the address for the SPE buffer 
> management
> + interrupt is a pointer to an int
> +
> +Returns:
> +
> +===  
> +-EBUSY   The SPE buffer management interrupt is already set
> +-EINVAL  Invalid SPE overflow interrupt number
> +-EFAULT  Could not read the buffer management interrupt number
> +-ENXIO   SPE not supported or not properly configured
> +===  
> +
> +A value describing the SPE (Statistical Profiling Extension) overflow 
> interrupt
> +number for this vcpu. This interrupt should be a PPI and the interrupt type 
> and
> +number must be same for each vcpu.
> +
> +4.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_INIT
> +
> +
> +:Parameters: no additional parameter in kvm_device_attr.addr
> +
> +Returns:
> +
> +===  ==
> +-EBUSY   SPE already initialized
> +-ENODEV  GIC not initialized
> +-ENXIO   SPE not supported or not properly configured
> +===  ==
> +
> +Request the initialization of the SPE. Must be done after initializing the
> +in-kernel irqchip and after setting the interrupt number for the VCPU.
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 489e12304dbb..ca57dfb7abf0 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -360,6 +360,9 @@ struct kvm_vcpu_events {
>  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER1
>  #define KVM_ARM_VCPU_PVTIME_CTRL   2
>  #define   KVM_ARM_VCPU_PVTIME_IPA  0
> +#define KVM_ARM_VCPU_SPE_CTRL  3
> +#define   KVM_ARM_VCPU_SPE_IRQ 0
> +#define   KVM_ARM_VCPU_SPE_INIT1
>
>  /* KVM_IRQ_LINE irq field index values */
>  #define KVM_ARM_IRQ_VCPU2_SHIFT28
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 1504c81fbf5d..f6e76f64ffbe 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -25,3 +25,4 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o 
> $(KVM)/eventfd.o \
>  vgic/vgic-its.o vgic/vgic-debug.o
>
>  kvm-$(CONFIG_KVM_ARM_PMU)  += pmu-emul.o
> +kvm-$(CONFIG_KVM_ARM_SPE)  += spe.o
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index dfb5218137ca..2ba790eeb782 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -926,6 +926,9 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
> case KVM_ARM_VCPU_PVTIME_CTRL:
> ret = kvm_arm_pvtime_set_attr(vcpu, attr);
> break;
> +   case KVM_ARM_VCPU_SPE_CTRL:
> +   ret = kvm_arm_spe_set_attr(vcpu, attr);
> +   break;
> default:
>