Re: [PATCH v6 14/23] irqchip/gic-v4.1: Add VSGI allocation/teardown

2020-03-23 Thread Zenghui Yu

Hi Marc,

On 2020/3/21 2:23, Marc Zyngier wrote:

Allocate per-VPE SGIs when initializing the GIC-specific part of the
VPE data structure.

Signed-off-by: Marc Zyngier 
Reviewed-by: Zenghui Yu 
Link: https://lore.kernel.org/r/20200304203330.4967-15-...@kernel.org
---
  drivers/irqchip/irq-gic-v3-its.c   |  2 +-
  drivers/irqchip/irq-gic-v4.c   | 68 +-
  include/linux/irqchip/arm-gic-v4.h |  4 +-
  3 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 15250faa9ef7..7ad46ff5f0b9 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -4053,7 +4053,7 @@ static int its_sgi_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
struct its_cmd_info *info = vcpu_info;
  
  	switch (info->cmd_type) {

-   case PROP_UPDATE_SGI:
+   case PROP_UPDATE_VSGI:
vpe->sgi_config[d->hwirq].priority = info->priority;
vpe->sgi_config[d->hwirq].group = info->group;
its_configure_sgi(d, false);


[...]


@@ -103,7 +105,7 @@ enum its_vcpu_info_cmd_type {
SCHEDULE_VPE,
DESCHEDULE_VPE,
INVALL_VPE,
-   PROP_UPDATE_SGI,
+   PROP_UPDATE_VSGI,
  };
  
  struct its_cmd_info {


As Eric pointed out, this belongs to patch #12.


Thanks,
Zenghui

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 08/23] irqchip/gic-v4.1: Plumb skeletal VSGI irqchip

2020-03-23 Thread Zenghui Yu

On 2020/3/21 2:23, Marc Zyngier wrote:

+static int its_sgi_set_affinity(struct irq_data *d,
+   const struct cpumask *mask_val,
+   bool force)
+{
+   /*
+* There is no notion of affinity for virtual SGIs, at least
+* not on the host (since they can only be targetting a vPE).
+* Tell the kernel we've done whetever it asked for.


new typo?
s/whetever/whatever/


+*/
+   return IRQ_SET_MASK_OK;
+}



Thanks,
Zenghui



Re: [PATCH v3 1/9] KVM: Pass kvm_init()'s opaque param to additional arch funcs

2020-03-23 Thread Paul Mackerras
On Sat, Mar 21, 2020 at 01:25:55PM -0700, Sean Christopherson wrote:
> Pass @opaque to kvm_arch_hardware_setup() and
> kvm_arch_check_processor_compat() to allow architecture specific code to
> reference @opaque without having to stash it away in a temporary global
> variable.  This will enable x86 to separate its vendor specific callback
> ops, which are passed via @opaque, into "init" and "runtime" ops without
> having to stash away the "init" ops.
> 
> No functional change intended.
> 
> Reviewed-by: Cornelia Huck 
> Tested-by: Cornelia Huck  #s390
> Acked-by: Marc Zyngier 
> Signed-off-by: Sean Christopherson 

Acked-by: Paul Mackerras 


Re: [PATCH v3 4/9] KVM: VMX: Configure runtime hooks using vmx_x86_ops

2020-03-23 Thread Paolo Bonzini
On 23/03/20 13:27, Vitaly Kuznetsov wrote:
>> -kvm_x86_ops->check_nested_events = vmx_check_nested_events;
>> -kvm_x86_ops->get_nested_state = vmx_get_nested_state;
>> -kvm_x86_ops->set_nested_state = vmx_set_nested_state;
>> -kvm_x86_ops->get_vmcs12_pages = nested_get_vmcs12_pages;
>> -kvm_x86_ops->nested_enable_evmcs = nested_enable_evmcs;
>> -kvm_x86_ops->nested_get_evmcs_version = nested_get_evmcs_version;
>> +ops->check_nested_events = vmx_check_nested_events;
>> +ops->get_nested_state = vmx_get_nested_state;
>> +ops->set_nested_state = vmx_set_nested_state;
>> +ops->get_vmcs12_pages = nested_get_vmcs12_pages;
>> +ops->nested_enable_evmcs = nested_enable_evmcs;
>> +ops->nested_get_evmcs_version = nested_get_evmcs_version;
> 
> A lazy guy like me would appreciate 'ops' -> 'vmx_x86_ops' rename as it
> would make 'git grep vmx_x86_ops' output more complete.
> 

I would prefer even more a kvm_x86_ops.nested struct but I would be okay
with a separate patch.

Paolo
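
The kvm_x86_ops.nested idea above could be sketched roughly as follows. This is a toy illustration only (stub callbacks, a cut-down ops struct, invented return values), not the real KVM definitions: the nested-virt hooks move into their own sub-struct so vendor code fills one cluster instead of six scattered pointers.

```c
#include <stddef.h>

struct kvm_vcpu;                        /* opaque stand-in for the real type */

/* Cluster of nested-virt callbacks, filled only by vendors that support them. */
struct kvm_x86_nested_ops {
	int (*check_events)(struct kvm_vcpu *vcpu);
	int (*enable_evmcs)(struct kvm_vcpu *vcpu);
	/* ... get_state, set_state, get_vmcs12_pages, etc. */
};

struct kvm_x86_ops {
	int (*hardware_enable)(void);
	const struct kvm_x86_nested_ops *nested;   /* one pointer per cluster */
};

/* Hand-written vendor stubs standing in for the vmx_/nested_ implementations. */
static int vmx_check_nested_events(struct kvm_vcpu *vcpu) { (void)vcpu; return 0; }
static int nested_enable_evmcs(struct kvm_vcpu *vcpu)     { (void)vcpu; return 1; }
static int vmx_hardware_enable(void)                      { return 0; }

static const struct kvm_x86_nested_ops vmx_nested_ops = {
	.check_events = vmx_check_nested_events,
	.enable_evmcs = nested_enable_evmcs,
};

static const struct kvm_x86_ops vmx_x86_ops = {
	.hardware_enable = vmx_hardware_enable,
	.nested = &vmx_nested_ops,   /* filled once, instead of six assignments */
};
```

Whether the sub-struct is a pointer (as here) or embedded by value is a follow-on design choice; a pointer lets a vendor without nested support simply leave it NULL.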



Re: [PATCH v3 2/9] KVM: x86: Move init-only kvm_x86_ops to separate struct

2020-03-23 Thread Paolo Bonzini
On 23/03/20 17:24, Vitaly Kuznetsov wrote:
> Sounds cool! (not sure that with only two implementations people won't
> call it 'over-engineered' but cool). 

Yes, something like

#define KVM_X86_OP(name) .name = vmx_##name

(svm_##name for svm.c) and then

KVM_X86_OP(check_nested_events)

etc.

> My personal wish would just be that
> function names in function implementations are not auto-generated so
> e.g. a simple 'git grep vmx_hardware_setup' works

Yes, absolutely; the function names would still be written by hand.

Paolo

> but the way how we
> fill vmx_x86_ops in can be macroed I guess.



Re: [PATCH v4 2/2] target/arm: kvm: Handle potential issue with dabt injection

2020-03-23 Thread Richard Henderson
On 3/23/20 4:32 AM, Beata Michalska wrote:
>  uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
> +uint8_t ext_dabt_raised; /* Tracking/verifying injection of ext DABT */

Is there a reason these are uint8_t and not bool?


r~


Re: [PATCH v3 2/9] KVM: x86: Move init-only kvm_x86_ops to separate struct

2020-03-23 Thread Sean Christopherson
On Mon, Mar 23, 2020 at 05:24:56PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson  writes:
> 
> > On Mon, Mar 23, 2020 at 01:10:40PM +0100, Vitaly Kuznetsov wrote:
> >> Sean Christopherson  writes:
> >> 
> >> > +
> >> > +.runtime_ops = &svm_x86_ops,
> >> > +};
> >> 
> >> Unrelated to your patch but I think we can make the naming of some of
> >> these functions more consistent on SVM/VMX, in particular I'd suggest 
> >> 
> >> has_svm() -> cpu_has_svm_support()
> >> is_disabled -> svm_disabled_by_bios()
> >> ...
> >> (see below for VMX)
> >> 
> >> > +
> >> >  static int __init svm_init(void)
> >> >  {
> >> > -return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
> >> > +return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
> >> >  __alignof__(struct vcpu_svm), THIS_MODULE);
> >> >  }
> >> >  
> >> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >> > index 07299a957d4a..ffcdcc86f5b7 100644
> >> > --- a/arch/x86/kvm/vmx/vmx.c
> >> > +++ b/arch/x86/kvm/vmx/vmx.c
> >> > @@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
> >> >  }
> >> >  
> >> >  static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >> > -.cpu_has_kvm_support = cpu_has_kvm_support,
> >> > -.disabled_by_bios = vmx_disabled_by_bios,
> >> > -.hardware_setup = hardware_setup,
> >> >  .hardware_unsetup = hardware_unsetup,
> >> > -.check_processor_compatibility = vmx_check_processor_compat,
> >> > +
> >> >  .hardware_enable = hardware_enable,
> >> >  .hardware_disable = hardware_disable,
> >> >  .cpu_has_accelerated_tpr = report_flexpriority,
> >> > @@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >> >  .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> >> >  };
> >> >  
> >> > +static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> >> > +.cpu_has_kvm_support = cpu_has_kvm_support,
> >> > +.disabled_by_bios = vmx_disabled_by_bios,
> >> > +.check_processor_compatibility = vmx_check_processor_compat,
> >> > +.hardware_setup = hardware_setup,
> >> 
> >> cpu_has_kvm_support() -> cpu_has_vmx_support()
> >> hardware_setup() -> vmx_hardware_setup()
> >
> > Preaching to the choir on this one.  The VMX functions without prefixes
> > in particular annoy me to no end, e.g. hardware_setup().  Though the worst
> > is probably ".vcpu_create = vmx_create_vcpu", if I had a nickel for every
> > time I've tried to find vmx_vcpu_create()...
> >
> > What if we added a macro to auto-generate the common/required hooks?  E.g.:
> >
> >   static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > MANDATORY_KVM_X86_OPS(vmx),
> >
> > .pmu_ops = &vmx_pmu_ops,
> >
> > ...
> >   };
> >
> > That'd enforce consistent naming, and would provide a bit of documentation
> > as to which hooks are optional, e.g. many of the nested hooks, and which
> > must be defined for KVM to function.
> 
> Sounds cool! (not sure that with only two implementations people won't
> call it 'over-engineered' but cool). My personal wish would just be that
> function names in function implementations are not auto-generated so
> e.g. a simple 'git grep vmx_hardware_setup' works but the way how we
> fill vmx_x86_ops in can be macroed I guess.

Ya, I was thinking of just the macro.  Even that has downsides though, e.g.
chasing kvm_x86_ops.hardware_setup() to find VMX's hardware_setup() becomes
a bit kludgy.  On the other hand, _if_ you know how the fill macro works,
getting to the implementation should be easier.


Re: [PATCH v3 4/9] KVM: VMX: Configure runtime hooks using vmx_x86_ops

2020-03-23 Thread Sean Christopherson
On Mon, Mar 23, 2020 at 01:27:28PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson  writes:
> 
> > Configure VMX's runtime hooks by modifying vmx_x86_ops directly instead
> > of using the global kvm_x86_ops.  This sets the stage for waiting until
> > after ->hardware_setup() to set kvm_x86_ops with the vendor's
> > implementation.
> >
> > Signed-off-by: Sean Christopherson 
> > ---
> >  arch/x86/kvm/vmx/nested.c | 15 ---
> >  arch/x86/kvm/vmx/nested.h |  3 ++-
> >  arch/x86/kvm/vmx/vmx.c| 27 ++-
> >  3 files changed, 24 insertions(+), 21 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 4ff859c99946..87fea22c3799 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -6241,7 +6241,8 @@ void nested_vmx_hardware_unsetup(void)
> > }
> >  }
> >  
> > -__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
> > +__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
> > +int (*exit_handlers[])(struct kvm_vcpu *))
> >  {
> > int i;
> >  
> > @@ -6277,12 +6278,12 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
> > exit_handlers[EXIT_REASON_INVVPID]  = handle_invvpid;
> > exit_handlers[EXIT_REASON_VMFUNC]   = handle_vmfunc;
> >  
> > -   kvm_x86_ops->check_nested_events = vmx_check_nested_events;
> > -   kvm_x86_ops->get_nested_state = vmx_get_nested_state;
> > -   kvm_x86_ops->set_nested_state = vmx_set_nested_state;
> > -   kvm_x86_ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> > -   kvm_x86_ops->nested_enable_evmcs = nested_enable_evmcs;
> > -   kvm_x86_ops->nested_get_evmcs_version = nested_get_evmcs_version;
> > +   ops->check_nested_events = vmx_check_nested_events;
> > +   ops->get_nested_state = vmx_get_nested_state;
> > +   ops->set_nested_state = vmx_set_nested_state;
> > +   ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> > +   ops->nested_enable_evmcs = nested_enable_evmcs;
> > +   ops->nested_get_evmcs_version = nested_get_evmcs_version;
> 
> 
> A lazy guy like me would appreciate 'ops' -> 'vmx_x86_ops' rename as it
> would make 'git grep vmx_x86_ops' output more complete.

Ah, didn't think about that, obviously.


Re: [PATCH v3 6/9] KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection

2020-03-23 Thread Sean Christopherson
On Mon, Mar 23, 2020 at 01:44:46PM +0100, Vitaly Kuznetsov wrote:
> (OK, I have to admit I trust the fact that GCC is still able to build
> KVM modules more than my own eyes)

Ha, yep, I trust gcc far more than my grep skills.


Re: [PATCH v3 2/9] KVM: x86: Move init-only kvm_x86_ops to separate struct

2020-03-23 Thread Sean Christopherson
On Mon, Mar 23, 2020 at 01:10:40PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson  writes:
> 
> > +
> > +   .runtime_ops = &svm_x86_ops,
> > +};
> 
> Unrelated to your patch but I think we can make the naming of some of
> these functions more consistent on SVM/VMX, in particular I'd suggest 
> 
> has_svm() -> cpu_has_svm_support()
> is_disabled -> svm_disabled_by_bios()
> ...
> (see below for VMX)
> 
> > +
> >  static int __init svm_init(void)
> >  {
> > -   return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
> > +   return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
> > __alignof__(struct vcpu_svm), THIS_MODULE);
> >  }
> >  
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 07299a957d4a..ffcdcc86f5b7 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
> >  }
> >  
> >  static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > -   .cpu_has_kvm_support = cpu_has_kvm_support,
> > -   .disabled_by_bios = vmx_disabled_by_bios,
> > -   .hardware_setup = hardware_setup,
> > .hardware_unsetup = hardware_unsetup,
> > -   .check_processor_compatibility = vmx_check_processor_compat,
> > +
> > .hardware_enable = hardware_enable,
> > .hardware_disable = hardware_disable,
> > .cpu_has_accelerated_tpr = report_flexpriority,
> > @@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> >  };
> >  
> > +static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> > +   .cpu_has_kvm_support = cpu_has_kvm_support,
> > +   .disabled_by_bios = vmx_disabled_by_bios,
> > +   .check_processor_compatibility = vmx_check_processor_compat,
> > +   .hardware_setup = hardware_setup,
> 
> cpu_has_kvm_support() -> cpu_has_vmx_support()
> hardware_setup() -> vmx_hardware_setup()

Preaching to the choir on this one.  The VMX functions without prefixes
in particular annoy me to no end, e.g. hardware_setup().  Though the worst
is probably ".vcpu_create = vmx_create_vcpu", if I had a nickel for every
time I've tried to find vmx_vcpu_create()...

What if we added a macro to auto-generate the common/required hooks?  E.g.:

  static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
MANDATORY_KVM_X86_OPS(vmx),

.pmu_ops = &vmx_pmu_ops,

...
  };

That'd enforce consistent naming, and would provide a bit of documentation
as to which hooks are optional, e.g. many of the nested hooks, and which
must be defined for KVM to function.
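
One hedged way to sketch such a MANDATORY_KVM_X86_OPS(vendor) fill macro (illustrative names only; the real hook list is far longer): the list of required hooks lives in one place, and each vendor's ops table expands it with its own prefix, while optional hooks stay explicit.

```c
#include <stddef.h>

struct kvm_x86_ops {
	int  (*hardware_enable)(void);
	void (*hardware_disable)(void);
	int  (*optional_hook)(void);        /* optional: may stay NULL */
};

/* Token-pasting helper plus a single list of the mandatory hooks. */
#define KVM_X86_OP(vendor, name) .name = vendor##_##name,
#define MANDATORY_KVM_X86_OPS(vendor)        \
	KVM_X86_OP(vendor, hardware_enable)  \
	KVM_X86_OP(vendor, hardware_disable)

/* Hand-written implementations keep their grep-able names. */
static int  vmx_hardware_enable(void)  { return 0; }
static void vmx_hardware_disable(void) { }

static struct kvm_x86_ops vmx_x86_ops = {
	MANDATORY_KVM_X86_OPS(vmx)
	/* optional hooks listed explicitly here, or left NULL */
};
```

If a vendor forgets to implement a mandatory hook, the expansion references a nonexistent symbol and the build fails, which gives the documentation-by-construction effect described above.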


Re: [PATCH v3 8/9] KVM: VMX: Annotate vmx_x86_ops as __initdata

2020-03-23 Thread Vitaly Kuznetsov
Sean Christopherson  writes:

> Tag vmx_x86_ops with __initdata now the the struct is copied by value

Typo, "now that the struct".

> to a common x86 instance of kvm_x86_ops as part of kvm_init().
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/vmx/vmx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index fac22e316417..eae90379d0d1 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7668,7 +7668,7 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
>   return supported & BIT(bit);
>  }
>  
> -static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> +static struct kvm_x86_ops vmx_x86_ops __initdata = {
>   .hardware_unsetup = hardware_unsetup,
>  
>   .hardware_enable = hardware_enable,

Reviewed-by: Vitaly Kuznetsov 

-- 
Vitaly



Re: [PATCH v3 3/9] KVM: VMX: Move hardware_setup() definition below vmx_x86_ops

2020-03-23 Thread Vitaly Kuznetsov
Sean Christopherson  writes:

> Move VMX's hardware_setup() below its vmx_x86_ops definition so that a
> future patch can refactor hardware_setup() to modify vmx_x86_ops
> directly instead of indirectly modifying the ops via the global
> kvm_x86_ops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/vmx/vmx.c | 346 -
>  1 file changed, 173 insertions(+), 173 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ffcdcc86f5b7..82dab775d520 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7652,179 +7652,6 @@ static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
>   return to_vmx(vcpu)->nested.vmxon;
>  }
>  
> -static __init int hardware_setup(void)
> -{
> - unsigned long host_bndcfgs;
> - struct desc_ptr dt;
> - int r, i, ept_lpage_level;
> -
> - store_idt(&dt);
> - host_idt_base = dt.address;
> -
> - for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i)
> - kvm_define_shared_msr(i, vmx_msr_index[i]);
> -
> - if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
> - return -EIO;
> -
> - if (boot_cpu_has(X86_FEATURE_NX))
> - kvm_enable_efer_bits(EFER_NX);
> -
> - if (boot_cpu_has(X86_FEATURE_MPX)) {
> - rdmsrl(MSR_IA32_BNDCFGS, host_bndcfgs);
> - WARN_ONCE(host_bndcfgs, "KVM: BNDCFGS in host will be lost");
> - }
> -
> - if (!cpu_has_vmx_mpx())
> - supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
> - XFEATURE_MASK_BNDCSR);
> -
> - if (!cpu_has_vmx_vpid() || !cpu_has_vmx_invvpid() ||
> - !(cpu_has_vmx_invvpid_single() || cpu_has_vmx_invvpid_global()))
> - enable_vpid = 0;
> -
> - if (!cpu_has_vmx_ept() ||
> - !cpu_has_vmx_ept_4levels() ||
> - !cpu_has_vmx_ept_mt_wb() ||
> - !cpu_has_vmx_invept_global())
> - enable_ept = 0;
> -
> - if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
> - enable_ept_ad_bits = 0;
> -
> - if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
> - enable_unrestricted_guest = 0;
> -
> - if (!cpu_has_vmx_flexpriority())
> - flexpriority_enabled = 0;
> -
> - if (!cpu_has_virtual_nmis())
> - enable_vnmi = 0;
> -
> - /*
> -  * set_apic_access_page_addr() is used to reload apic access
> -  * page upon invalidation.  No need to do anything if not
> -  * using the APIC_ACCESS_ADDR VMCS field.
> -  */
> - if (!flexpriority_enabled)
> - kvm_x86_ops->set_apic_access_page_addr = NULL;
> -
> - if (!cpu_has_vmx_tpr_shadow())
> - kvm_x86_ops->update_cr8_intercept = NULL;
> -
> -#if IS_ENABLED(CONFIG_HYPERV)
> - if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
> - && enable_ept) {
> - kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
> - kvm_x86_ops->tlb_remote_flush_with_range =
> - hv_remote_flush_tlb_with_range;
> - }
> -#endif
> -
> - if (!cpu_has_vmx_ple()) {
> - ple_gap = 0;
> - ple_window = 0;
> - ple_window_grow = 0;
> - ple_window_max = 0;
> - ple_window_shrink = 0;
> - }
> -
> - if (!cpu_has_vmx_apicv()) {
> - enable_apicv = 0;
> - kvm_x86_ops->sync_pir_to_irr = NULL;
> - }
> -
> - if (cpu_has_vmx_tsc_scaling()) {
> - kvm_has_tsc_control = true;
> - kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
> - kvm_tsc_scaling_ratio_frac_bits = 48;
> - }
> -
> - set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
> -
> - if (enable_ept)
> - vmx_enable_tdp();
> -
> - if (!enable_ept)
> - ept_lpage_level = 0;
> - else if (cpu_has_vmx_ept_1g_page())
> - ept_lpage_level = PT_PDPE_LEVEL;
> - else if (cpu_has_vmx_ept_2m_page())
> - ept_lpage_level = PT_DIRECTORY_LEVEL;
> - else
> - ept_lpage_level = PT_PAGE_TABLE_LEVEL;
> - kvm_configure_mmu(enable_ept, ept_lpage_level);
> -
> - /*
> -  * Only enable PML when hardware supports PML feature, and both EPT
> -  * and EPT A/D bit features are enabled -- PML depends on them to work.
> -  */
> - if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
> - enable_pml = 0;
> -
> - if (!enable_pml) {
> - kvm_x86_ops->slot_enable_log_dirty = NULL;
> - kvm_x86_ops->slot_disable_log_dirty = NULL;
> - kvm_x86_ops->flush_log_dirty = NULL;
> - kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
> - }
> -
> - if (!cpu_has_vmx_preemption_timer())
> - enable_preemption_timer = false;
> -
> - if (enable_preemption_timer) {
> - u64 use_timer_freq = 5000ULL * 1000 * 1000;
> 

Re: [PATCH v3 6/9] KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection

2020-03-23 Thread Vitaly Kuznetsov
Sean Christopherson  writes:

> Replace the kvm_x86_ops pointer in common x86 with an instance of the
> struct to save one pointer dereference when invoking functions.  Copy the
> struct by value to set the ops during kvm_init().
>
> Arbitrarily use kvm_x86_ops.hardware_enable to track whether or not the
> ops have been initialized, i.e. a vendor KVM module has been loaded.
>
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/include/asm/kvm_host.h |  18 +-
>  arch/x86/kvm/cpuid.c|   4 +-
>  arch/x86/kvm/hyperv.c   |   8 +-
>  arch/x86/kvm/kvm_cache_regs.h   |  10 +-
>  arch/x86/kvm/lapic.c|  30 +--
>  arch/x86/kvm/mmu.h  |   8 +-
>  arch/x86/kvm/mmu/mmu.c  |  32 +--
>  arch/x86/kvm/pmu.c  |  30 +--
>  arch/x86/kvm/pmu.h  |   2 +-
>  arch/x86/kvm/svm.c  |   2 +-
>  arch/x86/kvm/trace.h|   4 +-
>  arch/x86/kvm/vmx/nested.c   |   2 +-
>  arch/x86/kvm/vmx/vmx.c  |   4 +-
>  arch/x86/kvm/x86.c  | 356 
>  arch/x86/kvm/x86.h  |   4 +-
>  15 files changed, 257 insertions(+), 257 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f4c5b49299ff..54f991244fae 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1274,13 +1274,13 @@ struct kvm_arch_async_pf {
>  
>  extern u64 __read_mostly host_efer;
>  
> -extern struct kvm_x86_ops *kvm_x86_ops;
> +extern struct kvm_x86_ops kvm_x86_ops;
>  extern struct kmem_cache *x86_fpu_cache;
>  
>  #define __KVM_HAVE_ARCH_VM_ALLOC
>  static inline struct kvm *kvm_arch_alloc_vm(void)
>  {
> - return __vmalloc(kvm_x86_ops->vm_size,
> + return __vmalloc(kvm_x86_ops.vm_size,
>GFP_KERNEL_ACCOUNT | __GFP_ZERO, PAGE_KERNEL);
>  }
>  void kvm_arch_free_vm(struct kvm *kvm);
> @@ -1288,8 +1288,8 @@ void kvm_arch_free_vm(struct kvm *kvm);
>  #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB
>  static inline int kvm_arch_flush_remote_tlb(struct kvm *kvm)
>  {
> - if (kvm_x86_ops->tlb_remote_flush &&
> - !kvm_x86_ops->tlb_remote_flush(kvm))
> + if (kvm_x86_ops.tlb_remote_flush &&
> + !kvm_x86_ops.tlb_remote_flush(kvm))
>   return 0;
>   else
>   return -ENOTSUPP;
> @@ -1375,7 +1375,7 @@ extern u64 kvm_mce_cap_supported;
>   *
>   * EMULTYPE_SKIP - Set when emulating solely to skip an instruction, i.e. to
>   *  decode the instruction length.  For use *only* by
> - *  kvm_x86_ops->skip_emulated_instruction() implementations.
> + *  kvm_x86_ops.skip_emulated_instruction() implementations.
>   *
>   * EMULTYPE_ALLOW_RETRY_PF - Set when the emulator should resume the guest to
>   *retry native execution under certain conditions,
> @@ -1669,14 +1669,14 @@ static inline bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
>  
>  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
>  {
> - if (kvm_x86_ops->vcpu_blocking)
> - kvm_x86_ops->vcpu_blocking(vcpu);
> + if (kvm_x86_ops.vcpu_blocking)
> + kvm_x86_ops.vcpu_blocking(vcpu);
>  }
>  
>  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
>  {
> - if (kvm_x86_ops->vcpu_unblocking)
> - kvm_x86_ops->vcpu_unblocking(vcpu);
> + if (kvm_x86_ops.vcpu_unblocking)
> + kvm_x86_ops.vcpu_unblocking(vcpu);
>  }
>  
>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 435a7da07d5f..0aefa9acae10 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -209,7 +209,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
>   vcpu->arch.cpuid_nent = cpuid->nent;
>   cpuid_fix_nx_cap(vcpu);
>   kvm_apic_set_version(vcpu);
> - kvm_x86_ops->cpuid_update(vcpu);
> + kvm_x86_ops.cpuid_update(vcpu);
>   r = kvm_update_cpuid(vcpu);
>  
>  out:
> @@ -232,7 +232,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
>   goto out;
>   vcpu->arch.cpuid_nent = cpuid->nent;
>   kvm_apic_set_version(vcpu);
> - kvm_x86_ops->cpuid_update(vcpu);
> + kvm_x86_ops.cpuid_update(vcpu);
>   r = kvm_update_cpuid(vcpu);
>  out:
>   return r;
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index a86fda7a1d03..bcefa9d4e57e 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1022,7 +1022,7 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
>   addr = gfn_to_hva(kvm, gfn);
>   if (kvm_is_error_hva(addr))
>   return 1;
> - kvm_x86_ops->patch_hypercall(vcpu, instructions);
> + kvm_x86_ops.patch_hypercall(vcpu, instructions);
>   ((unsigned char *)instructions)[3] = 0xc3; /* ret */

Re: [PATCH v3 4/9] KVM: VMX: Configure runtime hooks using vmx_x86_ops

2020-03-23 Thread Vitaly Kuznetsov
Sean Christopherson  writes:

> Configure VMX's runtime hooks by modifying vmx_x86_ops directly instead
> of using the global kvm_x86_ops.  This sets the stage for waiting until
> after ->hardware_setup() to set kvm_x86_ops with the vendor's
> implementation.
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/vmx/nested.c | 15 ---
>  arch/x86/kvm/vmx/nested.h |  3 ++-
>  arch/x86/kvm/vmx/vmx.c| 27 ++-
>  3 files changed, 24 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 4ff859c99946..87fea22c3799 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6241,7 +6241,8 @@ void nested_vmx_hardware_unsetup(void)
>   }
>  }
>  
> -__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
> +__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
> +  int (*exit_handlers[])(struct kvm_vcpu *))
>  {
>   int i;
>  
> @@ -6277,12 +6278,12 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
>   exit_handlers[EXIT_REASON_INVVPID]  = handle_invvpid;
>   exit_handlers[EXIT_REASON_VMFUNC]   = handle_vmfunc;
>  
> - kvm_x86_ops->check_nested_events = vmx_check_nested_events;
> - kvm_x86_ops->get_nested_state = vmx_get_nested_state;
> - kvm_x86_ops->set_nested_state = vmx_set_nested_state;
> - kvm_x86_ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> - kvm_x86_ops->nested_enable_evmcs = nested_enable_evmcs;
> - kvm_x86_ops->nested_get_evmcs_version = nested_get_evmcs_version;
> + ops->check_nested_events = vmx_check_nested_events;
> + ops->get_nested_state = vmx_get_nested_state;
> + ops->set_nested_state = vmx_set_nested_state;
> + ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> + ops->nested_enable_evmcs = nested_enable_evmcs;
> + ops->nested_get_evmcs_version = nested_get_evmcs_version;


A lazy guy like me would appreciate 'ops' -> 'vmx_x86_ops' rename as it
would make 'git grep vmx_x86_ops' output more complete.

>  
>   return 0;
>  }
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index f70968b76d33..ac56aefa49e3 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -19,7 +19,8 @@ enum nvmx_vmentry_status {
>  void vmx_leave_nested(struct kvm_vcpu *vcpu);
>  void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
>  void nested_vmx_hardware_unsetup(void);
> -__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
> +__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
> +  int (*exit_handlers[])(struct kvm_vcpu *));
>  void nested_vmx_set_vmcs_shadowing_bitmap(void);
>  void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu);
> enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 82dab775d520..cfa9093bdc06 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7860,16 +7860,16 @@ static __init int hardware_setup(void)
>* using the APIC_ACCESS_ADDR VMCS field.
>*/
>   if (!flexpriority_enabled)
> - kvm_x86_ops->set_apic_access_page_addr = NULL;
> + vmx_x86_ops.set_apic_access_page_addr = NULL;
>  
>   if (!cpu_has_vmx_tpr_shadow())
> - kvm_x86_ops->update_cr8_intercept = NULL;
> + vmx_x86_ops.update_cr8_intercept = NULL;
>  
>  #if IS_ENABLED(CONFIG_HYPERV)
>   if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
>   && enable_ept) {
> - kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
> - kvm_x86_ops->tlb_remote_flush_with_range =
> + vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
> + vmx_x86_ops.tlb_remote_flush_with_range =
>   hv_remote_flush_tlb_with_range;
>   }
>  #endif
> @@ -7884,7 +7884,7 @@ static __init int hardware_setup(void)
>  
>   if (!cpu_has_vmx_apicv()) {
>   enable_apicv = 0;
> - kvm_x86_ops->sync_pir_to_irr = NULL;
> + vmx_x86_ops.sync_pir_to_irr = NULL;
>   }
>  
>   if (cpu_has_vmx_tsc_scaling()) {
> @@ -7916,10 +7916,10 @@ static __init int hardware_setup(void)
>   enable_pml = 0;
>  
>   if (!enable_pml) {
> - kvm_x86_ops->slot_enable_log_dirty = NULL;
> - kvm_x86_ops->slot_disable_log_dirty = NULL;
> - kvm_x86_ops->flush_log_dirty = NULL;
> - kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
> + vmx_x86_ops.slot_enable_log_dirty = NULL;
> + vmx_x86_ops.slot_disable_log_dirty = NULL;
> + vmx_x86_ops.flush_log_dirty = NULL;
> + vmx_x86_ops.enable_log_dirty_pt_masked = NULL;
>   }
>  
>   

Re: [PATCH v3 9/9] KVM: SVM: Annotate svm_x86_ops as __initdata

2020-03-23 Thread Vitaly Kuznetsov
Sean Christopherson  writes:

> Tag svm_x86_ops with __initdata now the the struct is copied by value to

Same typo, "now that the struct".

> a common x86 instance of kvm_x86_ops as part of kvm_init().
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/svm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index b61bb306602b..ab2a1cf6c188 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7350,7 +7350,7 @@ static void svm_pre_update_apicv_exec_ctrl(struct kvm *kvm, bool activate)
>   avic_update_access_page(kvm, activate);
>  }
>  
> -static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> +static struct kvm_x86_ops svm_x86_ops __initdata = {
>   .hardware_unsetup = svm_hardware_teardown,
>   .hardware_enable = svm_hardware_enable,
>   .hardware_disable = svm_hardware_disable,

Reviewed-by: Vitaly Kuznetsov 

-- 
Vitaly



Re: [PATCH v3 7/9] KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()

2020-03-23 Thread Vitaly Kuznetsov
Sean Christopherson  writes:

> Remove the __exit annotation from VMX hardware_unsetup(), the hook
> can be reached during kvm_init() by way of kvm_arch_hardware_unsetup()
> if failure occurs at various points during initialization.
>
> Note, there is no known functional issue with the __exit annotation, the
> above is merely justification for its removal.  The real motivation is
> to be able to annotate vmx_x86_ops and svm_x86_ops with __initdata,
> which makes objtool complain because objtool doesn't understand that the
> vendor specific __initdata is being copied by value to a non-__initdata
> instance.
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/include/asm/kvm_host.h | 2 +-
>  arch/x86/kvm/vmx/vmx.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 54f991244fae..42a2d0d3984a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1056,7 +1056,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
>  struct kvm_x86_ops {
>   int (*hardware_enable)(void);
>   void (*hardware_disable)(void);
> - void (*hardware_unsetup)(void);/* __exit */
> + void (*hardware_unsetup)(void);
>   bool (*cpu_has_accelerated_tpr)(void);
>   bool (*has_emulated_msr)(int index);
>   void (*cpuid_update)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4bbe0d165a0c..fac22e316417 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7652,7 +7652,7 @@ static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
>   return to_vmx(vcpu)->nested.vmxon;
>  }
>  
> -static __exit void hardware_unsetup(void)
> +static void hardware_unsetup(void)
>  {
>   if (nested)
>   nested_vmx_hardware_unsetup();

Reviewed-by: Vitaly Kuznetsov 

-- 
Vitaly



Re: [PATCH v3 2/9] KVM: x86: Move init-only kvm_x86_ops to separate struct

2020-03-23 Thread Vitaly Kuznetsov
Sean Christopherson  writes:

> Move the kvm_x86_ops functions that are used only within the scope of
> kvm_init() into a separate struct, kvm_x86_init_ops.  In addition to
> identifying the init-only functions without resorting to code comments,
> this also sets the stage for waiting until after ->hardware_setup() to
> set kvm_x86_ops.  Setting kvm_x86_ops after ->hardware_setup() is
> desirable as many of the hooks are not usable until ->hardware_setup()
> completes.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/include/asm/kvm_host.h | 13 +
>  arch/x86/kvm/svm.c  | 15 ++-
>  arch/x86/kvm/vmx/vmx.c  | 16 +++-
>  arch/x86/kvm/x86.c  | 10 ++
>  4 files changed, 36 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9a183e9d4cb1..f4c5b49299ff 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1054,12 +1054,8 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
>  }
>  
>  struct kvm_x86_ops {
> - int (*cpu_has_kvm_support)(void);  /* __init */
> - int (*disabled_by_bios)(void); /* __init */
>   int (*hardware_enable)(void);
>   void (*hardware_disable)(void);
> - int (*check_processor_compatibility)(void);/* __init */
> - int (*hardware_setup)(void);   /* __init */
>   void (*hardware_unsetup)(void);/* __exit */
>   bool (*cpu_has_accelerated_tpr)(void);
>   bool (*has_emulated_msr)(int index);
> @@ -1260,6 +1256,15 @@ struct kvm_x86_ops {
>   int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>  };
>  
> +struct kvm_x86_init_ops {
> + int (*cpu_has_kvm_support)(void);
> + int (*disabled_by_bios)(void);
> + int (*check_processor_compatibility)(void);
> + int (*hardware_setup)(void);
> +
> + struct kvm_x86_ops *runtime_ops;
> +};
> +
>  struct kvm_arch_async_pf {
>   u32 token;
>   gfn_t gfn;
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 2125c6ae5951..33e67c3389c2 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7351,11 +7351,7 @@ static void svm_pre_update_apicv_exec_ctrl(struct kvm *kvm, bool activate)
>  }
>  
>  static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> - .cpu_has_kvm_support = has_svm,
> - .disabled_by_bios = is_disabled,
> - .hardware_setup = svm_hardware_setup,
>   .hardware_unsetup = svm_hardware_teardown,
> - .check_processor_compatibility = svm_check_processor_compat,
>   .hardware_enable = svm_hardware_enable,
>   .hardware_disable = svm_hardware_disable,
>   .cpu_has_accelerated_tpr = svm_cpu_has_accelerated_tpr,
> @@ -7480,9 +7476,18 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>   .check_nested_events = svm_check_nested_events,
>  };
>  
> +static struct kvm_x86_init_ops svm_init_ops __initdata = {
> + .cpu_has_kvm_support = has_svm,
> + .disabled_by_bios = is_disabled,
> + .hardware_setup = svm_hardware_setup,
> + .check_processor_compatibility = svm_check_processor_compat,
> +
> + .runtime_ops = &svm_x86_ops,
> +};

Unrelated to your patch but I think we can make the naming of some of
these functions more consistent on SVM/VMX, in particular I'd suggest

has_svm() -> cpu_has_svm_support()
is_disabled -> svm_disabled_by_bios()
...
(see below for VMX)

> +
>  static int __init svm_init(void)
>  {
> - return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
> + return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
>   __alignof__(struct vcpu_svm), THIS_MODULE);
>  }
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 07299a957d4a..ffcdcc86f5b7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
>  }
>  
>  static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> - .cpu_has_kvm_support = cpu_has_kvm_support,
> - .disabled_by_bios = vmx_disabled_by_bios,
> - .hardware_setup = hardware_setup,
>   .hardware_unsetup = hardware_unsetup,
> - .check_processor_compatibility = vmx_check_processor_compat,
> +
>   .hardware_enable = hardware_enable,
>   .hardware_disable = hardware_disable,
>   .cpu_has_accelerated_tpr = report_flexpriority,
> @@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>   .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>  };
>  
> +static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> + .cpu_has_kvm_support = cpu_has_kvm_support,
> + .disabled_by_bios = vmx_disabled_by_bios,
> + .check_processor_compatibility = vmx_check_processor_compat,
> + .hardware_setup = hardware_setup,

cpu_has_kvm_support() -> cpu_has_vmx_support()
hardware_setup() -> 

Re: [PATCH v4 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-03-23 Thread Andrew Jones
On Mon, Mar 23, 2020 at 11:32:26AM +, Beata Michalska wrote:
> On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
> exception with no valid ISS info to be decoded. The lack of decode info
> makes it at least tricky to emulate those instructions, which is one of the
> (many) reasons why KVM will not even try to do so.
> 
> Add support for handling those by requesting KVM to inject external
> dabt into the guest.
> 
> Signed-off-by: Beata Michalska 
> ---
>  target/arm/cpu.h |  2 ++
>  target/arm/kvm.c | 54 
> 
>  target/arm/kvm_arm.h | 11 +++
>  3 files changed, 67 insertions(+)
> 
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 4ffd991..4f834c1 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -560,6 +560,8 @@ typedef struct CPUARMState {
>  uint64_t esr;
>  } serror;
>  
> +uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
> +
>  /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
>  uint32_t irq_line_state;
>  
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 85860e6..c088589 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
>  
>  static bool cap_has_mp_state;
>  static bool cap_has_inject_serror_esr;
> +static bool cap_has_inject_ext_dabt;
>  
>  static ARMHostCPUFeatures arm_host_cpu_features;
>  
> @@ -244,6 +245,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  ret = -EINVAL;
>  }
>  
> +if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
> +if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
> +warn_report("Failed to enable DABT NISV cap");

Shouldn't this be an error? If KVM says it has KVM_CAP_ARM_NISV_TO_USER,
then I think it should always work to enable it, unless userspace passes
the wrong flags. Currently flags must be zero, but if they were to change
then we'll need to add the flags to vmstate and fail migration when they
aren't compatible, and I guess that failure would occur here.

> +} else {
> +/* Set status for supporting the external dabt injection */
> +cap_has_inject_ext_dabt = kvm_check_extension(s,
> +KVM_CAP_ARM_INJECT_EXT_DABT);
> +}
> +}
> +
>  return ret;
>  }
>  
> @@ -703,9 +714,16 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
>  events.exception.serror_esr = env->serror.esr;
>  }
>  
> +if (cap_has_inject_ext_dabt) {
> +events.exception.ext_dabt_pending = env->ext_dabt_pending;
> +}
> +
>  ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
>  if (ret) {
>  error_report("failed to put vcpu events");
> +} else {
> +/* Clear instantly if the call was successful */
> +env->ext_dabt_pending = 0;
>  }
>  
>  return ret;
> @@ -819,6 +837,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>  ret = EXCP_DEBUG;
>  } /* otherwise return to guest */
>  break;
> +case KVM_EXIT_ARM_NISV:
> +/* External DABT with no valid iss to decode */
> +ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
> +   run->arm_nisv.fault_ipa);
> +break;
>  default:
>  qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
>__func__, run->exit_reason);
> @@ -953,3 +976,34 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>  {
>  return (data - 32) & 0xffff;
>  }
> +
> +int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
> + uint64_t fault_ipa)
> +{
> +ARMCPU *cpu = ARM_CPU(cs);
> +CPUARMState *env = &cpu->env;
> +
> +   /*
> +* ISS [23:14] is invalid so there is limited info
> +* on what has just happened so the only *useful* thing that can
> +* be retrieved from ISS is WnR & DFSC (though in some cases WnR
> +* might be less of a value as well)
> +*/
> +
> +/*
> + * Set pending ext dabt and trigger SET_EVENTS so that
> + * KVM can inject the abort
> + */
> +if (cap_has_inject_ext_dabt) {
> +kvm_cpu_synchronize_state(cs);

I guess this is just an expensive 'vcpu_dirty=true', which the comment
above justifies, but not super clearly. Can you try to clarify the
comment some more?  I also wonder if we shouldn't add a KVM function
that just marks a vcpu dirty, rather than fetching all registers.

> +env->ext_dabt_pending = 1;
> +} else {
> +error_report("Data abort exception triggered by guest memory access "
> + "at physical address: 0x"  TARGET_FMT_lx,
> + (target_ulong)fault_ipa);
> +error_printf("KVM unable to emulate faulting instruction.\n");
> +return -1;
> +}
> +
> +return 0;
> +}
> diff --git a/target/arm/kvm_arm.h 

Re: [PATCH v5 20/23] KVM: arm64: GICv4.1: Plumb SGI implementation selection in the distributor

2020-03-23 Thread Zenghui Yu

Hi Marc,

On 2020/3/23 16:25, Marc Zyngier wrote:

Hi Zenghui,

[...]


And actually, maybe we can handle that pretty cheaply. If userspace
tries to restore GICD_TYPER2 to a value that isn't what KVM can
offer, we just return an error. Exactly like we do for GICD_IIDR.
Hence the following patch:

diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 28b639fd1abc..e72dcc454247 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -156,6 +156,7 @@ static int vgic_mmio_uaccess_write_v3_misc(struct kvm_vcpu *vcpu,

  struct vgic_dist *dist = &vcpu->kvm->arch.vgic;

  switch (addr & 0x0c) {
+    case GICD_TYPER2:
  case GICD_IIDR:
  if (val != vgic_mmio_read_v3_misc(vcpu, addr, len))
  return -EINVAL;

Being a RO register, writing something that isn't compatible with the
possible behaviour of the hypervisor will just return an error.


This is really a nice point to address my concern! I'm happy to see
this in v6 now.



What do you think?


I agree with you, though I'm a bit nervous. Some old guests (which
have no knowledge about GICv4.1 vSGIs and don't care about nASSGIcap
at all) will also fail to migrate from A to B, just because now we
present two different (unused) GICD_TYPER2 registers to them.

Is it a little unfair to them :-) ?


I never pretended to be fair! ;-)

I'm happy to prevent migrating from a v4.1 system (A) to a v4.0
system (B). As soon as the guest has run, it isn't safe to do so
(it may have read TYPER2, and now knows about vSGIs). We *could*
track this and find ways to migrate this state as well, but it
feels fragile.

Migrating from B to A is more appealing. It should be possible to
do so without much difficulty (just check that the nASSGIcap bit
is either 0 or equal to KVM's view of the capability).

But overall I seriously doubt we can easily migrate guests across
very different HW. We've been talking about this for years, and
we still don't have a good solution for it given the diversity
of the ecosystem... :-/


Fair enough. Thanks for your detailed explanation!


Zenghui



[PATCH v4 0/2] target/arm: kvm: Support for KVM DABT with no valid ISS

2020-03-23 Thread Beata Michalska
Some of the ARMv7 & ARMv8 load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate the instruction which is one of the
(many) reasons why KVM will not even try to do so.

So far, if a guest made an attempt to access memory outside the memory slot,
KVM reported vague ENOSYS. As a result QEMU exited with no useful information
being provided or even a clue on what has just happened.

ARM KVM introduced support for notifying of an attempt to execute
an instruction that resulted in dabt with no valid ISS decoding info.
This still leaves QEMU to handle the case, but at least now it gives more
control and a start point for more meaningful handling of such cases.

This patchset relies on KVM to insert the external data abort into the guest.

v4:
 - Removing one of the patches as it is being picked-up separately
 target/arm: kvm: Inject events at the last stage of sync
 - Moving handling KVM issue to a separate patch
 - Minor changes wrt the review comments

v3:
 - Fix setting KVM cap per vm not per vcpu
 - Simplifying the handler to bare minimum with no default logging to address
   the potential risk of overflooding the host (adding support for rate
   limiting the logs turned out to be bit too invasive to justify the little
   add-on value from logs in this particular case)
 - Adding handling of the KVM bug (for a small range of affected kernels):
   little bit of trade-off between what's reasonable and what's effective:
   aborting qemu when running on buggy host kernel

v2:
- Improving/re-phrasing messaging
- Dropping messing around with forced sync (@see [PATCH v2 1/2])
  and PC alignment

Beata Michalska (2):
  target/arm: kvm: Handle DABT with no valid ISS
  target/arm: kvm: Handle potential issue with dabt injection

 target/arm/cpu.h |  3 ++
 target/arm/kvm.c | 82 
 target/arm/kvm32.c   | 25 
 target/arm/kvm64.c   | 34 ++
 target/arm/kvm_arm.h | 21 ++
 5 files changed, 165 insertions(+)

-- 
2.7.4



[PATCH v4 2/2] target/arm: kvm: Handle potential issue with dabt injection

2020-03-23 Thread Beata Michalska
Injecting external data abort through KVM might trigger
an issue on kernels that do not get updated to include the KVM fix.
For those and aarch32 guests, the injected abort gets misconfigured
to be an implementation defined exception. This leads to the guest
repeatedly re-running the faulting instruction.

Add support for handling that case.
[
  Fixed-by: 018f22f95e8a
('KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests')
  Fixed-by: 21aecdbd7f3a
('KVM: arm: Make inject_abt32() inject an external abort instead')
]

Signed-off-by: Beata Michalska 
---
 target/arm/cpu.h |  1 +
 target/arm/kvm.c | 30 +-
 target/arm/kvm32.c   | 25 +
 target/arm/kvm64.c   | 34 ++
 target/arm/kvm_arm.h | 10 ++
 5 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 4f834c1..868afc6 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -561,6 +561,7 @@ typedef struct CPUARMState {
 } serror;
 
 uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
+uint8_t ext_dabt_raised; /* Tracking/verifying injection of ext DABT */
 
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index c088589..58ad734 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -721,7 +721,12 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 if (ret) {
 error_report("failed to put vcpu events");
-} else {
+} else if (env->ext_dabt_pending) {
+/*
+ * Mark that the external DABT has been injected,
+ * if one has been requested
+ */
+env->ext_dabt_raised = env->ext_dabt_pending;
 /* Clear instantly if the call was successful */
 env->ext_dabt_pending = 0;
 }
@@ -755,6 +760,29 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
 
 void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+if (unlikely(env->ext_dabt_raised)) {
+/*
+ * Verifying that the ext DABT has been properly injected,
+ * otherwise risking indefinitely re-running the faulting instruction
+ * Covering a very narrow case for kernels 5.5..5.5.4
+ * when injected abort was misconfigured to be
+ * an IMPLEMENTATION DEFINED exception (for 32-bit EL1)
+ */
+if (!arm_feature(env, ARM_FEATURE_AARCH64) &&
+unlikely(!kvm_arm_verify_ext_dabt_pending(cs))) {
+
+error_report("Data abort exception with no valid ISS generated by "
+   "guest memory access. KVM unable to emulate faulting "
+   "instruction. Failed to inject an external data abort "
+   "into the guest.");
+abort();
+   }
+   /* Clear the status */
+   env->ext_dabt_raised = 0;
+}
 }
 
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index f271181..86c4fe7 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -564,3 +564,28 @@ void kvm_arm_pmu_init(CPUState *cs)
 {
 qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 }
+
+#define ARM_REG_DFSR  ARM_CP15_REG32(0, 5, 0, 0)
+#define ARM_REG_TTBCR ARM_CP15_REG32(0, 2, 0, 2)
+
+#define DFSR_FSC(v)   (((v) >> 6 | (v)) & 0x1F)
+#define DFSC_EXTABT(lpae) (lpae) ? 0x10 : 0x08
+
+bool kvm_arm_verify_ext_dabt_pending(CPUState *cs)
+{
+uint32_t dfsr_val;
+
+if (!kvm_get_one_reg(cs, ARM_REG_DFSR, &dfsr_val)) {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+uint32_t ttbcr;
+int lpae = 0;
+
+if (!kvm_get_one_reg(cs, ARM_REG_TTBCR, &ttbcr)) {
+lpae = arm_feature(env, ARM_FEATURE_LPAE) && (ttbcr & TTBCR_EAE);
+}
+return !(DFSR_FSC(dfsr_val) != DFSC_EXTABT(lpae));
+}
+return false;
+}
+
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index be5b31c..18594e9 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -1430,3 +1430,37 @@ bool kvm_arm_handle_debug(CPUState *cs, struct kvm_debug_exit_arch *debug_exit)
 
 return false;
 }
+
+#define ARM64_REG_ESR_EL1 ARM64_SYS_REG(3, 0, 5, 2, 0)
+#define ARM64_REG_TCR_EL1 ARM64_SYS_REG(3, 0, 2, 0, 2)
+
+#define ESR_DFSC(aarch64, v)\
+((aarch64) ? ((v) & 0x3F)   \
+   : (((v) >> 6 | (v)) & 0x1F))
+
+#define ESR_DFSC_EXTABT(aarch64, lpae) \
+((aarch64) ? 0x10 : (lpae) ? 0x10 : 0x8)
+
+bool kvm_arm_verify_ext_dabt_pending(CPUState *cs)
+{
+uint64_t dfsr_val;
+
+if (!kvm_get_one_reg(cs, ARM64_REG_ESR_EL1, &dfsr_val)) {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+int aarch64_mode = arm_feature(env, ARM_FEATURE_AARCH64);
+int lpae = 0;
+
+if (!aarch64_mode) {
+uint64_t ttbcr;
+
+

[PATCH v4 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-03-23 Thread Beata Michalska
On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate those instructions, which is one of the
(many) reasons why KVM will not even try to do so.

Add support for handling those by requesting KVM to inject external
dabt into the guest.

Signed-off-by: Beata Michalska 
---
 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 54 
 target/arm/kvm_arm.h | 11 +++
 3 files changed, 67 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 4ffd991..4f834c1 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -560,6 +560,8 @@ typedef struct CPUARMState {
 uint64_t esr;
 } serror;
 
+uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
+
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 85860e6..c088589 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool cap_has_mp_state;
 static bool cap_has_inject_serror_esr;
+static bool cap_has_inject_ext_dabt;
 
 static ARMHostCPUFeatures arm_host_cpu_features;
 
@@ -244,6 +245,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 ret = -EINVAL;
 }
 
+if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
+if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
+warn_report("Failed to enable DABT NISV cap");
+} else {
+/* Set status for supporting the external dabt injection */
+cap_has_inject_ext_dabt = kvm_check_extension(s,
+KVM_CAP_ARM_INJECT_EXT_DABT);
+}
+}
+
 return ret;
 }
 
@@ -703,9 +714,16 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
 events.exception.serror_esr = env->serror.esr;
 }
 
+if (cap_has_inject_ext_dabt) {
+events.exception.ext_dabt_pending = env->ext_dabt_pending;
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 if (ret) {
 error_report("failed to put vcpu events");
+} else {
+/* Clear instantly if the call was successful */
+env->ext_dabt_pending = 0;
 }
 
 return ret;
@@ -819,6 +837,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = EXCP_DEBUG;
 } /* otherwise return to guest */
 break;
+case KVM_EXIT_ARM_NISV:
+/* External DABT with no valid iss to decode */
+ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
+   run->arm_nisv.fault_ipa);
+break;
 default:
 qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
   __func__, run->exit_reason);
@@ -953,3 +976,34 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 return (data - 32) & 0xffff;
 }
+
+int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
+ uint64_t fault_ipa)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+   /*
+* ISS [23:14] is invalid so there is limited info
+* on what has just happened so the only *useful* thing that can
+* be retrieved from ISS is WnR & DFSC (though in some cases WnR
+* might be less of a value as well)
+*/
+
+/*
+ * Set pending ext dabt and trigger SET_EVENTS so that
+ * KVM can inject the abort
+ */
+if (cap_has_inject_ext_dabt) {
+kvm_cpu_synchronize_state(cs);
+env->ext_dabt_pending = 1;
+} else {
+error_report("Data abort exception triggered by guest memory access "
+ "at physical address: 0x"  TARGET_FMT_lx,
+ (target_ulong)fault_ipa);
+error_printf("KVM unable to emulate faulting instruction.\n");
+return -1;
+}
+
+return 0;
+}
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index ae9e075..39472d5 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -450,6 +450,17 @@ struct kvm_guest_debug_arch;
 void kvm_arm_copy_hw_debug_data(struct kvm_guest_debug_arch *ptr);
 
 /**
+ * kvm_arm_handle_dabt_nisv:
+ * @cs: CPUState
+ * @esr_iss: ISS encoding (limited) for the exception from Data Abort
+ *   ISV bit set to '0b0' -> no valid instruction syndrome
+ * @fault_ipa: faulting address for the synch data abort
+ *
+ * Returns: 0 if the exception has been handled, < 0 otherwise
+ */
+int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
+ uint64_t fault_ipa);
+/**
  * its_class_name:
  *
  * Return the ITS class name to use depending on whether KVM acceleration
-- 
2.7.4



Re: [PATCH] KVM: arm64: Add support for IDSR exits to userspace

2020-03-23 Thread Lev Aronsky
On Mon, Mar 23, 2020 at 10:26:18AM +, Marc Zyngier wrote:
> On 2020-03-23 09:41, Lev Aronsky wrote:
> > On Mon, Mar 23, 2020 at 09:07:12AM +, Marc Zyngier wrote:
> > > On 2020-03-23 08:22, Lev Aronsky wrote:

[...]

> > > 
> > > I'm pretty sure this wouldn't work with HW virtualization. I suspect
> > > this would UNDEF directly on the CPU, leading to an exception being
> > > taken
> > > at EL1 without intervention of the hypervisor. Which makes sense as
> > > you'd
> > > be executing an instruction that the CPU really doesn't implement.
> > 
> > Yes, that seems to be what's happening. We'll have to think of a
> > different mechanism for trapping access from user-mode straight to the
> > hypervisor - or, alternatively, move our custom code into the kernel. I
> > know it's a bit off-topic, but thank you for your advice!
> 
> One possibility would be to trap accesses to a special page (magic device?),
> but that requires cooperation from the OS kernel as well. There is hardly
> anything else that would guarantee a trap directly from EL0 to EL2 (EL1
> can always get in the way).

These are the times I miss the simplicity of CPUID and VMCALL/VMMCALL on
x86... A special page might work - we are already doing some minor
patches in the kernel, adding a single EL0-accessible page might be
the way to go. Thanks.

> 
> M.
> -- 
> Jazz is not dead. It just smells funny...


Re: [PATCH] KVM: arm64: Add support for IDSR exits to userspace

2020-03-23 Thread Marc Zyngier

On 2020-03-23 09:41, Lev Aronsky wrote:

On Mon, Mar 23, 2020 at 09:07:12AM +, Marc Zyngier wrote:

On 2020-03-23 08:22, Lev Aronsky wrote:


[...]


> We're running it on an ARM cloud server (we were hoping to be able to
> use SBCs for the project, but iOS uses 16K pages for kernel mode, and we
> found out (the hard way) that most older/cheaper ARM cores don't support
> it (Cortex A76 being the first one to support it, IIRC).

I think there is more than just A76. A55 definitely has TGran16, as well as
A73, A75, A65 (I've stopped looking in the various TRMs). So definitely
in the realm of SBCs (I have a quad A55 on my desk, worth $70).


You're right, as usual.


Funny. My wife says otherwise... ;-)


But A55 SBCs are apparently hard to find - most
of the SBCs I've seen are A53/A72 (which I thought would be enough,
until we found out about the TGran16 problem), and now that I looked for
A55-based SBCs, I couldn't find one with a big enough memory (we're
looking at 4GB+, so that we can provide the VM at least 2GB and still
have adequate performance).


Yeah, decent machines are hard to find :-(. Some TV boxes ship with 4GB,
but that'd be a waste a time (and money). If the Windows-ARM laptops
allowed to run EL2 code, they'd be great... I guess you're going to be
stuck with your cloud machine for a long while.


> > > Interestingly, EL0 access to implementation-defined registers currently
> > > results in an UNDEF, even though I expected it to be passed on to our
> > > handler (I saw this behavior with a custom system register we defined
> > > for direct communication with the hypervisor from a user-mode program we
> > > developed). I tried following the ARM documentation to figure out what
> > > could cause such a behavior, but so far I'm at a loss.
> >
> > Here's your answer:
> >
> > "When the value of HCR_EL2.TIDCP is 1, it is IMPLEMENTATION DEFINED
> > whether
> > any of this functionality accessed from EL0 is trapped to EL2. If it
> > is not,
> > then it is UNDEFINED, and any attempt to access it from EL0
> > generates an
> > exception that is taken to EL1."
> >
> > Also, I don't really understand how you define a custom system
> > register.
> > Unless you're writing the HW as well, of course.
>
> We are using QEMU as the hypervisor. QEMU allows for definition of
> arbitrary system registers (based on opc0/opc1/opc2/crm/crn), with
> custom read/write callback functions. We have a custom machine for
> iPhone emulation (you can take a look at our code at
> https://github.com/alephsecurity/xnu-qemu-arm64, if you're interested),
> so yeah - you could say we're writing the hardware, as well.

I'm pretty sure this wouldn't work with HW virtualization. I suspect
this would UNDEF directly on the CPU, leading to an exception being taken
at EL1 without intervention of the hypervisor. Which makes sense as you'd
be executing an instruction that the CPU really doesn't implement.


Yes, that seems to be what's happening. We'll have to think of a
different mechanism for trapping access from user-mode straight to the
hypervisor - or, alternatively, move our custom code into the kernel. I
know it's a bit off-topic, but thank you for your advice!


One possibility would be to trap accesses to a special page (magic device?),
but that requires cooperation from the OS kernel as well. There is hardly
anything else that would guarantee a trap directly from EL0 to EL2 (EL1
can always get in the way).

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH] KVM: arm64: Add support for IDSR exits to userspace

2020-03-23 Thread Lev Aronsky
On Mon, Mar 23, 2020 at 09:07:12AM +, Marc Zyngier wrote:
> On 2020-03-23 08:22, Lev Aronsky wrote:
> > On Sun, Mar 22, 2020 at 05:29:52PM +, Marc Zyngier wrote:
> > > On 2020-03-22 14:20, Lev Aronsky wrote:
> > > > On Sun, Mar 22, 2020 at 11:34:35AM +, Marc Zyngier wrote:
> > > > > Hi Lev,
> > > > >
> > > > > Thanks for this.
> > > > >
> > > > > On 2020-03-22 09:36, aron...@gmail.com wrote:
> > > > > > From: Lev Aronsky 
> > > > > >
> > > > > > As per ARM DDI 0487E.a, section D12.3.2, there can be various
> > > > > > system registers that are IMPLEMENTATION DEFINED.
> > > > > >
> > > > > > So far, KVM would inject an undefined instruction into the guest,
> > > > > > whenever access to an implementation defined system register (from
> > > > > > here on referred to as IDSR) was trapped. This makes sense, since a
> > > > > > general-purpose OS probably shouldn't rely on the existence of 
> > > > > > IDSRs.
> > > > > >
> > > > > > This patch introduces an option to give userspace a chance to handle
> > > > > > these traps. This can be used to emulate specific architectures, and
> > > > > > virtualize operating systems that rely on the existence of specific
> > > > > > IDSRs.
> > > > >
> > > > > Do you have an example of such operating systems? Also, do you have
> > > > > an example where userspace could actually do something useful about
> > > > > such access, other than maybe treating it as RAZ/WI?
> > > >
> > > > I ran into the issue when working on our company's project, which aims
> > > > to emulate Apple's iOS under QEMU. While emulation currently works
> > > > nicely, we were looking to improve performance by using hardware
> > > > virtualization, and that's when we ran into the issue of
> > > > implementation-defined system registers, since iOS uses those. Frankly,
> > > > in our case, we don't really know the purpose of those registers, as far
> > > > as the iOS kernel is concerned. When emulating them, we treat them as
> > > > simple 64-bit storage areas. It's possible that treating them as RAZ/WI
> > > > would work, as well.
> > > 
> > > It's not really reassuring, is it? :-/
> > 
> > It's not perfect, but also not too bad - the system is booting as
> > expected. It might not be a 100% perfect emulation of a real device
> > without a proper implementation of those registers, but it's good enough
> > in our case.
> 
> Hum. OK, I guess.
> 
> > > > > > Similarly to the recently introduced NISV exits, this is an ABI to
> > > > > > userspace, and comes with a matching new capability that allows the
> > > > > > configuration of the behavior.
> > > > >
> > > > > These are different issues: one is a shortcoming of the architecture,
> > > > > the other a shortcoming (or hyperspecialization) of the guest OS.
> > > >
> > > > You're correct. I mentioned the NISV exits because my code is modeled
> > > > after them, and was hoping it would make following my changes easier.
> > > 
> > > It does. It is just that you did seem to conflate the two things, which
> > > triggered an allergic reaction... ;-)
> > > 
> > > > > > This patch introduces KVM_CAP_ARM_IDSR_TO_USER, which can be used to
> > > > > > enable userspace exits due to access to IDSRs. Additionally, it
> > > > > > introduces a matching new exit reason, KVM_EXIT_ARM_IDSR, to report
> > > > > > those events to userspace. Userspace can choose to emulate the 
> > > > > > access
> > > > > > based on the architecture requirements, or refuse to emulate it and
> > > > >
> > > > > s/architecture/implementation/
> > > >
> > > > Sorry for the newbie question - but how do I change the description now?
> > > > I will upload an updated patch, based on your following comments, but
> > > > I'm not sure how to change the original commit description the correct
> > > > way.
> > > 
> > > git commit --amend
> > 
> > Thanks. I know about amending commits - I was wondering about what to do
> > with the amended commit. As far as I understand, given there are
> > additional changes I'll have to make, I'd have to make a v2 of the
> > patch, but according to the guides I've seen, I would only add the
> > changelog of the newer version of the patch, so I wasn't sure whether to
> > make the correction in the original description of the patch or not.
> 
> In general, the commit log must be accurate, so you are free to change
> it at any time to reflect what the patch does.
> 
> > 
> > > >
> > > > > > let the kernel continue with the default behavior (injecting an
> > > > > > undefined instruction exception into the guest).
> > > > > >
> > > > > > Signed-off-by: Lev Aronsky 
> > > > > > ---
> > > > > >  arch/arm64/include/asm/kvm_coproc.h |  1 +
> > > > > >  arch/arm64/include/asm/kvm_host.h   |  6 +++
> > > > > >  arch/arm64/kvm/sys_regs.c   | 66 
> > > > > > -
> > > > > >  include/uapi/linux/kvm.h| 14 ++
> > > > > >  tools/include/uapi/linux/kvm.h  | 14 ++
> > > > > >  virt/kvm/arm/arm.c  

Re: [PATCH] KVM: arm64: Add support for IDSR exits to userspace

2020-03-23 Thread Marc Zyngier

On 2020-03-23 08:22, Lev Aronsky wrote:

On Sun, Mar 22, 2020 at 05:29:52PM +, Marc Zyngier wrote:

On 2020-03-22 14:20, Lev Aronsky wrote:
> On Sun, Mar 22, 2020 at 11:34:35AM +, Marc Zyngier wrote:
> > Hi Lev,
> >
> > Thanks for this.
> >
> > On 2020-03-22 09:36, aron...@gmail.com wrote:
> > > From: Lev Aronsky 
> > >
> > > As per ARM DDI 0487E.a, section D12.3.2, there can be various
> > > system registers that are IMPLEMENTATION DEFINED.
> > >
> > > So far, KVM would inject an undefined instruction into the guest,
> > > whenever access to an implementation defined system register (from
> > > here on referred to as IDSR) was trapped. This makes sense, since a
> > > general-purpose OS probably shouldn't rely on the existence of IDSRs.
> > >
> > > This patch introduces an option to give userspace a chance to handle
> > > these traps. This can be used to emulate specific architectures, and
> > > virtualize operating systems that rely on the existence of specific
> > > IDSRs.
> >
> > Do you have an example of such operating systems? Also, do you have
> > an example where userspace could actually do something useful about
> > such access, other than maybe treating it as RAZ/WI?
>
> I ran into the issue when working on our company's project, which aims
> to emulate Apple's iOS under QEMU. While emulation currently works
> nicely, we were looking to improve performance by using hardware
> virtualization, and that's when we ran into the issue of
> implementation-defined system registers, since iOS uses those. Frankly,
> in our case, we don't really know the purpose of those registers, as far
> as the iOS kernel is concerned. When emulating them, we treat them as
> simple 64-bit storage areas. It's possible that treating them as RAZ/WI
> would work, as well.

It's not really reassuring, is it? :-/


It's not perfect, but also not too bad - the system is booting as
expected. It might not be a 100% perfect emulation of a real device
without a proper implementation of those registers, but it's good enough
in our case.


Hum. OK, I guess.


> > > Similarly to the recently introduced NISV exits, this is an ABI to
> > > userspace, and comes with a matching new capability that allows the
> > > configuration of the behavior.
> >
> > These are different issues: one is a shortcoming of the architecture,
> > the other a shortcoming (or hyperspecialization) of the guest OS.
>
> You're correct. I mentioned the NISV exits because my code is modeled
> after them, and was hoping it would make following my changes easier.

It does. It is just that you did seem to conflate the two things, which
triggered an allergic reaction... ;-)

> > > This patch introduces KVM_CAP_ARM_IDSR_TO_USER, which can be used to
> > > enable userspace exits due to access to IDSRs. Additionally, it
> > > introduces a matching new exit reason, KVM_EXIT_ARM_IDSR, to report
> > > those events to userspace. Userspace can choose to emulate the access
> > > based on the architecture requirements, or refuse to emulate it and
> >
> > s/architecture/implementation/
>
> Sorry for the newbie question - but how do I change the description now?
> I will upload an updated patch, based on your following comments, but
> I'm not sure how to change the original commit description the correct
> way.

git commit --amend


Thanks. I know about amending commits - I was wondering about what to do
with the amended commit. As far as I understand, given there are
additional changes I'll have to make, I'd have to make a v2 of the
patch, but according to the guides I've seen, I would only add the
changelog of the newer version of the patch, so I wasn't sure whether to
make the correction in the original description of the patch or not.


In general, the commit log must be accurate, so you are free to change
it at any time to reflect what the patch does.




>
> > > let the kernel continue with the default behavior (injecting an
> > > undefined instruction exception into the guest).
> > >
> > > Signed-off-by: Lev Aronsky 
> > > ---
> > >  arch/arm64/include/asm/kvm_coproc.h |  1 +
> > >  arch/arm64/include/asm/kvm_host.h   |  6 +++
> > >  arch/arm64/kvm/sys_regs.c   | 66 -
> > >  include/uapi/linux/kvm.h| 14 ++
> > >  tools/include/uapi/linux/kvm.h  | 14 ++
> > >  virt/kvm/arm/arm.c  | 11 +
> > >  6 files changed, 111 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_coproc.h
> > > b/arch/arm64/include/asm/kvm_coproc.h
> > > index 0185ee8b8b5e..34b45efffe52 100644
> > > --- a/arch/arm64/include/asm/kvm_coproc.h
> > > +++ b/arch/arm64/include/asm/kvm_coproc.h
> > > @@ -33,6 +33,7 @@ int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct
> > > kvm_run *run);
> > >  int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
> > >  int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
> > >  int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, 

Re: [PATCH] KVM: arm64: Add support for IDSR exits to userspace

2020-03-23 Thread Lev Aronsky
On Sun, Mar 22, 2020 at 05:29:52PM +, Marc Zyngier wrote:
> On 2020-03-22 14:20, Lev Aronsky wrote:
> > On Sun, Mar 22, 2020 at 11:34:35AM +, Marc Zyngier wrote:
> > > Hi Lev,
> > > 
> > > Thanks for this.
> > > 
> > > On 2020-03-22 09:36, aron...@gmail.com wrote:
> > > > From: Lev Aronsky 
> > > >
> > > > As per ARM DDI 0487E.a, section D12.3.2, there can be various
> > > > system registers that are IMPLEMENTATION DEFINED.
> > > >
> > > > So far, KVM would inject an undefined instruction into the guest,
> > > > whenever access to an implementation defined system register (from
> > > > here on referred to as IDSR) was trapped. This makes sense, since a
> > > > general-purpose OS probably shouldn't rely on the existence of IDSRs.
> > > >
> > > > This patch introduces an option to give userspace a chance to handle
> > > > these traps. This can be used to emulate specific architectures, and
> > > > virtualize operating systems that rely on the existence of specific
> > > > IDSRs.
> > > 
> > > Do you have an example of such operating systems? Also, do you have
> > > an example where userspace could actually do something useful about
> > > such access, other than maybe treating it as RAZ/WI?
> > 
> > I ran into the issue when working on our company's project, which aims
> > to emulate Apple's iOS under QEMU. While emulation currently works
> > nicely, we were looking to improve performance by using hardware
> > virtualization, and that's when we ran into the issue of
> > implementation-defined system registers, since iOS uses those. Frankly,
> > in our case, we don't really know the purpose of those registers, as far
> > as the iOS kernel is concerned. When emulating them, we treat them as
> > simple 64-bit storage areas. It's possible that treating them as RAZ/WI
> > would work, as well.
> 
> It's not really reassuring, is it? :-/

It's not perfect, but also not too bad - the system is booting as
expected. It might not be a 100% perfect emulation of a real device
without a proper implementation of those registers, but it's good enough
in our case.
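For what it's worth, the "simple 64-bit storage areas" approach can be sketched in a few lines of host-side C. The encoding test below follows the rule from ARM DDI 0487 D12.3.2 cited earlier (Op0 == 3 with CRn == 11 or 15 is reserved for IMPLEMENTATION DEFINED registers); everything else - the table layout and helper names - is hypothetical and not part of any proposed KVM ABI:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* One AArch64 system-register encoding, as reported on a sys-reg trap. */
struct sysreg_enc { uint8_t op0, op1, crn, crm, op2; };

/* DDI 0487 D12.3.2: Op0 == 3, CRn == 11 or 15 is IMPLEMENTATION DEFINED. */
static bool is_imp_def_sysreg(const struct sysreg_enc *e)
{
	return e->op0 == 3 && (e->crn == 11 || e->crn == 15);
}

/* Tiny table of emulated IDSRs, each backed by a plain 64-bit cell. */
#define MAX_IDSR 16
struct idsr_store {
	struct sysreg_enc enc[MAX_IDSR];
	uint64_t val[MAX_IDSR];
	int n;
};

static uint64_t *idsr_cell(struct idsr_store *s, const struct sysreg_enc *e)
{
	for (int i = 0; i < s->n; i++)
		if (!memcmp(&s->enc[i], e, sizeof(*e)))
			return &s->val[i];
	if (s->n == MAX_IDSR)
		return NULL;		/* table full: caller degrades to RAZ/WI */
	s->enc[s->n] = *e;
	s->val[s->n] = 0;		/* cells start out reading as zero */
	return &s->val[s->n++];
}

/* Reads return the latched value (0 until written); writes latch it. */
static uint64_t idsr_read(struct idsr_store *s, const struct sysreg_enc *e)
{
	uint64_t *c = idsr_cell(s, e);
	return c ? *c : 0;		/* RAZ fallback */
}

static void idsr_write(struct idsr_store *s, const struct sysreg_enc *e,
		       uint64_t v)
{
	uint64_t *c = idsr_cell(s, e);
	if (c)
		*c = v;			/* WI fallback when table is full */
}
```

Treating the cells as RAZ/WI instead would amount to dropping the write in idsr_write() - which is exactly the simplification being discussed.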

> > > > Similarly to the recently introduced NISV exits, this is an ABI to
> > > > userspace, and comes with a matching new capability that allows the
> > > > configuration of the behavior.
> > > 
> > > These are different issues: one is a shortcoming of the architecture,
> > > the other a shortcoming (or hyperspecialization) of the guest OS.
> > 
> > You're correct. I mentioned the NISV exits because my code is modeled
> > after them, and was hoping it would make following my changes easier.
> 
> It does. It is just that you did seem to conflate the two things, which
> triggered an allergic reaction... ;-)
> 
> > > > This patch introduces KVM_CAP_ARM_IDSR_TO_USER, which can be used to
> > > > enable userspace exits due to access to IDSRs. Additionally, it
> > > > introduces a matching new exit reason, KVM_EXIT_ARM_IDSR, to report
> > > > those events to userspace. Userspace can choose to emulate the access
> > > > based on the architecture requirements, or refuse to emulate it and
> > > 
> > > s/architecture/implementation/
> > 
> > Sorry for the newbie question - but how do I change the description now?
> > I will upload an updated patch, based on your following comments, but
> > I'm not sure how to change the original commit description the correct
> > way.
> 
> git commit --amend

Thanks. I know about amending commits - I was wondering about what to do
with the amended commit. As far as I understand, given there are
additional changes I'll have to make, I'd have to make a v2 of the
patch, but according to the guides I've seen, I would only add the
changelog of the newer version of the patch, so I wasn't sure whether to
make the correction in the original description of the patch or not.

> > 
> > > > let the kernel continue with the default behavior (injecting an
> > > > undefined instruction exception into the guest).
> > > >
> > > > Signed-off-by: Lev Aronsky 
> > > > ---
> > > >  arch/arm64/include/asm/kvm_coproc.h |  1 +
> > > >  arch/arm64/include/asm/kvm_host.h   |  6 +++
> > > >  arch/arm64/kvm/sys_regs.c   | 66 -
> > > >  include/uapi/linux/kvm.h| 14 ++
> > > >  tools/include/uapi/linux/kvm.h  | 14 ++
> > > >  virt/kvm/arm/arm.c  | 11 +
> > > >  6 files changed, 111 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/arm64/include/asm/kvm_coproc.h
> > > > b/arch/arm64/include/asm/kvm_coproc.h
> > > > index 0185ee8b8b5e..34b45efffe52 100644
> > > > --- a/arch/arm64/include/asm/kvm_coproc.h
> > > > +++ b/arch/arm64/include/asm/kvm_coproc.h
> > > > @@ -33,6 +33,7 @@ int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct
> > > > kvm_run *run);
> > > >  int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
> > > >  int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
> > > >  int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct 

Re: [PATCH v5 20/23] KVM: arm64: GICv4.1: Plumb SGI implementation selection in the distributor

2020-03-23 Thread Marc Zyngier

Hi Zenghui,

[...]


And actually, maybe we can handle that pretty cheaply. If userspace
tries to restore GICD_TYPER2 to a value that isn't what KVM can
offer, we just return an error. Exactly like we do for GICD_IIDR.
Hence the following patch:

diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 28b639fd1abc..e72dcc454247 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -156,6 +156,7 @@ static int vgic_mmio_uaccess_write_v3_misc(struct kvm_vcpu *vcpu,
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 
 	switch (addr & 0x0c) {
+	case GICD_TYPER2:
 	case GICD_IIDR:
 		if (val != vgic_mmio_read_v3_misc(vcpu, addr, len))
 			return -EINVAL;

Being a RO register, writing something that isn't compatible with the
possible behaviour of the hypervisor will just return an error.
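A toy model of that compare-and-reject rule makes the migration consequence concrete: a saved GICD_TYPER2 only restores on a host that reads back the identical value. The nASSGIcap bit position and the helper names below are illustrative assumptions, not the kernel's actual definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define GICD_TYPER2_nASSGIcap	(1ULL << 8)	/* assumed bit position */

struct host { bool has_gicv4_1; };

/* What this host exposes in the RO GICD_TYPER2 register. */
static uint64_t host_read_typer2(const struct host *h)
{
	return h->has_gicv4_1 ? GICD_TYPER2_nASSGIcap : 0;
}

/* Models the uaccess write path: mismatched restores fail with -EINVAL. */
static int uaccess_write_typer2(const struct host *h, uint64_t val)
{
	if (val != host_read_typer2(h))
		return -EINVAL;
	return 0;
}
```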


This is really a nice point to address my concern! I'm happy to see
this in v6 now.



What do you think?


I agree with you, though I'm a bit nervous. Some old guests (which
have no knowledge about GICv4.1 vSGIs and don't care about nASSGIcap
at all) will also fail to migrate from A to B, just because now we
present two different (unused) GICD_TYPER2 registers to them.

Is it a little unfair to them :-) ?


I never pretended to be fair! ;-)

I'm happy to prevent migrating from a v4.1 system (A) to a v4.0
system (B). As soon as the guest has run, it isn't safe to do so
(it may have read TYPER2, and now knows about vSGIs). We *could*
track this and find ways to migrate this state as well, but it
feels fragile.

Migrating from B to A is more appealing. It should be possible to
do so without much difficulty (just check that the nASSGIcap bit
is either 0 or equal to KVM's view of the capability).
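That B-to-A acceptance condition is small enough to write down directly; a minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Accept a restored nASSGIcap value if it is either 0 (the guest
 * never saw vSGI support on the source) or equal to what this
 * destination host can actually offer.
 */
static bool nassgicap_restore_ok(bool host_has_gicv4_1, bool saved_nassgicap)
{
	return !saved_nassgicap || saved_nassgicap == host_has_gicv4_1;
}
```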

But overall I seriously doubt we can easily migrate guests across
very different HW. We've been talking about this for years, and
we still don't have a good solution for it given the diversity
of the ecosystem... :-/

Thanks,

M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v5 20/23] KVM: arm64: GICv4.1: Plumb SGI implementation selection in the distributor

2020-03-23 Thread Zenghui Yu

Hi Marc,

On 2020/3/20 17:01, Marc Zyngier wrote:

Hi Zenghui,

On 2020-03-20 03:53, Zenghui Yu wrote:

Hi Marc,

On 2020/3/19 20:10, Marc Zyngier wrote:

But I wonder whether we should use nassgireq to *only* keep track of
what the guest had written into GICD_CTLR.nASSGIreq.  If not, we may
lose the guest-request bit after migration among hosts with different
has_gicv4_1 settings.


I'm unsure of what you're suggesting here. If userspace tries to set
GICD_CTLR.nASSGIreq on a non-4.1 host, this bit will not latch.


This is exactly what I *was* concerned about.


Userspace can check that at restore time. Or we could fail the
userspace write, which is a bit odd (the bit is otherwise RES0).

Could you clarify your proposal?


Let's assume two hosts below. 'has_gicv4_1' is true on host-A while
it is false on host-B because of lack of HW support or the kernel
parameter "kvm-arm.vgic_v4_enable=0".

If we migrate a guest through A->B->A, we may end up losing the initial
guest-request "nASSGIreq=1" and don't use direct vSGI delivery for
this guest when it's migrated back to host-A.


My point above is that we shouldn't allow the A->B migration the first
place, and fail it as quickly as possible. We don't know what the guest
has observed in terms of GIC capability, and it may not have enabled the
new flavour of SGIs just yet.


Indeed. I didn't realize it.




This can be "fixed" by keeping track of what the guest had written into
nASSGIreq. And we need to evaluate the need for using direct vSGI
for a specified guest by 'has_gicv4_1 && nassgireq'.
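The contrast between the two schemes can be modelled in a few lines. With the RES0 semantics described above, a GICD_CTLR.nASSGIreq write only latches on a GICv4.1-capable host, so the "use direct vSGI?" decision reduces to the latched bit itself. Field and helper names are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>

struct dist {
	bool has_gicv4_1;	/* host/KVM capability */
	bool nassgireq;		/* latched guest request */
};

static void ctlr_write_nassgireq(struct dist *d, bool req)
{
	if (d->has_gicv4_1)
		d->nassgireq = req;	/* latches */
	/* else: the bit is RES0 on this host, the write is ignored */
}

/* Models the 'has_gicv4_1 && nassgireq' evaluation discussed above. */
static bool uses_direct_vsgi(const struct dist *d)
{
	return d->has_gicv4_1 && d->nassgireq;
}
```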


It feels odd. It means we have more state than the HW normally has.
I have an alternative proposal, see below.


But if it's expected that "if userspace tries to set nASSGIreq on
a non-4.1 host, this bit will not latch", then this shouldn't be
a problem at all.


Well, that is the semantics of the RES0 bit. It applies from both
guest and userspace.

And actually, maybe we can handle that pretty cheaply. If userspace
tries to restore GICD_TYPER2 to a value that isn't what KVM can
offer, we just return an error. Exactly like we do for GICD_IIDR.
Hence the following patch:

diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 28b639fd1abc..e72dcc454247 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -156,6 +156,7 @@ static int vgic_mmio_uaccess_write_v3_misc(struct kvm_vcpu *vcpu,
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 
 	switch (addr & 0x0c) {
+	case GICD_TYPER2:
 	case GICD_IIDR:
 		if (val != vgic_mmio_read_v3_misc(vcpu, addr, len))
 			return -EINVAL;

Being a RO register, writing something that isn't compatible with the
possible behaviour of the hypervisor will just return an error.


This is really a nice point to address my concern! I'm happy to see
this in v6 now.



What do you think?


I agree with you, though I'm a bit nervous. Some old guests (which
have no knowledge about GICv4.1 vSGIs and don't care about nASSGIcap
at all) will also fail to migrate from A to B, just because now we
present two different (unused) GICD_TYPER2 registers to them.

Is it a little unfair to them :-) ?


Thanks,
Zenghui
