Re: [PATCH v10 5/5] arm64: KVM: Enable support for :G/:H perf event modifiers

2019-02-18 Thread Christoffer Dall
On Mon, Jan 14, 2019 at 04:11:48PM +, Andrew Murray wrote:
> Enable/disable event counters as appropriate when entering and exiting
> the guest to enable support for guest or host only event counting.
> 
> For both VHE and non-VHE we switch the counters between host/guest at
> EL2. EL2 is filtered out by the PMU when we are using the :G modifier.

I don't think the last part is strictly true (as per the previous patch,
on a non-VHE system, if you have the :H modifier), so maybe just leave
that out of the commit message.

> 
> The PMU may be on when we change which counters are enabled; however,
> we avoid adding an isb as we instead rely on existing context
> synchronisation events: the isb in kvm_arm_vhe_guest_exit for VHE and
> the eret from the hvc in kvm_call_hyp.
> 
> Signed-off-by: Andrew Murray 
> Reviewed-by: Suzuki K Poulose 
> ---
>  arch/arm64/kvm/hyp/switch.c | 60 
> +
>  1 file changed, 60 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index b0b1478..9018fb3 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -357,6 +357,54 @@ static bool __hyp_text __hyp_switch_fpsimd(struct 
> kvm_vcpu *vcpu)
>   return true;
>  }
>  
> +static bool __hyp_text __pmu_switch_to_guest(struct kvm_cpu_context 
> *host_ctxt)
> +{
> + struct kvm_host_data *host;
> + struct kvm_pmu_events *pmu;
> + u32 clr, set;
> +
> + host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
> + pmu = &host->pmu_events;
> +
> + /* We can potentially avoid a sysreg write by only changing bits that
> +  * differ between the guest/host. E.g. where events are enabled in
> +  * both guest and host
> +  */

super nit: kernel coding style requires 'wings' on both sides of a
multi-line comment.  Only worth fixing if you respin anyhow.
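For reference, the preferred multi-line comment style from
Documentation/process/coding-style.rst looks like this (an illustrative
reflow of the comment above, not part of the patch):

  /*
   * We can potentially avoid a sysreg write by only changing bits that
   * differ between the guest/host, e.g. where events are enabled in
   * both guest and host.
   */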

> + clr = pmu->events_host & ~pmu->events_guest;
> + set = pmu->events_guest & ~pmu->events_host;
> +
> + if (clr)
> + write_sysreg(clr, pmcntenclr_el0);
> +
> + if (set)
> + write_sysreg(set, pmcntenset_el0);
> +
> + return (clr || set);
> +}
> +
> +static void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context 
> *host_ctxt)
> +{
> + struct kvm_host_data *host;
> + struct kvm_pmu_events *pmu;
> + u32 clr, set;
> +
> + host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
> + pmu = &host->pmu_events;
> +
> + /* We can potentially avoid a sysreg write by only changing bits that
> +  * differ between the guest/host. E.g. where events are enabled in
> +  * both guest and host
> +  */

ditto

> + clr = pmu->events_guest & ~pmu->events_host;
> + set = pmu->events_host & ~pmu->events_guest;
> +
> + if (clr)
> + write_sysreg(clr, pmcntenclr_el0);
> +
> + if (set)
> + write_sysreg(set, pmcntenset_el0);
> +}
> +
>  /*
>   * Return true when we were able to fixup the guest exit and should return to
>   * the guest, false when we should restore the host state and return to the
> @@ -464,12 +512,15 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  {
>   struct kvm_cpu_context *host_ctxt;
>   struct kvm_cpu_context *guest_ctxt;
> + bool pmu_switch_needed;
>   u64 exit_code;
>  
>   host_ctxt = vcpu->arch.host_cpu_context;
>   host_ctxt->__hyp_running_vcpu = vcpu;
>   guest_ctxt = &vcpu->arch.ctxt;
>  
> + pmu_switch_needed = __pmu_switch_to_guest(host_ctxt);
> +
>   sysreg_save_host_state_vhe(host_ctxt);
>  
>   /*
> @@ -511,6 +562,9 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  
>   __debug_switch_to_host(vcpu);
>  
> + if (pmu_switch_needed)
> + __pmu_switch_to_host(host_ctxt);
> +
>   return exit_code;
>  }
>  
> @@ -519,6 +573,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>  {
>   struct kvm_cpu_context *host_ctxt;
>   struct kvm_cpu_context *guest_ctxt;
> + bool pmu_switch_needed;
>   u64 exit_code;
>  
>   vcpu = kern_hyp_va(vcpu);
> @@ -527,6 +582,8 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>   host_ctxt->__hyp_running_vcpu = vcpu;
>   guest_ctxt = &vcpu->arch.ctxt;
>  
> + pmu_switch_needed = __pmu_switch_to_guest(host_ctxt);
> +
>   __sysreg_save_state_nvhe(host_ctxt);
>  
>   __activate_vm(kern_hyp_va(vcpu->kvm));
> @@ -573,6 +630,9 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>*/
>   __debug_switch_to_host(vcpu);
>  
> + if (pmu_switch_needed)
> + __pmu_switch_to_host(host_ctxt);
> +
>   return exit_code;
>  }
>  
> -- 
> 2.7.4
> 

Thanks,

Christoffer


Re: [PATCH v10 4/5] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

2019-02-18 Thread Christoffer Dall
On Mon, Jan 14, 2019 at 04:11:47PM +, Andrew Murray wrote:
> Add support for the :G and :H attributes in perf by handling the
> exclude_host/exclude_guest event attributes.
> 
> We notify KVM of counters that we wish to be enabled or disabled on
> guest entry/exit and thus defer starting or stopping :G events
> as per the event's exclude_host attribute.
> 
> With both VHE and non-VHE we switch the counters between host/guest
> at EL2. We are able to eliminate counters counting host events on
> the boundaries of guest entry/exit when using :G by filtering out
> EL2 for exclude_host. However, when using :H, unless exclude_hv is set
> on non-VHE, there is a small blackout window at the guest
> entry/exit where host events are not captured.
> 
> Signed-off-by: Andrew Murray 
> Reviewed-by: Suzuki K Poulose 
> ---
>  arch/arm64/kernel/perf_event.c | 53 
> --
>  1 file changed, 46 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 1c71796..21c6831 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -26,6 +26,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -528,11 +529,27 @@ static inline int armv8pmu_enable_counter(int idx)
>  
>  static inline void armv8pmu_enable_event_counter(struct perf_event *event)
>  {
> + struct perf_event_attr *attr = &event->attr;
>   int idx = event->hw.idx;
> + int flags = 0;
> + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
>  
> - armv8pmu_enable_counter(idx);
>   if (armv8pmu_event_is_chained(event))
> - armv8pmu_enable_counter(idx - 1);
> + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> +
> + if (!attr->exclude_host)
> + flags |= KVM_PMU_EVENTS_HOST;
> + if (!attr->exclude_guest)
> + flags |= KVM_PMU_EVENTS_GUEST;
> +
> + kvm_set_pmu_events(counter_bits, flags);
> +
> + /* We rely on the hypervisor switch code to enable guest counters */
> + if (!attr->exclude_host) {
> + armv8pmu_enable_counter(idx);
> + if (armv8pmu_event_is_chained(event))
> + armv8pmu_enable_counter(idx - 1);
> + }
>  }
>  
>  static inline int armv8pmu_disable_counter(int idx)
> @@ -545,11 +562,21 @@ static inline int armv8pmu_disable_counter(int idx)
>  static inline void armv8pmu_disable_event_counter(struct perf_event *event)
>  {
>   struct hw_perf_event *hwc = &event->hw;
> + struct perf_event_attr *attr = &event->attr;
>   int idx = hwc->idx;
> + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
>  
>   if (armv8pmu_event_is_chained(event))
> - armv8pmu_disable_counter(idx - 1);
> - armv8pmu_disable_counter(idx);
> + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> +
> + kvm_clr_pmu_events(counter_bits);
> +
> + /* We rely on the hypervisor switch code to disable guest counters */
> + if (!attr->exclude_host) {
> + if (armv8pmu_event_is_chained(event))
> + armv8pmu_disable_counter(idx - 1);
> + armv8pmu_disable_counter(idx);
> + }
>  }
>  
>  static inline int armv8pmu_enable_intens(int idx)
> @@ -824,16 +851,25 @@ static int armv8pmu_set_event_filter(struct 
> hw_perf_event *event,
>* Therefore we ignore exclude_hv in this configuration, since
>* there's no hypervisor to sample anyway. This is consistent
>* with other architectures (x86 and Power).
> +  *
> +  * To eliminate counting host events on the boundaries of
   ^comma

> +  * guest entry/exit we ensure EL2 is not included in hyp mode
   ^comma (or rework sentence)

What do you mean by "EL2 is not included in hyp mode" ??

> +  * with !exclude_host.
>*/
>   if (is_kernel_in_hyp_mode()) {
> - if (!attr->exclude_kernel)
> + if (!attr->exclude_kernel && !attr->exclude_host)
>   config_base |= ARMV8_PMU_INCLUDE_EL2;
>   } else {
> - if (attr->exclude_kernel)
> - config_base |= ARMV8_PMU_EXCLUDE_EL1;
>   if (!attr->exclude_hv)
>   config_base |= ARMV8_PMU_INCLUDE_EL2;
>   }
> +
> + /*
> +  * Filter out !VHE kernels and guest kernels
> +  */
> + if (attr->exclude_kernel)
> + config_base |= ARMV8_PMU_EXCLUDE_EL1;
> +

Let me see if I get this right:

exclude_user:    VHE:     Don't count EL0
                 Non-VHE: Don't count EL0

exclude_kernel:  VHE:     Don't count EL2 and don't count EL1
                 Non-VHE: Don't count EL1

exclude_hv:      VHE:     No effect
                 Non-VHE: Don't count EL2

exclude_host:    VHE:     Don't count EL2 + enable/disable on guest entry/exit
 
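A condensed sketch of the resulting filter setup, combining the hunk above
with this table (an illustration, not code from the patch; the event-type
filter is only half the story, since the :G/:H split also relies on the
hypervisor switching PMCNTENSET_EL0 at guest entry/exit in patch 5/5):

  if (is_kernel_in_hyp_mode()) {          /* VHE: host kernel runs at EL2 */
          if (!attr->exclude_kernel && !attr->exclude_host)
                  config_base |= ARMV8_PMU_INCLUDE_EL2;
  } else {                                /* !VHE: only KVM's hyp code runs at EL2 */
          if (!attr->exclude_hv)
                  config_base |= ARMV8_PMU_INCLUDE_EL2;
  }
  if (attr->exclude_kernel)               /* EL1, guest or host */
          config_base |= ARMV8_PMU_EXCLUDE_EL1;
  if (attr->exclude_user)                 /* EL0, guest or host */
          config_base |= ARMV8_PMU_EXCLUDE_EL0;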

Re: [PATCH] KVM: arm/arm64: arch_timer: Mark physical interrupt active when a virtual interrupt is pending

2019-02-12 Thread Christoffer Dall
On Fri, Feb 08, 2019 at 02:43:00PM +, Marc Zyngier wrote:
> When a guest gets scheduled, KVM performs a "load" operation,
> which for the timer includes evaluating the virtual "active" state
> of the interrupt, and replicating it on the physical side. This
> ensures that the deactivation in the guest will also take place
> in the physical GIC distributor.
> 
> If the interrupt is not yet active, we flag it as inactive on the
> physical side.  This means that on restoring the timer registers,
> if the timer has expired, we'll immediately take an interrupt.
> That's absolutely fine, as the interrupt will then be flagged as
> active on the physical side. What this assumes though is that we'll
> enter the guest right after having taken the interrupt, and that
> the guest will quickly ACK the interrupt, making it active on
> the virtual side.
> 
> It turns out that quite often, this assumption doesn't really hold.
> The guest may be preempted on the back of this interrupt, either
> from kernel space or whilst running at EL1 when a host interrupt
> fires. When this happens, we repeat the whole sequence on the
> next load (interrupt marked as inactive, timer registers restored,
> interrupt fires). And if it takes a really long time for a guest
> to activate the interrupt (as it does with nested virt), we end-up
> with many such events in quick succession, leading to the guest only
> making very slow progress.
> 
> This can also be seen with the number of virtual timer interrupts on the
> host being far greater than the same number in the guest.
> 
> An easy way to fix this is to evaluate the timer state when performing
> the "load" operation, just like we do when the interrupt actually fires.
> If the timer has a pending virtual interrupt at this stage, then we
> can safely flag the physical interrupt as being active, which prevents
> spurious exits.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  virt/kvm/arm/arch_timer.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 7449651ae2e5..70c18479ccd5 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -487,12 +487,21 @@ static inline void set_timer_irq_phys_active(struct 
> arch_timer_context *ctx, boo
>  static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
>  {
>   struct kvm_vcpu *vcpu = ctx->vcpu;
> - bool phys_active;
> + bool phys_active = false;
> +
> + /*
> +  * Update the timer output so that it is likely to match the
> +  * state we're about to restore. If the timer expires between
> +  * this point and the register restoration, we'll take the
> +  * interrupt anyway.
> +  */
> + kvm_timer_update_irq(ctx->vcpu, kvm_timer_should_fire(ctx), ctx);
>  
>   if (irqchip_in_kernel(vcpu->kvm))
>   phys_active = kvm_vgic_map_is_active(vcpu, ctx->irq.irq);
> - else
> - phys_active = ctx->irq.level;
> +
> + phys_active |= ctx->irq.level;
> +
>   set_timer_irq_phys_active(ctx, phys_active);
>  }
>  
> -- 
> 2.20.1
> 
Reviewed-by: Christoffer Dall 


Re: [PATCH v2 1/4] KVM: arm64: Forbid kprobing of the VHE world-switch code

2019-02-01 Thread Christoffer Dall
On Thu, Jan 31, 2019 at 06:53:06PM +, James Morse wrote:
> Hey Christoffer,
> 
> On 31/01/2019 08:08, Christoffer Dall wrote:
> > On Thu, Jan 24, 2019 at 04:32:54PM +, James Morse wrote:
> >> On systems with VHE the kernel and KVM's world-switch code run at the
> >> same exception level. Code that is only used on a VHE system does not
> >> need to be annotated as __hyp_text as it can reside anywhere in the
> >> kernel text.
> >>
> >> __hyp_text was also used to prevent kprobes from patching breakpoint
> >> instructions into this region, as this code runs at a different
> >> exception level. While this is no longer true with VHE, KVM still
> >> switches VBAR_EL1, meaning a kprobe's breakpoint executed in the
> >> world-switch code will cause a hyp-panic.
> > 
> > Forgive potentially very stupid questions here, but:
> > 
> >  (1) Would it make sense to move the save/restore VBAR_EL1 to the last
> >  possible moment, and would that actually allow kprobes to work for
> >  the world-switch code, or does that just result in other weird
> >  problems?
> 
> This would work for taking the debug exception. But next kprobes wants to
> single-step the probed instruction in an out-of-line slot. I don't think
> we can do this if we've already configured the debug hardware for the
> guest. (We could at least turn single-step off when we return to
> guest-EL0, which guest-EL1 was single-stepping.)
> 
> 

I suspected something like that, let's not go there.

> >  (2) Are we sure that this catches every call path of every non-inlined
> >  function called after switching VBAR_EL1?  Can kprobes only be
> >  called on exported symbols, or can you (if you know the address
> >  somehow) put a kprobe on a static function as well?  If there are
> >  any concerns in this area, we might want to consider (1) more
> >  closely.
> 
> Hmmm, good question. The blacklisting applies to whole symbols as seen by
> kallsyms, the compiler has no idea what is going on.
> 
> If it chose not to inline something, it would be kprobe'able yes.
> 
> __kprobes uses a section function-attribute instead. The gcc manual[0] doesn't
> say what happens when inline and the section attributes are used together. (or
> at least I couldn't find it)
> 
> A quick experiment with gcc 8.2.0 shows adding __kprobes on the inlines gets
> discarded when they are inlined. I'm not sure how to trick the compiler into
> not-inlining it to see what happens, but adding 'noinline' to the header file
> causes it to duplicate the function everywhere, but puts it in the __kprobes
> section.
> 
> (For KVM we could use the 'flatten' attribute, but that does say 'if 
> possible'.
> Alternatively we can decorate all the inline helpers we know we use with
> __kprobes as a safety net.)
> 
> I think this is a wider problem with kprobes.
> 

Sounds like it.  Probably in the "you did something crazy, and your
kernel is going to suffer from it" category.

Let's stick to your approach.

Thanks for the explanation.

Christoffer


Re: [PATCH 2/5] arm/arm64: KVM: Allow a VCPU to fully reset itself

2019-01-31 Thread Christoffer Dall
On Thu, Jan 31, 2019 at 06:06:09PM +0100, Andrew Jones wrote:
> On Thu, Jan 31, 2019 at 02:52:11PM +, Marc Zyngier wrote:
> > On 31/01/2019 12:57, Andrew Jones wrote:
> > > On Thu, Jan 31, 2019 at 12:51:56PM +0100, Christoffer Dall wrote:
> > 
> > [...]
> > 
> > >> I don't think there's anything very unconventional here.
> > > 
> > > Normally if a thread observes a change to vcpu->requests, then we ensure a
> > > change to some accompanying data is also observable. We're reversing that
> > > here, which adds a need for additional barriers and a strict request
> > > checking order.
> > > 
> > >>
> > >> Let's try this:  If you have a better way of implementing this, how
> > >> about you write a patch?
> > > 
> > > It would just be this patch minus the unnecessary barriers. I can send it
> > > if you like, but I wouldn't want to change the authorship for such a small
> > > change.
> > 
> > Having these barriers makes it explicit (at least to me) what data we
> > expect to be visible in other threads and in which order. You keep
> > saying that order doesn't matter and we disagree on this. Yes, you've
> > listed cases where we can survive things coming in out of order, but
> > that's not a proof that we don't need them.
> > 
> > So at the end of the day, and unless you can prove that the barriers are
> > not necessary by providing the same form of validation tool, I'm
> > inclined to go with the verified approach.
> 
> I don't know how to compile and run the litmus test, but I'd be happy to
> try if given some pointers.

You can look in tools/memory-model/README as a start.
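A minimal example of the herd7 litmus format used there, for anyone wanting
to follow along (this is the classic message-passing pattern, not the test
for this patch; it can be run from tools/memory-model with something like
"herd7 -conf linux-kernel.cfg <file>.litmus"):

  C MP+release+acquire

  {}

  P0(int *buf, int *flag)
  {
          WRITE_ONCE(*buf, 1);
          smp_store_release(flag, 1);
  }

  P1(int *buf, int *flag)
  {
          int r0;
          int r1;

          r0 = smp_load_acquire(flag);
          r1 = READ_ONCE(*buf);
  }

  exists (1:r0=1 /\ 1:r1=0)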


> If I did know how, I would add vcpu->mode to
> the P1 inputs and some additional lines that look similar to what's in
> "Ensuring Requests Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst
> Even without the litmus test please allow me to try again to describe why
> I think the barriers may be removed.
> 
> Any vcpu we're attempting to power on must be on its way to sleep with a
> SLEEP request, or already be sleeping. This means that it's outside guest
> mode, or will be shortly. If the vcpu observes power_off=false in
> vcpu_req_sleep(), whether it was awaken or never even got to sleep, we
> know that observation is taking place with vcpu->mode != IN_GUEST_MODE.
> 
> We now no longer need to be concerned with the relationship between
> power_off and the RESET vcpu request. 

I disagree.  That argument requires more explanation.

If you set power_off = false before posting the reset
request, then if the VCPU thread is awoken (for any reason) it can run
the VCPU without observing the reset request and that's the problem.

If you are making assumptions about only being woken up as a result of a
reset request, or the interaction with the pause flag, or setting the
sleep request to prevent the guest from executing again, that is a more
complex argument (which you haven't made yet!) and I'd add that it's a
brittle construction.

What we have here are three pieces of state:

  reset_state->reset
  vcpu->requests
  vcpu->arch.power_state

They must be written to, and the writes must be observed, in that
particular order without any additional assumptions.

You keep arguing that you can enforce an ordering between these three
states with a single barrier, which is clearly not possible.
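A minimal illustration of that point (not code from the patch): suppose the
producer side were

  reset_state->reset = true;                     /* (1) */
  smp_wmb();
  kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);    /* (2) */
  vcpu->arch.power_off = false;                  /* (3) */

The single barrier orders (1) before (2) and (3), but nothing orders (2)
against (3), so the VCPU thread can observe power_off == false while the
request made in (2) is not yet visible, which is the lost-reset scenario
described above.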

So this boils down to you making additional assumptions (see above,
brittle) without explaining what they are.  I suspect you want this to
fit in your mental model of how vcpu requests solve the world, otherwise
I'm not sure what your concern with this patch, which we all agree is
correct, really is.


Thanks,

Christoffer


Re: [PATCH 2/5] arm/arm64: KVM: Allow a VCPU to fully reset itself

2019-01-31 Thread Christoffer Dall
On Thu, Jan 31, 2019 at 01:57:12PM +0100, Andrew Jones wrote:
> On Thu, Jan 31, 2019 at 12:51:56PM +0100, Christoffer Dall wrote:
> > On Thu, Jan 31, 2019 at 11:12:54AM +0100, Andrew Jones wrote:
> > > On Thu, Jan 31, 2019 at 08:43:53AM +0100, Christoffer Dall wrote:
> > > > On Wed, Jan 30, 2019 at 04:27:21PM +0100, Andrew Jones wrote:
> > > > > On Wed, Jan 30, 2019 at 10:34:31AM +0100, Christoffer Dall wrote:
> > > > > > On Tue, Jan 29, 2019 at 05:03:47PM +0100, Andrew Jones wrote:
> > > > > > > On Fri, Jan 25, 2019 at 10:46:53AM +0100, Christoffer Dall wrote:
> > > > > > > > From: Marc Zyngier 
> > > > > > > > 
> > > > > > > > The current kvm_psci_vcpu_on implementation will directly try to
> > > > > > > > manipulate the state of the VCPU to reset it.  However, since 
> > > > > > > > this is
> > > > > > > > not done on the thread that runs the VCPU, we can end up in a 
> > > > > > > > strangely
> > > > > > > > corrupted state when the source and target VCPUs are running at 
> > > > > > > > the same
> > > > > > > > time.
> > > > > > > > 
> > > > > > > > Fix this by factoring out all reset logic from the PSCI 
> > > > > > > > implementation
> > > > > > > > and forwarding the required information along with a request to 
> > > > > > > > the
> > > > > > > > target VCPU.
> > > > > > > 
> > > > > > > The last patch makes more sense, now that I see this one. I guess 
> > > > > > > their
> > > > > > > order should be swapped.
> > > > > > > 
> > > > > > > > 
> > > > > > > > Signed-off-by: Marc Zyngier 
> > > > > > > > Signed-off-by: Christoffer Dall 
> > > > > > > > ---
> > > > > > > >  arch/arm/include/asm/kvm_host.h   | 10 +
> > > > > > > >  arch/arm/kvm/reset.c  | 24 +
> > > > > > > >  arch/arm64/include/asm/kvm_host.h | 11 ++
> > > > > > > >  arch/arm64/kvm/reset.c| 24 +
> > > > > > > >  virt/kvm/arm/arm.c| 10 +
> > > > > > > >  virt/kvm/arm/psci.c   | 36 
> > > > > > > > ++-
> > > > > > > >  6 files changed, 95 insertions(+), 20 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/arch/arm/include/asm/kvm_host.h 
> > > > > > > > b/arch/arm/include/asm/kvm_host.h
> > > > > > > > index ca56537b61bc..50e89869178a 100644
> > > > > > > > --- a/arch/arm/include/asm/kvm_host.h
> > > > > > > > +++ b/arch/arm/include/asm/kvm_host.h
> > > > > > > > @@ -48,6 +48,7 @@
> > > > > > > >  #define KVM_REQ_SLEEP \
> > > > > > > > KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | 
> > > > > > > > KVM_REQUEST_NO_WAKEUP)
> > > > > > > >  #define KVM_REQ_IRQ_PENDINGKVM_ARCH_REQ(1)
> > > > > > > > +#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
> > > > > > > >  
> > > > > > > >  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
> > > > > > > >  
> > > > > > > > @@ -147,6 +148,13 @@ struct kvm_cpu_context {
> > > > > > > >  
> > > > > > > >  typedef struct kvm_cpu_context kvm_cpu_context_t;
> > > > > > > >  
> > > > > > > > +struct vcpu_reset_state {
> > > > > > > > +   unsigned long   pc;
> > > > > > > > +   unsigned long   r0;
> > > > > > > > +   boolbe;
> > > > > > > > +   boolreset;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > >  struct kvm_vcpu_arch {
> > > > > > > > struct kvm_cpu_context ctxt;
> > > > > > > >  
> > > > > > > >

Re: [PATCH 2/5] arm/arm64: KVM: Allow a VCPU to fully reset itself

2019-01-31 Thread Christoffer Dall
On Thu, Jan 31, 2019 at 11:12:54AM +0100, Andrew Jones wrote:
> On Thu, Jan 31, 2019 at 08:43:53AM +0100, Christoffer Dall wrote:
> > On Wed, Jan 30, 2019 at 04:27:21PM +0100, Andrew Jones wrote:
> > > On Wed, Jan 30, 2019 at 10:34:31AM +0100, Christoffer Dall wrote:
> > > > On Tue, Jan 29, 2019 at 05:03:47PM +0100, Andrew Jones wrote:
> > > > > On Fri, Jan 25, 2019 at 10:46:53AM +0100, Christoffer Dall wrote:
> > > > > > From: Marc Zyngier 
> > > > > > 
> > > > > > The current kvm_psci_vcpu_on implementation will directly try to
> > > > > > manipulate the state of the VCPU to reset it.  However, since this 
> > > > > > is
> > > > > > not done on the thread that runs the VCPU, we can end up in a 
> > > > > > strangely
> > > > > > corrupted state when the source and target VCPUs are running at the 
> > > > > > same
> > > > > > time.
> > > > > > 
> > > > > > Fix this by factoring out all reset logic from the PSCI 
> > > > > > implementation
> > > > > > and forwarding the required information along with a request to the
> > > > > > target VCPU.
> > > > > 
> > > > > The last patch makes more sense, now that I see this one. I guess 
> > > > > their
> > > > > order should be swapped.
> > > > > 
> > > > > > 
> > > > > > Signed-off-by: Marc Zyngier 
> > > > > > Signed-off-by: Christoffer Dall 
> > > > > > ---
> > > > > >  arch/arm/include/asm/kvm_host.h   | 10 +
> > > > > >  arch/arm/kvm/reset.c  | 24 +
> > > > > >  arch/arm64/include/asm/kvm_host.h | 11 ++
> > > > > >  arch/arm64/kvm/reset.c| 24 +
> > > > > >  virt/kvm/arm/arm.c| 10 +
> > > > > >  virt/kvm/arm/psci.c   | 36 
> > > > > > ++-
> > > > > >  6 files changed, 95 insertions(+), 20 deletions(-)
> > > > > > 
> > > > > > diff --git a/arch/arm/include/asm/kvm_host.h 
> > > > > > b/arch/arm/include/asm/kvm_host.h
> > > > > > index ca56537b61bc..50e89869178a 100644
> > > > > > --- a/arch/arm/include/asm/kvm_host.h
> > > > > > +++ b/arch/arm/include/asm/kvm_host.h
> > > > > > @@ -48,6 +48,7 @@
> > > > > >  #define KVM_REQ_SLEEP \
> > > > > > KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> > > > > >  #define KVM_REQ_IRQ_PENDINGKVM_ARCH_REQ(1)
> > > > > > +#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
> > > > > >  
> > > > > >  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
> > > > > >  
> > > > > > @@ -147,6 +148,13 @@ struct kvm_cpu_context {
> > > > > >  
> > > > > >  typedef struct kvm_cpu_context kvm_cpu_context_t;
> > > > > >  
> > > > > > +struct vcpu_reset_state {
> > > > > > +   unsigned long   pc;
> > > > > > +   unsigned long   r0;
> > > > > > +   boolbe;
> > > > > > +   boolreset;
> > > > > > +};
> > > > > > +
> > > > > >  struct kvm_vcpu_arch {
> > > > > > struct kvm_cpu_context ctxt;
> > > > > >  
> > > > > > @@ -186,6 +194,8 @@ struct kvm_vcpu_arch {
> > > > > > /* Cache some mmu pages needed inside spinlock regions */
> > > > > > struct kvm_mmu_memory_cache mmu_page_cache;
> > > > > >  
> > > > > > +   struct vcpu_reset_state reset_state;
> > > > > > +
> > > > > > /* Detect first run of a vcpu */
> > > > > > bool has_run_once;
> > > > > >  };
> > > > > > diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
> > > > > > index 5ed0c3ee33d6..de41255eebcd 100644
> > > > > > --- a/arch/arm/kvm/reset.c
> > > > > > +++ b/arch/arm/kvm/reset.c
> > > > > > @@ -26,6 +26,7 @@
> > > > > >  #include 
> > > > > >  #include 
> > > > > >  #include 
>

[PATCH v2 4/4] KVM: mips: Move to common kvm_mmu_memcache infrastructure

2019-01-31 Thread Christoffer Dall
Now that we have a common infrastructure for doing MMU cache
allocations, use this for mips as well.

Signed-off-by: Christoffer Dall 
---
 arch/mips/include/asm/kvm_host.h  | 15 ++---
 arch/mips/include/asm/kvm_types.h |  7 
 arch/mips/kvm/mips.c  |  2 +-
 arch/mips/kvm/mmu.c   | 54 ++-
 4 files changed, 20 insertions(+), 58 deletions(-)

diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index d2abd98471e8..e05cabd53a9e 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -293,17 +293,6 @@ struct kvm_mips_tlb {
long tlb_lo[2];
 };
 
-#define KVM_NR_MEM_OBJS 4
-
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 #define KVM_MIPS_AUX_FPU   0x1
 #define KVM_MIPS_AUX_MSA   0x2
 
@@ -378,7 +367,7 @@ struct kvm_vcpu_arch {
unsigned int last_user_gasid;
 
/* Cache some mmu pages needed inside spinlock regions */
-   struct kvm_mmu_memory_cache mmu_page_cache;
+   struct kvm_mmu_memcache mmu_page_cache;
 
 #ifdef CONFIG_KVM_MIPS_VZ
/* vcpu's vzguestid is different on each host cpu in an smp system */
@@ -915,7 +904,7 @@ void kvm_mips_flush_gva_pt(pgd_t *pgd, enum kvm_mips_flush 
flags);
 bool kvm_mips_flush_gpa_pt(struct kvm *kvm, gfn_t start_gfn, gfn_t end_gfn);
 int kvm_mips_mkclean_gpa_pt(struct kvm *kvm, gfn_t start_gfn, gfn_t end_gfn);
 pgd_t *kvm_pgd_alloc(void);
-void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+void kvm_mmu_free_memcaches(struct kvm_vcpu *vcpu);
 void kvm_trap_emul_invalidate_gva(struct kvm_vcpu *vcpu, unsigned long addr,
  bool user);
 void kvm_trap_emul_gva_lockless_begin(struct kvm_vcpu *vcpu);
diff --git a/arch/mips/include/asm/kvm_types.h 
b/arch/mips/include/asm/kvm_types.h
index 5efeb32a5926..fd8a58534831 100644
--- a/arch/mips/include/asm/kvm_types.h
+++ b/arch/mips/include/asm/kvm_types.h
@@ -2,4 +2,11 @@
 #ifndef _ASM_MIPS_KVM_TYPES_H
 #define _ASM_MIPS_KVM_TYPES_H
 
+#define KVM_ARCH_WANT_MMU_MEMCACHE
+
+#define KVM_MMU_NR_MEMCACHE_OBJS 4
+
+#define KVM_MMU_CACHE_GFP  GFP_KERNEL
+#define KVM_MMU_CACHE_PAGE_GFP GFP_KERNEL
+
 #endif /* _ASM_MIPS_KVM_TYPES_H */
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 3734cd58895e..5ba6905247d3 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -425,7 +425,7 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 
kvm_mips_dump_stats(vcpu);
 
-   kvm_mmu_free_memory_caches(vcpu);
+   kvm_mmu_free_memcaches(vcpu);
kfree(vcpu->arch.guest_ebase);
kfree(vcpu->arch.kseg0_commpage);
kfree(vcpu);
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 97e538a8c1be..aed5284d642e 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -25,41 +25,9 @@
 #define KVM_MMU_CACHE_MIN_PAGES 2
 #endif
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
- int min, int max)
+void kvm_mmu_free_memcaches(struct kvm_vcpu *vcpu)
 {
-   void *page;
-
-   BUG_ON(max > KVM_NR_MEM_OBJS);
-   if (cache->nobjs >= min)
-   return 0;
-   while (cache->nobjs < max) {
-   page = (void *)__get_free_page(GFP_KERNEL);
-   if (!page)
-   return -ENOMEM;
-   cache->objects[cache->nobjs++] = page;
-   }
-   return 0;
-}
-
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
-{
-   while (mc->nobjs)
-   free_page((unsigned long)mc->objects[--mc->nobjs]);
-}
-
-static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
-{
-   void *p;
-
-   BUG_ON(!mc || !mc->nobjs);
-   p = mc->objects[--mc->nobjs];
-   return p;
-}
-
-void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
-{
-   mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+   kvm_mmu_free_memcache_page(&vcpu->arch.mmu_page_cache);
 }
 
 /**
@@ -133,7 +101,7 @@ pgd_t *kvm_pgd_alloc(void)
  * NULL if a page table doesn't exist for @addr and !@cache.
  * NULL if a page table allocation failed.
  */
-static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct kvm_mmu_memory_cache *cache,
+static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct kvm_mmu_memcache *cache,
unsigned long addr)
 {
pud_t *pud;
@@ -151,7 +119,7 @@ static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct 
kvm_mmu_memory_cache *cache,
 
if (!cache)
return NULL;
-   new_pmd = mmu_memory_cache_alloc(cache);
+   new_pmd = kvm_mmu_memcache_alloc(cache);
pmd_init((unsign

[PATCH v2 2/4] KVM: x86: Rename mmu_memory_cache to kvm_mmu_memcache

2019-01-31 Thread Christoffer Dall
As we have moved the mmu memory cache definitions and functions to
common code, they are exported as symbols to the rest of the kernel.

Let's rename the functions and data types to have a kvm_ prefix to make
it clear where these functions belong and take this chance to rename
memory_cache to memcache to avoid overly long lines.

This is a bit tedious on the callsites but ends up looking more
palatable.

Signed-off-by: Christoffer Dall 
---
 arch/x86/include/asm/kvm_host.h  |  6 ++---
 arch/x86/include/asm/kvm_types.h |  4 ++--
 arch/x86/kvm/mmu.c   | 38 
 arch/x86/kvm/paging_tmpl.h   |  4 ++--
 include/linux/kvm_host.h | 14 ++--
 include/linux/kvm_types.h|  6 ++---
 virt/kvm/kvm_main.c  | 14 ++--
 7 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9f97dcd15097..5c12cba8c2b1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -589,9 +589,9 @@ struct kvm_vcpu_arch {
 */
struct kvm_mmu *walk_mmu;
 
-   struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
-   struct kvm_mmu_memory_cache mmu_page_cache;
-   struct kvm_mmu_memory_cache mmu_page_header_cache;
+   struct kvm_mmu_memcache mmu_pte_list_desc_cache;
+   struct kvm_mmu_memcache mmu_page_cache;
+   struct kvm_mmu_memcache mmu_page_header_cache;
 
/*
 * QEMU userspace and the guest each have their own FPU state.
diff --git a/arch/x86/include/asm/kvm_types.h b/arch/x86/include/asm/kvm_types.h
index 5260d751940e..3ff3b30db52e 100644
--- a/arch/x86/include/asm/kvm_types.h
+++ b/arch/x86/include/asm/kvm_types.h
@@ -2,9 +2,9 @@
 #ifndef _ASM_X86_KVM_TYPES_H
 #define _ASM_X86_KVM_TYPES_H
 
-#define KVM_ARCH_WANT_MMU_MEMORY_CACHE
+#define KVM_ARCH_WANT_MMU_MEMCACHE
 
-#define KVM_NR_MEM_OBJS 40
+#define KVM_MMU_NR_MEMCACHE_OBJS 40
 
 #define KVM_MMU_CACHE_GFP  GFP_KERNEL
 #define KVM_MMU_CACHE_PAGE_GFP GFP_KERNEL_ACCOUNT
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a0bc22f153f1..f1ae118fd5c4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -951,35 +951,35 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu 
*vcpu)
local_irq_enable();
 }
 
-static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
+static int kvm_mmu_topup_memcaches(struct kvm_vcpu *vcpu)
 {
int r;
 
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
+   r = kvm_mmu_topup_memcache(&vcpu->arch.mmu_pte_list_desc_cache,
   pte_list_desc_cache, 8 + PTE_PREFETCH_NUM);
if (r)
goto out;
-   r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
+   r = kvm_mmu_topup_memcache_page(&vcpu->arch.mmu_page_cache, 8);
if (r)
goto out;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
+   r = kvm_mmu_topup_memcache(&vcpu->arch.mmu_page_header_cache,
   mmu_page_header_cache, 4);
 out:
return r;
 }
 
-static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
+static void kvm_mmu_free_memcaches(struct kvm_vcpu *vcpu)
 {
-   mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
+   kvm_mmu_free_memcache(&vcpu->arch.mmu_pte_list_desc_cache,
pte_list_desc_cache);
-   mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
-   mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache,
+   kvm_mmu_free_memcache_page(&vcpu->arch.mmu_page_cache);
+   kvm_mmu_free_memcache(&vcpu->arch.mmu_page_header_cache,
mmu_page_header_cache);
 }
 
 static struct pte_list_desc *mmu_alloc_pte_list_desc(struct kvm_vcpu *vcpu)
 {
-   return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
+   return kvm_mmu_memcache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
 }
 
 static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
@@ -1299,10 +1299,10 @@ static struct kvm_rmap_head *gfn_to_rmap(struct kvm 
*kvm, gfn_t gfn,
 
 static bool rmap_can_add(struct kvm_vcpu *vcpu)
 {
-   struct kvm_mmu_memory_cache *cache;
+   struct kvm_mmu_memcache *cache;
 
cache = &vcpu->arch.mmu_pte_list_desc_cache;
-   return mmu_memory_cache_free_objects(cache);
+   return kvm_mmu_memcache_free_objects(cache);
 }
 
 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
@@ -1985,10 +1985,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu, int direct
 {
struct kvm_mmu_page *sp;
 
-   sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
-   sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);
+   sp = kvm_mmu_memc

[PATCH v2 0/3] KVM: Unify mmu_memory_cache functionality across architectures

2019-01-31 Thread Christoffer Dall
We currently have duplicated functionality for the mmu_memory_cache used
to pre-allocate memory for the page table manipulation code which cannot
allocate memory while holding spinlocks.  This functionality is
duplicated across x86, arm/arm64, and mips.

There was recently a debate about modifying the arm code to be more in
line with the x86 code and some discussions around changing the page
flags used for allocation.  This series should make it easier to take a
uniform approach across architectures.

While there's not a huge amount of code sharing, we come out with a net
gain.

Only tested on arm/arm64, and only compile-tested on x86 and mips.

Changes since v1:
 - Split out rename from initial x86 patch to have separate patches to
   move the logic to common code and to rename.
 - Introduce KVM_ARCH_WANT_MMU_MEMCACHE to avoid compile breakage on
   architectures that don't use this functionality.
 - Rename KVM_NR_MEM_OBJS to KVM_MMU_NR_MEMCACHE_OBJS

---

Christoffer Dall (4):
  KVM: x86: Move mmu_memory_cache functions to common code
  KVM: x86: Rename mmu_memory_cache to kvm_mmu_memcache
  KVM: arm/arm64: Move to common kvm_mmu_memcache infrastructure
  KVM: mips: Move to common kvm_mmu_memcache infrastructure

 arch/arm/include/asm/kvm_host.h  | 13 +---
 arch/arm/include/asm/kvm_mmu.h   |  2 +-
 arch/arm/include/asm/kvm_types.h | 12 
 arch/arm64/include/asm/kvm_host.h| 13 +---
 arch/arm64/include/asm/kvm_mmu.h |  2 +-
 arch/arm64/include/asm/kvm_types.h   | 13 
 arch/mips/include/asm/kvm_host.h | 15 +
 arch/mips/include/asm/kvm_types.h| 12 
 arch/mips/kvm/mips.c |  2 +-
 arch/mips/kvm/mmu.c  | 54 +++-
 arch/powerpc/include/asm/kvm_types.h |  5 ++
 arch/s390/include/asm/kvm_types.h|  5 ++
 arch/x86/include/asm/kvm_host.h  | 17 +
 arch/x86/include/asm/kvm_types.h | 12 
 arch/x86/kvm/mmu.c   | 97 ++--
 arch/x86/kvm/paging_tmpl.h   |  4 +-
 include/linux/kvm_host.h | 11 
 include/linux/kvm_types.h| 13 
 virt/kvm/arm/arm.c   |  2 +-
 virt/kvm/arm/mmu.c   | 68 +--
 virt/kvm/kvm_main.c  | 60 +
 21 files changed, 202 insertions(+), 230 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_types.h
 create mode 100644 arch/arm64/include/asm/kvm_types.h
 create mode 100644 arch/mips/include/asm/kvm_types.h
 create mode 100644 arch/powerpc/include/asm/kvm_types.h
 create mode 100644 arch/s390/include/asm/kvm_types.h
 create mode 100644 arch/x86/include/asm/kvm_types.h

-- 
2.18.0



[PATCH v2 3/4] KVM: arm/arm64: Move to common kvm_mmu_memcache infrastructure

2019-01-31 Thread Christoffer Dall
Now that we have a common mmu memcache implementation, we can reuse
this for arm and arm64.

The common implementation has a slightly different behavior when
allocating objects under high memory pressure; whereas the current
arm/arm64 implementation will give up and return -ENOMEM if the full
size of the cache cannot be allocated during topup, the common
implementation is happy with any allocation between min and max.  There
should be no architecture-specific requirement for doing it one way or
the other and it's in fact better to enforce a cross-architecture KVM
policy on this behavior.
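A rough sketch of that policy follows; the name and signature are made up
for illustration and are not the actual helpers introduced by this series:

  /* Succeed once at least 'min' objects are cached; fill towards 'max'. */
  static int topup_sketch(struct kvm_mmu_memcache *mc, int min, int max)
  {
          void *page;

          if (mc->nobjs >= min)
                  return 0;
          while (mc->nobjs < max) {
                  page = (void *)__get_free_page(GFP_KERNEL);
                  if (!page)
                          return mc->nobjs >= min ? 0 : -ENOMEM;
                  mc->objects[mc->nobjs++] = page;
          }
          return 0;
  }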

Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_host.h| 13 +-
 arch/arm/include/asm/kvm_mmu.h |  2 +-
 arch/arm/include/asm/kvm_types.h   |  7 +++
 arch/arm64/include/asm/kvm_host.h  | 13 +-
 arch/arm64/include/asm/kvm_mmu.h   |  2 +-
 arch/arm64/include/asm/kvm_types.h |  7 +++
 virt/kvm/arm/arm.c |  2 +-
 virt/kvm/arm/mmu.c | 68 --
 8 files changed, 36 insertions(+), 78 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ca56537b61bc..bf6b6d027ff0 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -83,17 +83,6 @@ struct kvm_arch {
u32 psci_version;
 };
 
-#define KVM_NR_MEM_OBJS 40
-
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 struct kvm_vcpu_fault_info {
u32 hsr;/* Hyp Syndrome Register */
u32 hxfar;  /* Hyp Data/Inst. Fault Address Register */
@@ -184,7 +173,7 @@ struct kvm_vcpu_arch {
struct kvm_decode mmio_decode;
 
/* Cache some mmu pages needed inside spinlock regions */
-   struct kvm_mmu_memory_cache mmu_page_cache;
+   struct kvm_mmu_memcache mmu_page_cache;
 
/* Detect first run of a vcpu */
bool has_run_once;
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 3a875fc1b63c..8877f53997c8 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -71,7 +71,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
guest_ipa,
 
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
-void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+void kvm_mmu_free_memcaches(struct kvm_vcpu *vcpu);
 
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
diff --git a/arch/arm/include/asm/kvm_types.h b/arch/arm/include/asm/kvm_types.h
index bc389f82e88d..51c9b0cb9718 100644
--- a/arch/arm/include/asm/kvm_types.h
+++ b/arch/arm/include/asm/kvm_types.h
@@ -2,4 +2,11 @@
 #ifndef _ASM_ARM_KVM_TYPES_H
 #define _ASM_ARM_KVM_TYPES_H
 
+#define KVM_ARCH_WANT_MMU_MEMCACHE
+
+#define KVM_MMU_NR_MEMCACHE_OBJS 40
+
+#define KVM_MMU_CACHE_GFP  GFP_KERNEL
+#define KVM_MMU_CACHE_PAGE_GFP (GFP_KERNEL | __GFP_ZERO)
+
 #endif /* _ASM_ARM_KVM_TYPES_H */
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 7732d0ba4e60..1aa951de8338 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -82,17 +82,6 @@ struct kvm_arch {
u32 psci_version;
 };
 
-#define KVM_NR_MEM_OBJS 40
-
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 struct kvm_vcpu_fault_info {
u32 esr_el2;/* Hyp Syndrom Register */
u64 far_el2;/* Hyp Fault Address Register */
@@ -285,7 +274,7 @@ struct kvm_vcpu_arch {
struct kvm_decode mmio_decode;
 
/* Cache some mmu pages needed inside spinlock regions */
-   struct kvm_mmu_memory_cache mmu_page_cache;
+   struct kvm_mmu_memcache mmu_page_cache;
 
/* Target CPU and feature flags */
int target;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 8af4b1befa42..dec55fa00e56 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -170,7 +170,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
guest_ipa,
 
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
-void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+void kvm_mmu_free_memcaches(struct kvm_vcpu *vcpu);
 
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
diff --git a/arch/arm64/include/asm/kvm_types.h 
b/arch/arm64/include/asm/kvm_types.h
index d0987007d581..706f6cb6f9f3 100644
--- a/arch/arm64/include/asm/kvm_types.h
+++ b/arch/arm64/include/asm/kvm_types.h
@@ -2,5 +2,12 @@
 #ifndef _ASM_ARM64_KVM_TYPES_H
 #define _ASM_AR

[PATCH v2 1/4] KVM: x86: Move mmu_memory_cache functions to common code

2019-01-31 Thread Christoffer Dall
We are currently duplicating the mmu memory cache functionality quite
heavily between the architectures that support KVM.  As a first step,
move the x86 implementation (which seems to have the most recently
maintained version of the mmu memory cache) to common code.

We introduce an arch-specific kvm_types.h which can be used to
define the architecture-specific GFP flags for allocating memory to the
memory cache, and to specify how many objects are required in the memory
cache.  These are the two points where the current implementations
diverge across architectures.  Since kvm_host.h defines structures with
fields of the memcache object, we define the memcache structure in
kvm_types.h, and we include the architecture-specific kvm_types.h to
know the size of the object in kvm_host.h.

We only define the functions and data types if
KVM_ARCH_WANT_MMU_MEMORY_CACHE is defined, because not all architectures
require the mmu memory cache.

Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_types.h |  5 +++
 arch/arm64/include/asm/kvm_types.h   |  6 +++
 arch/mips/include/asm/kvm_types.h|  5 +++
 arch/powerpc/include/asm/kvm_types.h |  5 +++
 arch/s390/include/asm/kvm_types.h|  5 +++
 arch/x86/include/asm/kvm_host.h  | 11 -
 arch/x86/include/asm/kvm_types.h | 12 ++
 arch/x86/kvm/mmu.c   | 59 ---
 include/linux/kvm_host.h | 11 +
 include/linux/kvm_types.h| 13 ++
 virt/kvm/kvm_main.c  | 60 
 11 files changed, 122 insertions(+), 70 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_types.h
 create mode 100644 arch/arm64/include/asm/kvm_types.h
 create mode 100644 arch/mips/include/asm/kvm_types.h
 create mode 100644 arch/powerpc/include/asm/kvm_types.h
 create mode 100644 arch/s390/include/asm/kvm_types.h
 create mode 100644 arch/x86/include/asm/kvm_types.h

diff --git a/arch/arm/include/asm/kvm_types.h b/arch/arm/include/asm/kvm_types.h
new file mode 100644
index ..bc389f82e88d
--- /dev/null
+++ b/arch/arm/include/asm/kvm_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM_KVM_TYPES_H
+#define _ASM_ARM_KVM_TYPES_H
+
+#endif /* _ASM_ARM_KVM_TYPES_H */
diff --git a/arch/arm64/include/asm/kvm_types.h 
b/arch/arm64/include/asm/kvm_types.h
new file mode 100644
index ..d0987007d581
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_types.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_KVM_TYPES_H
+#define _ASM_ARM64_KVM_TYPES_H
+
+#endif /* _ASM_ARM64_KVM_TYPES_H */
+
diff --git a/arch/mips/include/asm/kvm_types.h 
b/arch/mips/include/asm/kvm_types.h
new file mode 100644
index ..5efeb32a5926
--- /dev/null
+++ b/arch/mips/include/asm/kvm_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_MIPS_KVM_TYPES_H
+#define _ASM_MIPS_KVM_TYPES_H
+
+#endif /* _ASM_MIPS_KVM_TYPES_H */
diff --git a/arch/powerpc/include/asm/kvm_types.h 
b/arch/powerpc/include/asm/kvm_types.h
new file mode 100644
index ..f627eceaa314
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_KVM_TYPES_H
+#define _ASM_POWERPC_KVM_TYPES_H
+
+#endif /* _ASM_POWERPC_KVM_TYPES_H */
diff --git a/arch/s390/include/asm/kvm_types.h 
b/arch/s390/include/asm/kvm_types.h
new file mode 100644
index ..b66a81f8a354
--- /dev/null
+++ b/arch/s390/include/asm/kvm_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_S390_KVM_TYPES_H
+#define _ASM_S390_KVM_TYPES_H
+
+#endif /* _ASM_S390_KVM_TYPES_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4660ce90de7f..9f97dcd15097 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -179,8 +179,6 @@ enum {
 
 #include 
 
-#define KVM_NR_MEM_OBJS 40
-
 #define KVM_NR_DB_REGS 4
 
 #define DR6_BD (1 << 13)
@@ -238,15 +236,6 @@ enum {
 
 struct kvm_kernel_irq_routing_entry;
 
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 /*
  * the pages used as guest page table on soft mmu are tracked by
  * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
diff --git a/arch/x86/include/asm/kvm_types.h b/arch/x86/include/asm/kvm_types.h
new file mode 100644
index ..5260d751940e
--- /dev/null
+++ b/arch/x86/include/asm/kvm_types.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_KVM_TYPES_H
+#define _ASM_X86_KVM_TYPES_H
+
+#define KVM_ARCH_WANT_MMU_MEMORY_CACHE
+
+#define KVM_NR_MEM_OBJS 40
+
+#define KVM_MMU_CACHE_GFP  GFP_KERNEL
+#define KVM_MMU_CACHE_PAGE_GFP GFP_KERNEL_ACCOUNT
+
+#endif /* _ASM_X86_KVM_TYPES_H */
d
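A sketch of the common include/linux/kvm_types.h side that the commit
message above describes, based on the arch headers in this series
(illustrative only, not the literal diff):

  #include <asm/kvm_types.h>

  #ifdef KVM_ARCH_WANT_MMU_MEMORY_CACHE
  /*
   * We don't want allocation failures within the mmu code, so we
   * preallocate enough memory for a single page fault in a cache.
   */
  struct kvm_mmu_memory_cache {
          int nobjs;
          void *objects[KVM_NR_MEM_OBJS];
  };
  #endif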

Re: [PATCH v2 1/4] KVM: arm64: Forbid kprobing of the VHE world-switch code

2019-01-31 Thread Christoffer Dall
On Thu, Jan 24, 2019 at 04:32:54PM +, James Morse wrote:
> On systems with VHE the kernel and KVM's world-switch code run at the
> same exception level. Code that is only used on a VHE system does not
> need to be annotated as __hyp_text as it can reside anywhere in the
> kernel text.
> 
> __hyp_text was also used to prevent kprobes from patching breakpoint
> instructions into this region, as this code runs at a different
> exception level. While this is no longer true with VHE, KVM still
> switches VBAR_EL1, meaning a kprobe's breakpoint executed in the
> world-switch code will cause a hyp-panic.

Forgive potentially very stupid questions here, but:

 (1) Would it make sense to move the save/restore VBAR_EL1 to the last
 possible moment, and would that actually allow kprobes to work for
 the world-switch code, or does that just result in other weird
 problems?

 (2) Are we sure that this catches every call path of every non-inlined
 function called after switching VBAR_EL1?  Can kprobes only be
 called on exported symbols, or can you (if you know the address
 somehow) put a kprobe on a static function as well?  If there are
 any concerns in this area, we might want to consider (1) more
 closely.


Thanks,

Christoffer

> 
> echo "p:weasel sysreg_save_guest_state_vhe" > 
> /sys/kernel/debug/tracing/kprobe_events
> echo 1 > /sys/kernel/debug/tracing/events/kprobes/weasel/enable
> lkvm run -k /boot/Image --console serial -p "console=ttyS0 
> earlycon=uart,mmio,0x3f8"
> 
>   # lkvm run -k /boot/Image -m 384 -c 3 --name guest-1474
>   Info: Placing fdt at 0x8fe0 - 0x8fff
>   Info: virtio-mmio.devices=0x200@0x1:36
> 
>   Info: virtio-mmio.devices=0x200@0x10200:37
> 
>   Info: virtio-mmio.devices=0x200@0x10400:38
> 
> [  614.178186] Kernel panic - not syncing: HYP panic:
> [  614.178186] PS:404003c9 PC:100d70e0 ESR:f204
> [  614.178186] FAR:8008 HPFAR:00800800 
> PAR:1d7edbadc0de
> [  614.178186] VCPU:f8de32f1
> [  614.178383] CPU: 2 PID: 1482 Comm: kvm-vcpu-0 Not tainted 5.0.0-rc2 #10799
> [  614.178446] Call trace:
> [  614.178480]  dump_backtrace+0x0/0x148
> [  614.178567]  show_stack+0x24/0x30
> [  614.178658]  dump_stack+0x90/0xb4
> [  614.178710]  panic+0x13c/0x2d8
> [  614.178793]  hyp_panic+0xac/0xd8
> [  614.178880]  kvm_vcpu_run_vhe+0x9c/0xe0
> [  614.178958]  kvm_arch_vcpu_ioctl_run+0x454/0x798
> [  614.179038]  kvm_vcpu_ioctl+0x360/0x898
> [  614.179087]  do_vfs_ioctl+0xc4/0x858
> [  614.179174]  ksys_ioctl+0x84/0xb8
> [  614.179261]  __arm64_sys_ioctl+0x28/0x38
> [  614.179348]  el0_svc_common+0x94/0x108
> [  614.179401]  el0_svc_handler+0x38/0x78
> [  614.179487]  el0_svc+0x8/0xc
> [  614.179558] SMP: stopping secondary CPUs
> [  614.179661] Kernel Offset: disabled
> [  614.179695] CPU features: 0x003,2a80aa38
> [  614.179758] Memory Limit: none
> [  614.179858] ---[ end Kernel panic - not syncing: HYP panic:
> [  614.179858] PS:404003c9 PC:100d70e0 ESR:f204
> [  614.179858] FAR:8008 HPFAR:00800800 
> PAR:1d7edbadc0de
> [  614.179858] VCPU:f8de32f1 ]---
> 
> Annotate the VHE world-switch functions that aren't marked
> __hyp_text using NOKPROBE_SYMBOL().
> 
> Signed-off-by: James Morse 
> Fixes: 3f5c90b890ac ("KVM: arm64: Introduce VHE-specific kvm_vcpu_run")
> ---
> 
> This has been an issue since the VHE/non-VHE world-switch paths were
> split.
> 
> 
> Changes since v1:
>  * Switched to NOKPROBE_SYMBOL() as this doesn't move code between
>sections.
> 
> ---
>  arch/arm64/kvm/hyp/switch.c| 5 +
>  arch/arm64/kvm/hyp/sysreg-sr.c | 5 +
>  2 files changed, 10 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index b0b1478094b4..421ebf6f7086 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -23,6 +23,7 @@
>  #include 
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -107,6 +108,7 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
>  
>   write_sysreg(kvm_get_hyp_vector(), vbar_el1);
>  }
> +NOKPROBE_SYMBOL(activate_traps_vhe);
>  
>  static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
>  {
> @@ -154,6 +156,7 @@ static void deactivate_traps_vhe(void)
>   write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
>   write_sysreg(vectors, vbar_el1);
>  }
> +NOKPROBE_SYMBOL(deactivate_traps_vhe);
>  
>  static void __hyp_text __deactivate_traps_nvhe(void)
>  {
> @@ -513,6 +516,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  
>   return exit_code;
>  }
> +NOKPROBE_SYMBOL(kvm_vcpu_run_vhe);
>  
>  /* Switch to the guest for legacy non-VHE systems */
>  int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
> @@ -620,6 +624,7 @@ static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 
> par,
> read_sysreg_el2(esr),   read_sysreg_el2(far),
> read_sysreg(hpfar_el2), par, vcpu);
>  }
> +NOKPROBE_SYMBOL(__hy

Re: [PATCH v2 2/4] arm64: kprobe: Always blacklist the KVM world-switch code

2019-01-31 Thread Christoffer Dall
On Thu, Jan 24, 2019 at 04:32:55PM +, James Morse wrote:
> On systems with VHE the kernel and KVM's world-switch code run at the
> same exception level. Code that is only used on a VHE system does not
> need to be annotated as __hyp_text as it can reside anywhere in the
>  kernel text.
> 
> __hyp_text was also used to prevent kprobes from patching breakpoint
> instructions into this region, as this code runs at a different
> exception level. While this is no longer true with VHE, KVM still
> switches VBAR_EL1, meaning a kprobe's breakpoint executed in the
> world-switch code will cause a hyp-panic.
> 
> Move the __hyp_text check in the kprobes blacklist so it applies on
> VHE systems too, to cover the common code and guest enter/exit
> assembly.
> 
> Fixes: 888b3c8720e0 ("arm64: Treat all entry code as non-kprobe-able")
> Signed-off-by: James Morse 
> Acked-by: Masami Hiramatsu 

Reviewed-by: Christoffer Dall 

> ---
>  arch/arm64/kernel/probes/kprobes.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/probes/kprobes.c 
> b/arch/arm64/kernel/probes/kprobes.c
> index 2a5b338b2542..f17afb99890c 100644
> --- a/arch/arm64/kernel/probes/kprobes.c
> +++ b/arch/arm64/kernel/probes/kprobes.c
> @@ -478,13 +478,13 @@ bool arch_within_kprobe_blacklist(unsigned long addr)
>   addr < (unsigned long)__entry_text_end) ||
>   (addr >= (unsigned long)__idmap_text_start &&
>   addr < (unsigned long)__idmap_text_end) ||
> + (addr >= (unsigned long)__hyp_text_start &&
> + addr < (unsigned long)__hyp_text_end) ||
>   !!search_exception_tables(addr))
>   return true;
>  
>   if (!is_kernel_in_hyp_mode()) {
> - if ((addr >= (unsigned long)__hyp_text_start &&
> - addr < (unsigned long)__hyp_text_end) ||
> - (addr >= (unsigned long)__hyp_idmap_text_start &&
> + if ((addr >= (unsigned long)__hyp_idmap_text_start &&
>   addr < (unsigned long)__hyp_idmap_text_end))
>   return true;
>   }
> -- 
> 2.20.1
> 


Re: [PATCH v2 3/4] arm64: hyp-stub: Forbid kprobing of the hyp-stub

2019-01-31 Thread Christoffer Dall
On Thu, Jan 24, 2019 at 04:32:56PM +, James Morse wrote:
> The hyp-stub is loaded by the kernel's early startup code at EL2
> during boot, before KVM takes ownership later. The hyp-stub's
> text is part of the regular kernel text, meaning it can be kprobed.
> 
> A breakpoint in the hyp-stub causes the CPU to spin in el2_sync_invalid.
> 
> Add it to the __hyp_text.
> 
> Signed-off-by: James Morse 
> Cc: sta...@vger.kernel.org
> ---
> 
> This has been a problem since kprobes was merged, it should
> probably have been covered in 888b3c8720e0.
> 
> I'm not sure __hyp_text is the right place. It's not idmapped,
> and as it contains a set of vectors, adding it to the host/hyp
> idmap sections could grow them beyond a page... but it does
> run with the MMU off, so does need to be cleaned to PoC when
> anything wacky, like hibernate happens. With this patch,
> hibernate should clean the __hyp_text to PoC too.

How did this code get cleaned before?

Is there a problem you can identify with putting it in __hyp_text?
Seems to me we should just stick it there if it has no negative
side-effects and otherwise we have to make up a separate section with a
specialized meaning.


Thanks,

Christoffer

> ---
>  arch/arm64/kernel/hyp-stub.S | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
> index e1261fbaa374..17f325ba831e 100644
> --- a/arch/arm64/kernel/hyp-stub.S
> +++ b/arch/arm64/kernel/hyp-stub.S
> @@ -28,6 +28,8 @@
>  #include 
>  
>   .text
> + .pushsection.hyp.text, "ax"
> +
>   .align 11
>  
>  ENTRY(__hyp_stub_vectors)
> -- 
> 2.20.1
> 


Re: [PATCH 2/5] arm/arm64: KVM: Allow a VCPU to fully reset itself

2019-01-30 Thread Christoffer Dall
On Wed, Jan 30, 2019 at 04:27:21PM +0100, Andrew Jones wrote:
> On Wed, Jan 30, 2019 at 10:34:31AM +0100, Christoffer Dall wrote:
> > On Tue, Jan 29, 2019 at 05:03:47PM +0100, Andrew Jones wrote:
> > > On Fri, Jan 25, 2019 at 10:46:53AM +0100, Christoffer Dall wrote:
> > > > From: Marc Zyngier 
> > > > 
> > > > The current kvm_psci_vcpu_on implementation will directly try to
> > > > manipulate the state of the VCPU to reset it.  However, since this is
> > > > not done on the thread that runs the VCPU, we can end up in a strangely
> > > > corrupted state when the source and target VCPUs are running at the same
> > > > time.
> > > > 
> > > > Fix this by factoring out all reset logic from the PSCI implementation
> > > > and forwarding the required information along with a request to the
> > > > target VCPU.
> > > 
> > > The last patch makes more sense, now that I see this one. I guess their
> > > order should be swapped.
> > > 
> > > > 
> > > > Signed-off-by: Marc Zyngier 
> > > > Signed-off-by: Christoffer Dall 
> > > > ---
> > > >  arch/arm/include/asm/kvm_host.h   | 10 +
> > > >  arch/arm/kvm/reset.c  | 24 +
> > > >  arch/arm64/include/asm/kvm_host.h | 11 ++
> > > >  arch/arm64/kvm/reset.c| 24 +
> > > >  virt/kvm/arm/arm.c| 10 +
> > > >  virt/kvm/arm/psci.c   | 36 ++-
> > > >  6 files changed, 95 insertions(+), 20 deletions(-)
> > > > 
> > > > diff --git a/arch/arm/include/asm/kvm_host.h 
> > > > b/arch/arm/include/asm/kvm_host.h
> > > > index ca56537b61bc..50e89869178a 100644
> > > > --- a/arch/arm/include/asm/kvm_host.h
> > > > +++ b/arch/arm/include/asm/kvm_host.h
> > > > @@ -48,6 +48,7 @@
> > > >  #define KVM_REQ_SLEEP \
> > > > KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> > > >  #define KVM_REQ_IRQ_PENDING   KVM_ARCH_REQ(1)
> > > > +#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
> > > >  
> > > >  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
> > > >  
> > > > @@ -147,6 +148,13 @@ struct kvm_cpu_context {
> > > >  
> > > >  typedef struct kvm_cpu_context kvm_cpu_context_t;
> > > >  
> > > > +struct vcpu_reset_state {
> > > > +   unsigned long   pc;
> > > > +   unsigned long   r0;
> > > > +   bool be;
> > > > +   bool reset;
> > > > +};
> > > > +
> > > >  struct kvm_vcpu_arch {
> > > > struct kvm_cpu_context ctxt;
> > > >  
> > > > @@ -186,6 +194,8 @@ struct kvm_vcpu_arch {
> > > > /* Cache some mmu pages needed inside spinlock regions */
> > > > struct kvm_mmu_memory_cache mmu_page_cache;
> > > >  
> > > > +   struct vcpu_reset_state reset_state;
> > > > +
> > > > /* Detect first run of a vcpu */
> > > > bool has_run_once;
> > > >  };
> > > > diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
> > > > index 5ed0c3ee33d6..de41255eebcd 100644
> > > > --- a/arch/arm/kvm/reset.c
> > > > +++ b/arch/arm/kvm/reset.c
> > > > @@ -26,6 +26,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >  
> > > >  #include 
> > > >  
> > > > @@ -69,6 +70,29 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> > > > /* Reset CP15 registers */
> > > > kvm_reset_coprocs(vcpu);
> > > >  
> > > > +   /*
> > > > +* Additional reset state handling that PSCI may have imposed 
> > > > on us.
> > > > +* Must be done after all the sys_reg reset.
> > > > +*/
> > > > +   if (vcpu->arch.reset_state.reset) {
> > > > +   unsigned long target_pc = vcpu->arch.reset_state.pc;
> > > > +
> > > > +   /* Gracefully handle Thumb2 entry point */
> > > > +   if (target_pc & 1) {
> > > > +   target_pc &= ~1UL;
> > > > +   vcpu_set_thumb(vcpu);

Re: [PATCH v9 10/26] arm64: kvm: Unmask PMR before entering guest

2019-01-30 Thread Christoffer Dall
On Mon, Jan 21, 2019 at 03:33:29PM +, Julien Thierry wrote:
> Interrupts masked by ICC_PMR_EL1 will not be signaled to the CPU. This
> means that hypervisor will not receive masked interrupts while running a
> guest.
> 

You could add to the commit description how this works overall,
something along the lines of:

We need to make sure that all maskable interrupts are masked from the
time we call local_irq_disable() in the main run loop, and remain so
until we call local_irq_enable() after returning from the guest, and we
need to ensure that we see no interrupts at all (including pseudo-NMIs)
in the middle of the VM world-switch, while at the same time we need to
ensure we exit the guest when there are interrupts for the host.

We can accomplish this with pseudo-NMIs enabled by:
  (1) local_irq_disable: set the priority mask
  (2) enter guest:       set PSTATE.I
  (3)                    clear the priority mask
  (4) eret to guest
  (5) exit guest:        set the priority mask
                         clear PSTATE.I (and restore other host PSTATE bits)
  (6) local_irq_enable:  clear the priority mask.

Also, took me a while to realize that when we come back from the guest,
we call local_daif_restore with DAIF_PROCCTX_NOIRQ, which actually does
both of the things in (5).
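
For reference, the six steps map onto the helpers named in this patch roughly
as in the sketch below (illustrative only, simplified from the actual hunks;
step (1) is the priority mask set by local_irq_disable() in the main run loop
before either helper runs):

static inline void kvm_arm_vhe_guest_enter(void)
{
	local_daif_mask();	/* (2) set PSTATE.I (and the rest of DAIF) */

	/*
	 * (3) clear the priority mask, so the GIC can still signal the CPU
	 * and cause a guest exit while PSTATE.I is set.
	 */
	if (system_uses_irq_prio_masking()) {
		gic_write_pmr(GIC_PRIO_IRQON);
		dsb(sy);
	}
	/* (4) the world switch then erets into the guest */
}

static inline void kvm_arm_vhe_guest_exit(void)
{
	/*
	 * (5) local_daif_restore(DAIF_PROCCTX_NOIRQ) both sets the priority
	 * mask again and clears PSTATE.I, so interrupts stay masked until
	 * (6) local_irq_enable() clears the mask in the run loop.
	 */
	local_daif_restore(DAIF_PROCCTX_NOIRQ);
}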

> Avoid this by making sure ICC_PMR_EL1 is unmasked when we enter a guest.
> 
> Signed-off-by: Julien Thierry 
> Acked-by: Catalin Marinas 
> Cc: Christoffer Dall 
> Cc: Marc Zyngier 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: kvmarm@lists.cs.columbia.edu
> ---
>  arch/arm64/include/asm/kvm_host.h | 12 
>  arch/arm64/kvm/hyp/switch.c   | 16 
>  2 files changed, 28 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 7732d0b..a1f9f55 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -24,6 +24,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -474,6 +475,17 @@ static inline int kvm_arch_vcpu_run_pid_change(struct 
> kvm_vcpu *vcpu)
>  static inline void kvm_arm_vhe_guest_enter(void)
>  {
>   local_daif_mask();
> +
> + /*
> +  * Having IRQs masked via PMR when entering the guest means the GIC
> +  * will not signal the CPU of interrupts of lower priority, and the
> +  * only way to get out will be via guest exceptions.
> +  * Naturally, we want to avoid this.
> +  */
> + if (system_uses_irq_prio_masking()) {
> + gic_write_pmr(GIC_PRIO_IRQON);
> + dsb(sy);
> + }
>  }
>  
>  static inline void kvm_arm_vhe_guest_exit(void)
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index b0b1478..6a4c2d6 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -22,6 +22,7 @@
>  
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -521,6 +522,17 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>   struct kvm_cpu_context *guest_ctxt;
>   u64 exit_code;
>  
> + /*
> +  * Having IRQs masked via PMR when entering the guest means the GIC
> +  * will not signal the CPU of interrupts of lower priority, and the
> +  * only way to get out will be via guest exceptions.
> +  * Naturally, we want to avoid this.
> +  */
> + if (system_uses_irq_prio_masking()) {
> + gic_write_pmr(GIC_PRIO_IRQON);
> + dsb(sy);
> + }
> +
>   vcpu = kern_hyp_va(vcpu);
>  
>   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> @@ -573,6 +585,10 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>*/
>   __debug_switch_to_host(vcpu);
>  
> + /* Returning to host will clear PSR.I, remask PMR if needed */
> + if (system_uses_irq_prio_masking())
> + gic_write_pmr(GIC_PRIO_IRQOFF);
> +
>   return exit_code;
>  }
>  

nit: you could consider moving the non-vhe part into a new
kvm_arm_nvhe_guest_enter, for symmetry with the vhe part.

Otherwise looks good to me:

Reviewed-by: Christoffer Dall 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 5/5] arm/arm64: KVM: Don't panic on failure to properly reset system registers

2019-01-30 Thread Christoffer Dall
On Tue, Jan 29, 2019 at 05:33:37PM +0100, Andrew Jones wrote:
> On Fri, Jan 25, 2019 at 10:46:56AM +0100, Christoffer Dall wrote:
> > From: Marc Zyngier 
> > 
> > Failing to properly reset system registers is pretty bad. But not
> > quite as bad as bringing the whole machine down... So warn loudly,
> > but slightly more gracefully.
> > 
> > Signed-off-by: Marc Zyngier 
> > Acked-by: Christoffer Dall 
> > ---
> >  arch/arm/kvm/coproc.c | 4 ++--
> >  arch/arm64/kvm/sys_regs.c | 4 ++--
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> > index 222c1635bc7a..e8bd288fd5be 100644
> > --- a/arch/arm/kvm/coproc.c
> > +++ b/arch/arm/kvm/coproc.c
> > @@ -1450,6 +1450,6 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
> > reset_coproc_regs(vcpu, table, num);
> >  
> > for (num = 1; num < NR_CP15_REGS; num++)
> > -   if (vcpu_cp15(vcpu, num) == 0x42424242)
> > -   panic("Didn't reset vcpu_cp15(vcpu, %zi)", num);
> > +   WARN(vcpu_cp15(vcpu, num) == 0x42424242,
> > +"Didn't reset vcpu_cp15(vcpu, %zi)", num);
> >  }
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 86096774abcd..4f067545c7d2 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -2609,6 +2609,6 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
> > reset_sys_reg_descs(vcpu, table, num);
> >  
> > for (num = 1; num < NR_SYS_REGS; num++)
> > -   if (__vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
> > -   panic("Didn't reset __vcpu_sys_reg(%zi)", num);
> > +   WARN(__vcpu_sys_reg(vcpu, num) == 0x4242424242424242,
> > +"Didn't reset __vcpu_sys_reg(%zi)\n", num);
> >  }
> > -- 
> > 2.18.0
> >
> 
> If we only get halfway through resetting, then we'll get a warn splat,
> complete with a backtrace, for each register. Should we do something
> like the following instead?
> 
>   for (num = 1; num < NR_SYS_REGS; num++)
>  if (__vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
> failed++;
>   WARN(failed, "Didn't reset %d system registers", failed);
> 

I don't care strongly which way we do it, but when we actually saw
this, it seemed useful to see which system register was not initialized.
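
If we wanted both (a single splat, but still a hint about which register was
left uninitialized), an untested sketch along the lines of the suggestion
above could count the failures and name the first one:

	size_t num, failed = 0;
	ssize_t first = -1;

	for (num = 1; num < NR_SYS_REGS; num++) {
		if (__vcpu_sys_reg(vcpu, num) == 0x4242424242424242) {
			if (first < 0)
				first = num;
			failed++;
		}
	}
	WARN(failed, "Didn't reset %zu system registers, first: __vcpu_sys_reg(%zi)\n",
	     failed, first);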

Thanks for the review.

Christoffer


Re: [PATCH 3/5] KVM: arm/arm64: Require VCPU threads to turn them self off

2019-01-30 Thread Christoffer Dall
On Tue, Jan 29, 2019 at 05:16:44PM +0100, Andrew Jones wrote:
> 
> Summary edit
> 
> > KVM: arm/arm64: Require VCPU threads to turn them self off
> 
> themselves
> 
> On Fri, Jan 25, 2019 at 10:46:54AM +0100, Christoffer Dall wrote:
> > To avoid a race between turning VCPUs off and turning them on, make sure
> > that only the VCPU thread itself turns off the VCPU.  When other threads
> > want to turn off a VCPU, they now do this via a request.
> > 
> > Signed-off-by: Christoffer Dall 
> > Acked-by: Marc Zyngier 
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  2 ++
> >  arch/arm64/include/asm/kvm_host.h |  2 ++
> >  virt/kvm/arm/arm.c|  8 ++--
> >  virt/kvm/arm/psci.c   | 11 ++-
> >  4 files changed, 12 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h 
> > b/arch/arm/include/asm/kvm_host.h
> > index 50e89869178a..b1cfae222441 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -49,6 +49,8 @@
> > KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> > >  #define KVM_REQ_IRQ_PENDING   KVM_ARCH_REQ(1)
> >  #define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
> > +#define KVM_REQ_VCPU_OFF \
> > +   KVM_ARCH_REQ_FLAGS(3, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> >  
> >  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
> >  
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index da3fc7324d68..d43b13421987 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -49,6 +49,8 @@
> > KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> > >  #define KVM_REQ_IRQ_PENDING   KVM_ARCH_REQ(1)
> >  #define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
> > +#define KVM_REQ_VCPU_OFF \
> > +   KVM_ARCH_REQ_FLAGS(3, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> >  
> >  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
> >  
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 9c486fad3f9f..785076176814 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -404,8 +404,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >  
> >  static void vcpu_power_off(struct kvm_vcpu *vcpu)
> >  {
> > -   vcpu->arch.power_off = true;
> > -   kvm_make_request(KVM_REQ_SLEEP, vcpu);
> > +   kvm_make_request(KVM_REQ_VCPU_OFF, vcpu);
> > kvm_vcpu_kick(vcpu);
> >  }
> 
> I think we should leave this function alone. Otherwise if userspace sets
> the MP state to STOPPED and then queries the state before the vcpu
> has a chance to manage its vcpu requests, the state will still indicate
> RUNNABLE. The same goes for a query right after doing a vcpu init.
> 

We can't leave this alone, because that could lead to userspace racing
with two PSCI_VCPU_ON requests which could then both enter the critical
section gated only by the cmpxchg in kvm_psci_vcpu_on.

But we could do something like this (completely untested):


diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 1e3195155860..538b5eb9d920 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -404,6 +404,17 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 
 static void vcpu_power_off(struct kvm_vcpu *vcpu)
 {
+   enum vcpu_power_state old_power_state;
+
+   /*
+* Set power_state directly to reflect the power state back to user
+* space even when the VCPU thread has not had a chance to run, but
+* only if this doesn't accidentally allow interleaved PSCI_VCPU_ON
+* requests.
+*/
+   old_power_state = cmpxchg(&vcpu->arch.power_state,
+ KVM_ARM_VCPU_ON,
+ KVM_ARM_VCPU_OFF);
kvm_make_request(KVM_REQ_VCPU_OFF, vcpu);
kvm_vcpu_kick(vcpu);
 }


> >  
> > @@ -646,6 +645,11 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
> > if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
> > vcpu_req_sleep(vcpu);
> >  
> > +   if (kvm_check_request(KVM_REQ_VCPU_OFF, vcpu)) {
> > +   vcpu->arch.power_off = true;
> > +   vcpu_req_sleep(vcpu);
> > +   }
> > +
> > if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
> > kvm_reset_vcpu(vcpu);
> >  
> > diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
> > index b9cff1d4b06d..20255319e193 100644
> > --- a/virt/kvm/arm/psci.c
> > +++ b/virt/kvm/arm/psci.c
> > 

Re: [PATCH 2/5] arm/arm64: KVM: Allow a VCPU to fully reset itself

2019-01-30 Thread Christoffer Dall
On Tue, Jan 29, 2019 at 05:03:47PM +0100, Andrew Jones wrote:
> On Fri, Jan 25, 2019 at 10:46:53AM +0100, Christoffer Dall wrote:
> > From: Marc Zyngier 
> > 
> > The current kvm_psci_vcpu_on implementation will directly try to
> > manipulate the state of the VCPU to reset it.  However, since this is
> > not done on the thread that runs the VCPU, we can end up in a strangely
> > corrupted state when the source and target VCPUs are running at the same
> > time.
> > 
> > Fix this by factoring out all reset logic from the PSCI implementation
> > and forwarding the required information along with a request to the
> > target VCPU.
> 
> The last patch makes more sense, now that I see this one. I guess their
> order should be swapped.
> 
> > 
> > Signed-off-by: Marc Zyngier 
> > Signed-off-by: Christoffer Dall 
> > ---
> >  arch/arm/include/asm/kvm_host.h   | 10 +
> >  arch/arm/kvm/reset.c  | 24 +
> >  arch/arm64/include/asm/kvm_host.h | 11 ++
> >  arch/arm64/kvm/reset.c| 24 +
> >  virt/kvm/arm/arm.c| 10 +
> >  virt/kvm/arm/psci.c   | 36 ++-
> >  6 files changed, 95 insertions(+), 20 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h 
> > b/arch/arm/include/asm/kvm_host.h
> > index ca56537b61bc..50e89869178a 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -48,6 +48,7 @@
> >  #define KVM_REQ_SLEEP \
> > KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> >  #define KVM_REQ_IRQ_PENDING   KVM_ARCH_REQ(1)
> > +#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
> >  
> >  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
> >  
> > @@ -147,6 +148,13 @@ struct kvm_cpu_context {
> >  
> >  typedef struct kvm_cpu_context kvm_cpu_context_t;
> >  
> > +struct vcpu_reset_state {
> > +   unsigned long   pc;
> > +   unsigned long   r0;
> > > +   bool be;
> > > +   bool reset;
> > +};
> > +
> >  struct kvm_vcpu_arch {
> > struct kvm_cpu_context ctxt;
> >  
> > @@ -186,6 +194,8 @@ struct kvm_vcpu_arch {
> > /* Cache some mmu pages needed inside spinlock regions */
> > struct kvm_mmu_memory_cache mmu_page_cache;
> >  
> > +   struct vcpu_reset_state reset_state;
> > +
> > /* Detect first run of a vcpu */
> > bool has_run_once;
> >  };
> > diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
> > index 5ed0c3ee33d6..de41255eebcd 100644
> > --- a/arch/arm/kvm/reset.c
> > +++ b/arch/arm/kvm/reset.c
> > @@ -26,6 +26,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  
> > @@ -69,6 +70,29 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> > /* Reset CP15 registers */
> > kvm_reset_coprocs(vcpu);
> >  
> > +   /*
> > +* Additional reset state handling that PSCI may have imposed on us.
> > +* Must be done after all the sys_reg reset.
> > +*/
> > +   if (vcpu->arch.reset_state.reset) {
> > +   unsigned long target_pc = vcpu->arch.reset_state.pc;
> > +
> > +   /* Gracefully handle Thumb2 entry point */
> > +   if (target_pc & 1) {
> > +   target_pc &= ~1UL;
> > +   vcpu_set_thumb(vcpu);
> > +   }
> > +
> > +   /* Propagate caller endianness */
> > +   if (vcpu->arch.reset_state.be)
> > +   kvm_vcpu_set_be(vcpu);
> > +
> > +   *vcpu_pc(vcpu) = target_pc;
> > +   vcpu_set_reg(vcpu, 0, vcpu->arch.reset_state.r0);
> > +
> > +   vcpu->arch.reset_state.reset = false;
> > +   }
> > +
> > /* Reset arch_timer context */
> > return kvm_timer_vcpu_reset(vcpu);
> >  }
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 7732d0ba4e60..da3fc7324d68 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -48,6 +48,7 @@
> >  #define KVM_REQ_SLEEP \
> > KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> >  #define KVM_REQ_IRQ_PENDING   KVM_ARCH_REQ(1)
> > +#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
> >  
> >  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
> >  
> >

Re: [PATCH 1/5] KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded

2019-01-30 Thread Christoffer Dall
On Tue, Jan 29, 2019 at 04:05:25PM +, Marc Zyngier wrote:
> Hi Drew,
> 
> On Tue, 29 Jan 2019 15:48:58 +,
> Andrew Jones  wrote:
> > 
> > Hi Christoffer,
> > 
> > On Fri, Jan 25, 2019 at 10:46:52AM +0100, Christoffer Dall wrote:
> > > Resetting the VCPU state modifies the system register state in memory,
> > > but this may interact with vcpu_load/vcpu_put if running with preemption
> > > disabled, which in turn may lead to corrupted system register state.
> >   ^ enabled
> > > 
> > > Address this by disabling preemption and doing put/load if required
> > > around the reset logic.
> > 
> > I'm having trouble understanding how disabling preemption helps here.
> > There shouldn't be an issue with the KVM_ARM_VCPU_INIT case, since the
> > target vcpu is guaranteed not to be loaded and therefore it doesn't
> > have preempt notifiers registered either. Also, KVM_ARM_VCPU_INIT holds
> > the vcpu mutex, so there's no chance for a load to occur until it's
> > complete.
> > 
> > For the PSCI case it makes sense to force a vcpu load after the reset,
> > otherwise the sleeping target vcpu won't have the correct state loaded.
> > The initial put also makes sense in case we're not resetting everything.
> > I don't understand how we're ensuring the target vcpu thread's preemption
> > is disabled though. This modified kvm_reset_vcpu would need to be run
> > from the target vcpu thread to work, but that's not how the PSCI code
> > currently does it.
> 
> And that's exactly where we're going with the following patch in the
> series. Ultimately, we need a vcpu to reset itself, as we otherwise
> have a window where a vcpu can be spuriously loaded whilst being
> reset.
> 
FWIW, I think the confusion here comes from having re-ordered the
patches compared to how the commit text was originally written.  We
should probably explain in the commit message that this is in
preparation for doing the reset from the VCPU itself.

Thanks,

Christoffer


Re: [PATCH 1/3] KVM: x86: Move mmu_memory_cache functions to common code

2019-01-29 Thread Christoffer Dall
On Mon, Jan 28, 2019 at 08:19:56AM -0800, Sean Christopherson wrote:
> On Mon, Jan 28, 2019 at 03:48:41PM +0100, Christoffer Dall wrote:
> > We are currently duplicating the mmu memory cache functionality quite
> > heavily between the architectures that support KVM.  As a first step,
> > move the x86 implementation (which seems to have the most recently
> > maintained version of the mmu memory cache) to common code.
> > 
> > We rename the functions and data types to have a kvm_ prefix for
> > anything exported as a symbol to the rest of the kernel and take the
> > chance to rename memory_cache to memcache to avoid overly long lines.
> 
> It'd be helpful to do the rename in a separate patch so that the move
> really is a straight move of code.
> 
> > This is a bit tedious on the callsites but ends up looking more
> > palatable.
> > 
> > We also introduce an arch-specific kvm_types.h which can be used to
> > define the architecture-specific GFP flags for allocating memory to the
> > memory cache, and to specify how many objects are required in the memory
> > cache.  These are the two points where the current implementations
> > diverge across architectures.  Since kvm_host.h defines structures with
> > fields of the memcache object, we define the memcache structure in
> > kvm_types.h, and we include the architecture-specific kvm_types.h to
> > know the size of object in kvm_host.h.
> > 
> > This patch currently only defines the structure and requires valid
> > defines in the architecture-specific kvm_types.h when KVM_NR_MEM_OBJS is
> > defined.  As we move each architecture to the common implementation,
> > this condition will eventually go away.
> > 
> > Signed-off-by: Christoffer Dall 
> > ---
> >  arch/arm/include/asm/kvm_types.h |  5 ++
> >  arch/arm64/include/asm/kvm_types.h   |  6 ++
> >  arch/mips/include/asm/kvm_types.h|  5 ++
> >  arch/powerpc/include/asm/kvm_types.h |  5 ++
> >  arch/s390/include/asm/kvm_types.h|  5 ++
> >  arch/x86/include/asm/kvm_host.h  | 17 +
> >  arch/x86/include/asm/kvm_types.h | 10 +++
> >  arch/x86/kvm/mmu.c   | 97 ++--
> >  arch/x86/kvm/paging_tmpl.h   |  4 +-
> >  include/linux/kvm_host.h |  9 +++
> >  include/linux/kvm_types.h| 12 
> >  virt/kvm/kvm_main.c  | 58 +
> >  12 files changed, 139 insertions(+), 94 deletions(-)
> >  create mode 100644 arch/arm/include/asm/kvm_types.h
> >  create mode 100644 arch/arm64/include/asm/kvm_types.h
> >  create mode 100644 arch/mips/include/asm/kvm_types.h
> >  create mode 100644 arch/powerpc/include/asm/kvm_types.h
> >  create mode 100644 arch/s390/include/asm/kvm_types.h
> >  create mode 100644 arch/x86/include/asm/kvm_types.h
> > 
> > diff --git a/arch/arm/include/asm/kvm_types.h 
> > b/arch/arm/include/asm/kvm_types.h
> > new file mode 100644
> > index ..bc389f82e88d
> > --- /dev/null
> > +++ b/arch/arm/include/asm/kvm_types.h
> > @@ -0,0 +1,5 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_ARM_KVM_TYPES_H
> > +#define _ASM_ARM_KVM_TYPES_H
> > +
> > +#endif /* _ASM_ARM_KVM_TYPES_H */
> > diff --git a/arch/arm64/include/asm/kvm_types.h 
> > b/arch/arm64/include/asm/kvm_types.h
> > new file mode 100644
> > index ..d0987007d581
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/kvm_types.h
> > @@ -0,0 +1,6 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_ARM64_KVM_TYPES_H
> > +#define _ASM_ARM64_KVM_TYPES_H
> > +
> > +#endif /* _ASM_ARM64_KVM_TYPES_H */
> > +
> > diff --git a/arch/mips/include/asm/kvm_types.h 
> > b/arch/mips/include/asm/kvm_types.h
> > new file mode 100644
> > index ..5efeb32a5926
> > --- /dev/null
> > +++ b/arch/mips/include/asm/kvm_types.h
> > @@ -0,0 +1,5 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_MIPS_KVM_TYPES_H
> > +#define _ASM_MIPS_KVM_TYPES_H
> > +
> > +#endif /* _ASM_MIPS_KVM_TYPES_H */
> > diff --git a/arch/powerpc/include/asm/kvm_types.h 
> > b/arch/powerpc/include/asm/kvm_types.h
> > new file mode 100644
> > index ..f627eceaa314
> > --- /dev/null
> > +++ b/arch/powerpc/include/asm/kvm_types.h
> > @@ -0,0 +1,5 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_POWERPC_KVM_TYPES_H
> > +#define _ASM_POWERPC_KVM_TYPES_H
> > +
> > +#endif /* _ASM_POWERPC_KVM_TYPES_H */
>

[PATCH 3/3] KVM: mips: Move to common kvm_mmu_memcache infrastructure

2019-01-28 Thread Christoffer Dall
Now that we have a common infrastructure for doing MMU cache
allocations, use this for mips as well.

Signed-off-by: Christoffer Dall 
---
 arch/mips/include/asm/kvm_host.h  | 15 ++---
 arch/mips/include/asm/kvm_types.h |  5 +++
 arch/mips/kvm/mips.c  |  2 +-
 arch/mips/kvm/mmu.c   | 54 ++-
 4 files changed, 18 insertions(+), 58 deletions(-)

diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index d2abd98471e8..e05cabd53a9e 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -293,17 +293,6 @@ struct kvm_mips_tlb {
long tlb_lo[2];
 };
 
-#define KVM_NR_MEM_OBJS 4
-
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 #define KVM_MIPS_AUX_FPU   0x1
 #define KVM_MIPS_AUX_MSA   0x2
 
@@ -378,7 +367,7 @@ struct kvm_vcpu_arch {
unsigned int last_user_gasid;
 
/* Cache some mmu pages needed inside spinlock regions */
-   struct kvm_mmu_memory_cache mmu_page_cache;
+   struct kvm_mmu_memcache mmu_page_cache;
 
 #ifdef CONFIG_KVM_MIPS_VZ
/* vcpu's vzguestid is different on each host cpu in an smp system */
@@ -915,7 +904,7 @@ void kvm_mips_flush_gva_pt(pgd_t *pgd, enum kvm_mips_flush 
flags);
 bool kvm_mips_flush_gpa_pt(struct kvm *kvm, gfn_t start_gfn, gfn_t end_gfn);
 int kvm_mips_mkclean_gpa_pt(struct kvm *kvm, gfn_t start_gfn, gfn_t end_gfn);
 pgd_t *kvm_pgd_alloc(void);
-void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+void kvm_mmu_free_memcaches(struct kvm_vcpu *vcpu);
 void kvm_trap_emul_invalidate_gva(struct kvm_vcpu *vcpu, unsigned long addr,
  bool user);
 void kvm_trap_emul_gva_lockless_begin(struct kvm_vcpu *vcpu);
diff --git a/arch/mips/include/asm/kvm_types.h 
b/arch/mips/include/asm/kvm_types.h
index 5efeb32a5926..6318e8d91f90 100644
--- a/arch/mips/include/asm/kvm_types.h
+++ b/arch/mips/include/asm/kvm_types.h
@@ -2,4 +2,9 @@
 #ifndef _ASM_MIPS_KVM_TYPES_H
 #define _ASM_MIPS_KVM_TYPES_H
 
+#define KVM_NR_MEM_OBJS 4
+
+#define KVM_MMU_CACHE_GFP  GFP_KERNEL
+#define KVM_MMU_CACHE_PAGE_GFP GFP_KERNEL
+
 #endif /* _ASM_MIPS_KVM_TYPES_H */
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 3734cd58895e..5ba6905247d3 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -425,7 +425,7 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 
kvm_mips_dump_stats(vcpu);
 
-   kvm_mmu_free_memory_caches(vcpu);
+   kvm_mmu_free_memcaches(vcpu);
kfree(vcpu->arch.guest_ebase);
kfree(vcpu->arch.kseg0_commpage);
kfree(vcpu);
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 97e538a8c1be..aed5284d642e 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -25,41 +25,9 @@
 #define KVM_MMU_CACHE_MIN_PAGES 2
 #endif
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
- int min, int max)
+void kvm_mmu_free_memcaches(struct kvm_vcpu *vcpu)
 {
-   void *page;
-
-   BUG_ON(max > KVM_NR_MEM_OBJS);
-   if (cache->nobjs >= min)
-   return 0;
-   while (cache->nobjs < max) {
-   page = (void *)__get_free_page(GFP_KERNEL);
-   if (!page)
-   return -ENOMEM;
-   cache->objects[cache->nobjs++] = page;
-   }
-   return 0;
-}
-
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
-{
-   while (mc->nobjs)
-   free_page((unsigned long)mc->objects[--mc->nobjs]);
-}
-
-static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
-{
-   void *p;
-
-   BUG_ON(!mc || !mc->nobjs);
-   p = mc->objects[--mc->nobjs];
-   return p;
-}
-
-void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
-{
-   mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+   kvm_mmu_free_memcache_page(&vcpu->arch.mmu_page_cache);
 }
 
 /**
@@ -133,7 +101,7 @@ pgd_t *kvm_pgd_alloc(void)
  * NULL if a page table doesn't exist for @addr and !@cache.
  * NULL if a page table allocation failed.
  */
-static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct kvm_mmu_memory_cache *cache,
+static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct kvm_mmu_memcache *cache,
unsigned long addr)
 {
pud_t *pud;
@@ -151,7 +119,7 @@ static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct 
kvm_mmu_memory_cache *cache,
 
if (!cache)
return NULL;
-   new_pmd = mmu_memory_cache_alloc(cache);
+   new_pmd = kvm_mmu_memcache_alloc(cache);
pmd_init((unsigned long)new_pmd,
 (unsigne

[PATCH 2/3] KVM: arm/arm64: Move to common kvm_mmu_memcache infrastructure

2019-01-28 Thread Christoffer Dall
Now that we have a common mmu memcache implementation, we can reuse
this for arm and arm64.

The common implementation has a slightly different behavior when
allocating objects under high memory pressure; whereas the current
arm/arm64 implementation will give up and return -ENOMEM if the full
size of the cache cannot be allocated during topup, the common
implementation is happy with any allocation between min and max.  There
should be no architecture-specific requirement for doing it one way or
the other and it's in fact better to enforce a cross-architecture KVM
policy on this behavior.
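
Concretely, the common topup helper ends up behaving roughly like the sketch
below (the names follow the renames introduced earlier in this series, and the
base/page cache split is glossed over; this only illustrates the min/max
semantics and is not the verbatim common code):

static int kvm_mmu_topup_memcache(struct kvm_mmu_memcache *mc, int min, int max)
{
	void *obj;

	if (mc->nobjs >= min)
		return 0;
	while (mc->nobjs < max) {
		obj = (void *)__get_free_page(KVM_MMU_CACHE_PAGE_GFP);
		if (!obj)
			/* A partial fill is fine as long as we reached min */
			return mc->nobjs >= min ? 0 : -ENOMEM;
		mc->objects[mc->nobjs++] = obj;
	}
	return 0;
}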

Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_host.h| 13 +-
 arch/arm/include/asm/kvm_mmu.h |  2 +-
 arch/arm/include/asm/kvm_types.h   |  5 +++
 arch/arm64/include/asm/kvm_host.h  | 13 +-
 arch/arm64/include/asm/kvm_mmu.h   |  2 +-
 arch/arm64/include/asm/kvm_types.h |  5 +++
 virt/kvm/arm/arm.c |  2 +-
 virt/kvm/arm/mmu.c | 68 --
 8 files changed, 32 insertions(+), 78 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ca56537b61bc..bf6b6d027ff0 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -83,17 +83,6 @@ struct kvm_arch {
u32 psci_version;
 };
 
-#define KVM_NR_MEM_OBJS 40
-
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 struct kvm_vcpu_fault_info {
u32 hsr;/* Hyp Syndrome Register */
u32 hxfar;  /* Hyp Data/Inst. Fault Address Register */
@@ -184,7 +173,7 @@ struct kvm_vcpu_arch {
struct kvm_decode mmio_decode;
 
/* Cache some mmu pages needed inside spinlock regions */
-   struct kvm_mmu_memory_cache mmu_page_cache;
+   struct kvm_mmu_memcache mmu_page_cache;
 
/* Detect first run of a vcpu */
bool has_run_once;
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 3a875fc1b63c..8877f53997c8 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -71,7 +71,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
guest_ipa,
 
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
-void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+void kvm_mmu_free_memcaches(struct kvm_vcpu *vcpu);
 
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
diff --git a/arch/arm/include/asm/kvm_types.h b/arch/arm/include/asm/kvm_types.h
index bc389f82e88d..44d53373fc84 100644
--- a/arch/arm/include/asm/kvm_types.h
+++ b/arch/arm/include/asm/kvm_types.h
@@ -2,4 +2,9 @@
 #ifndef _ASM_ARM_KVM_TYPES_H
 #define _ASM_ARM_KVM_TYPES_H
 
+#define KVM_NR_MEM_OBJS 40
+
+#define KVM_MMU_CACHE_GFP  GFP_KERNEL
+#define KVM_MMU_CACHE_PAGE_GFP (GFP_KERNEL | __GFP_ZERO)
+
 #endif /* _ASM_ARM_KVM_TYPES_H */
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 7732d0ba4e60..1aa951de8338 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -82,17 +82,6 @@ struct kvm_arch {
u32 psci_version;
 };
 
-#define KVM_NR_MEM_OBJS 40
-
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 struct kvm_vcpu_fault_info {
u32 esr_el2;/* Hyp Syndrom Register */
u64 far_el2;/* Hyp Fault Address Register */
@@ -285,7 +274,7 @@ struct kvm_vcpu_arch {
struct kvm_decode mmio_decode;
 
/* Cache some mmu pages needed inside spinlock regions */
-   struct kvm_mmu_memory_cache mmu_page_cache;
+   struct kvm_mmu_memcache mmu_page_cache;
 
/* Target CPU and feature flags */
int target;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 8af4b1befa42..dec55fa00e56 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -170,7 +170,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
guest_ipa,
 
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
-void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+void kvm_mmu_free_memcaches(struct kvm_vcpu *vcpu);
 
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
diff --git a/arch/arm64/include/asm/kvm_types.h 
b/arch/arm64/include/asm/kvm_types.h
index d0987007d581..2918b4693998 100644
--- a/arch/arm64/include/asm/kvm_types.h
+++ b/arch/arm64/include/asm/kvm_types.h
@@ -2,5 +2,10 @@
 #ifndef _ASM_ARM64_KVM_TYPES_H
 #define _ASM_ARM64_KVM_TYPES_H
 
+#define KVM_NR_MEM_OBJ

[PATCH 1/3] KVM: x86: Move mmu_memory_cache functions to common code

2019-01-28 Thread Christoffer Dall
We are currently duplicating the mmu memory cache functionality quite
heavily between the architectures that support KVM.  As a first step,
move the x86 implementation (which seems to have the most recently
maintained version of the mmu memory cache) to common code.

We rename the functions and data types to have a kvm_ prefix for
anything exported as a symbol to the rest of the kernel and take the
chance to rename memory_cache to memcache to avoid overly long lines.
This is a bit tedious on the callsites but ends up looking more
palatable.

We also introduce an arch-specific kvm_types.h which can be used to
define the architecture-specific GFP flags for allocating memory to the
memory cache, and to specify how many objects are required in the memory
cache.  These are the two points where the current implementations
diverge across architectures.  Since kvm_host.h defines structures with
fields of the memcache object, we define the memcache structure in
kvm_types.h, and we include the architecture-specific kvm_types.h to
know the size of object in kvm_host.h.

This patch currently only defines the structure and requires valid
defines in the architecture-specific kvm_types.h when KVM_NR_MEM_OBJS is
defined.  As we move each architecture to the common implementation,
this condition will eventually go away.
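
For orientation, the shape this takes in the common header is roughly the
sketch below (the #ifdef guard mirrors the KVM_NR_MEM_OBJS condition described
above; the field names match the existing per-arch structs, but this is not
necessarily the exact final layout):

/* include/linux/kvm_types.h (sketch) */
#include <asm/kvm_types.h>

#ifdef KVM_NR_MEM_OBJS
/*
 * Pre-allocated pool that the MMU code can draw from while holding
 * spinlocks, so that page table allocations cannot fail at that point.
 */
struct kvm_mmu_memcache {
	int nobjs;
	void *objects[KVM_NR_MEM_OBJS];
};
#endif

Each architecture then only needs to provide KVM_NR_MEM_OBJS and its GFP
flags in its asm/kvm_types.h, which the later patches in this series do for
arm, arm64 and mips.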

Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_types.h |  5 ++
 arch/arm64/include/asm/kvm_types.h   |  6 ++
 arch/mips/include/asm/kvm_types.h|  5 ++
 arch/powerpc/include/asm/kvm_types.h |  5 ++
 arch/s390/include/asm/kvm_types.h|  5 ++
 arch/x86/include/asm/kvm_host.h  | 17 +
 arch/x86/include/asm/kvm_types.h | 10 +++
 arch/x86/kvm/mmu.c   | 97 ++--
 arch/x86/kvm/paging_tmpl.h   |  4 +-
 include/linux/kvm_host.h |  9 +++
 include/linux/kvm_types.h| 12 
 virt/kvm/kvm_main.c  | 58 +
 12 files changed, 139 insertions(+), 94 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_types.h
 create mode 100644 arch/arm64/include/asm/kvm_types.h
 create mode 100644 arch/mips/include/asm/kvm_types.h
 create mode 100644 arch/powerpc/include/asm/kvm_types.h
 create mode 100644 arch/s390/include/asm/kvm_types.h
 create mode 100644 arch/x86/include/asm/kvm_types.h

diff --git a/arch/arm/include/asm/kvm_types.h b/arch/arm/include/asm/kvm_types.h
new file mode 100644
index ..bc389f82e88d
--- /dev/null
+++ b/arch/arm/include/asm/kvm_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM_KVM_TYPES_H
+#define _ASM_ARM_KVM_TYPES_H
+
+#endif /* _ASM_ARM_KVM_TYPES_H */
diff --git a/arch/arm64/include/asm/kvm_types.h 
b/arch/arm64/include/asm/kvm_types.h
new file mode 100644
index ..d0987007d581
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_types.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_KVM_TYPES_H
+#define _ASM_ARM64_KVM_TYPES_H
+
+#endif /* _ASM_ARM64_KVM_TYPES_H */
+
diff --git a/arch/mips/include/asm/kvm_types.h 
b/arch/mips/include/asm/kvm_types.h
new file mode 100644
index ..5efeb32a5926
--- /dev/null
+++ b/arch/mips/include/asm/kvm_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_MIPS_KVM_TYPES_H
+#define _ASM_MIPS_KVM_TYPES_H
+
+#endif /* _ASM_MIPS_KVM_TYPES_H */
diff --git a/arch/powerpc/include/asm/kvm_types.h 
b/arch/powerpc/include/asm/kvm_types.h
new file mode 100644
index ..f627eceaa314
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_KVM_TYPES_H
+#define _ASM_POWERPC_KVM_TYPES_H
+
+#endif /* _ASM_POWERPC_KVM_TYPES_H */
diff --git a/arch/s390/include/asm/kvm_types.h 
b/arch/s390/include/asm/kvm_types.h
new file mode 100644
index ..b66a81f8a354
--- /dev/null
+++ b/arch/s390/include/asm/kvm_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_S390_KVM_TYPES_H
+#define _ASM_S390_KVM_TYPES_H
+
+#endif /* _ASM_S390_KVM_TYPES_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4660ce90de7f..5c12cba8c2b1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -179,8 +179,6 @@ enum {
 
 #include 
 
-#define KVM_NR_MEM_OBJS 40
-
 #define KVM_NR_DB_REGS 4
 
 #define DR6_BD (1 << 13)
@@ -238,15 +236,6 @@ enum {
 
 struct kvm_kernel_irq_routing_entry;
 
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 /*
  * the pages used as guest page table on soft mmu are tracked by
  * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
@@ -600,9 +589,9 @@ struct kvm_vcpu_arch {
 */
struct kvm_mmu *walk_mmu;
 
-  

[PATCH 0/3] KVM: Unify mmu_memory_cache functionality across architectures

2019-01-28 Thread Christoffer Dall
We currently have duplicated functionality for the mmu_memory_cache used
to pre-allocate memory for the page table manipulation code which cannot
allocate memory while holding spinlocks.  This functionality is
duplicated across x86, arm/arm64, and mips.

There was recently a debate about modifying the arm code to be more in
line with the x86 code and some discussions around changing the page
flags used for allocation.  This series should make it easier to take a
uniform approach across architectures.

While there's not a huge amount of code sharing, we come out with a net
gain.

Only tested on arm/arm64, and only compile-tested on x86 and mips.

Christoffer Dall (3):
  KVM: x86: Move mmu_memory_cache functions to common code
  KVM: arm/arm64: Move to common kvm_mmu_memcache infrastructure
  KVM: mips: Move to common kvm_mmu_memcache infrastructure

 arch/arm/include/asm/kvm_host.h  | 13 +---
 arch/arm/include/asm/kvm_mmu.h   |  2 +-
 arch/arm/include/asm/kvm_types.h | 10 +++
 arch/arm64/include/asm/kvm_host.h| 13 +---
 arch/arm64/include/asm/kvm_mmu.h |  2 +-
 arch/arm64/include/asm/kvm_types.h   | 11 
 arch/mips/include/asm/kvm_host.h | 15 +
 arch/mips/include/asm/kvm_types.h| 10 +++
 arch/mips/kvm/mips.c |  2 +-
 arch/mips/kvm/mmu.c  | 54 +++-
 arch/powerpc/include/asm/kvm_types.h |  5 ++
 arch/s390/include/asm/kvm_types.h|  5 ++
 arch/x86/include/asm/kvm_host.h  | 17 +
 arch/x86/include/asm/kvm_types.h | 10 +++
 arch/x86/kvm/mmu.c   | 97 ++--
 arch/x86/kvm/paging_tmpl.h   |  4 +-
 include/linux/kvm_host.h |  9 +++
 include/linux/kvm_types.h| 12 
 virt/kvm/arm/arm.c   |  2 +-
 virt/kvm/arm/mmu.c   | 68 +--
 virt/kvm/kvm_main.c  | 58 +
 21 files changed, 189 insertions(+), 230 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_types.h
 create mode 100644 arch/arm64/include/asm/kvm_types.h
 create mode 100644 arch/mips/include/asm/kvm_types.h
 create mode 100644 arch/powerpc/include/asm/kvm_types.h
 create mode 100644 arch/s390/include/asm/kvm_types.h
 create mode 100644 arch/x86/include/asm/kvm_types.h

-- 
2.18.0



Re: [PATCH 0/3] KVM: arm/arm64: trivial header path sanitization

2019-01-25 Thread Christoffer Dall
On Fri, Jan 25, 2019 at 04:57:27PM +0900, Masahiro Yamada wrote:
> My main motivation is to get rid of crappy header search path manipulation
> from Kbuild core.
> 
> Before that, I want to finish as many cleanups as possible.
> 
> If you are interested in the big picture of this work,
> the full patch set is available at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git 
> build-test

Changes appear fine to me:

Acked-by: Christoffer Dall 


[PATCH 2/5] arm/arm64: KVM: Allow a VCPU to fully reset itself

2019-01-25 Thread Christoffer Dall
From: Marc Zyngier 

The current kvm_psci_vcpu_on implementation will directly try to
manipulate the state of the VCPU to reset it.  However, since this is
not done on the thread that runs the VCPU, we can end up in a strangely
corrupted state when the source and target VCPUs are running at the same
time.

Fix this by factoring out all reset logic from the PSCI implementation
and forwarding the required information along with a request to the
target VCPU.
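
The psci.c side of the diff is not reproduced in full below; the producer half
is roughly the sketch that follows (based on the description above, with helper
names such as smccc_get_arg2() assumed for illustration rather than quoted from
the patch):

	/*
	 * In kvm_psci_vcpu_on(), instead of writing the target's registers
	 * directly, record the requested entry state and let the target
	 * VCPU apply it from kvm_reset_vcpu():
	 */
	struct vcpu_reset_state *reset_state = &vcpu->arch.reset_state;

	reset_state->pc = smccc_get_arg2(source_vcpu);
	reset_state->r0 = smccc_get_arg3(source_vcpu);
	reset_state->be = kvm_vcpu_is_be(source_vcpu);
	reset_state->reset = true;

	kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);
	kvm_vcpu_kick(vcpu);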

Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_host.h   | 10 +
 arch/arm/kvm/reset.c  | 24 +
 arch/arm64/include/asm/kvm_host.h | 11 ++
 arch/arm64/kvm/reset.c| 24 +
 virt/kvm/arm/arm.c| 10 +
 virt/kvm/arm/psci.c   | 36 ++-
 6 files changed, 95 insertions(+), 20 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ca56537b61bc..50e89869178a 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -48,6 +48,7 @@
 #define KVM_REQ_SLEEP \
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_IRQ_PENDING   KVM_ARCH_REQ(1)
+#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
@@ -147,6 +148,13 @@ struct kvm_cpu_context {
 
 typedef struct kvm_cpu_context kvm_cpu_context_t;
 
+struct vcpu_reset_state {
+   unsigned long   pc;
+   unsigned long   r0;
+   bool be;
+   bool reset;
+};
+
 struct kvm_vcpu_arch {
struct kvm_cpu_context ctxt;
 
@@ -186,6 +194,8 @@ struct kvm_vcpu_arch {
/* Cache some mmu pages needed inside spinlock regions */
struct kvm_mmu_memory_cache mmu_page_cache;
 
+   struct vcpu_reset_state reset_state;
+
/* Detect first run of a vcpu */
bool has_run_once;
 };
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
index 5ed0c3ee33d6..de41255eebcd 100644
--- a/arch/arm/kvm/reset.c
+++ b/arch/arm/kvm/reset.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -69,6 +70,29 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
/* Reset CP15 registers */
kvm_reset_coprocs(vcpu);
 
+   /*
+* Additional reset state handling that PSCI may have imposed on us.
+* Must be done after all the sys_reg reset.
+*/
+   if (vcpu->arch.reset_state.reset) {
+   unsigned long target_pc = vcpu->arch.reset_state.pc;
+
+   /* Gracefully handle Thumb2 entry point */
+   if (target_pc & 1) {
+   target_pc &= ~1UL;
+   vcpu_set_thumb(vcpu);
+   }
+
+   /* Propagate caller endianness */
+   if (vcpu->arch.reset_state.be)
+   kvm_vcpu_set_be(vcpu);
+
+   *vcpu_pc(vcpu) = target_pc;
+   vcpu_set_reg(vcpu, 0, vcpu->arch.reset_state.r0);
+
+   vcpu->arch.reset_state.reset = false;
+   }
+
/* Reset arch_timer context */
return kvm_timer_vcpu_reset(vcpu);
 }
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 7732d0ba4e60..da3fc7324d68 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -48,6 +48,7 @@
 #define KVM_REQ_SLEEP \
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_IRQ_PENDING   KVM_ARCH_REQ(1)
+#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
@@ -208,6 +209,13 @@ struct kvm_cpu_context {
 
 typedef struct kvm_cpu_context kvm_cpu_context_t;
 
+struct vcpu_reset_state {
+   unsigned long   pc;
+   unsigned long   r0;
+   bool be;
+   bool reset;
+};
+
 struct kvm_vcpu_arch {
struct kvm_cpu_context ctxt;
 
@@ -297,6 +305,9 @@ struct kvm_vcpu_arch {
/* Virtual SError ESR to restore when HCR_EL2.VSE is set */
u64 vsesr_el2;
 
+   /* Additional reset state */
+   struct vcpu_reset_state reset_state;
+
/* True when deferrable sysregs are loaded on the physical CPU,
 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
bool sysregs_loaded_on_cpu;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index f21a2a575939..f16a5f8ff2b4 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Maximum phys_shift supported for any VM on this host */
@@ -146,6 +147,29 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
/* Reset system registers */
kvm_reset_sys_regs(vcpu);
 
+   /*
+* Additional reset state handling that PSCI may have imposed on us.
+* Must be done after all the sys_reg reset.

[PATCH 3/5] KVM: arm/arm64: Require VCPU threads to turn them self off

2019-01-25 Thread Christoffer Dall
To avoid a race between turning VCPUs off and turning them on, make sure
that only the VCPU thread itself turns off the VCPU.  When other threads
want to turn off a VCPU, they now do this via a request.

Signed-off-by: Christoffer Dall 
Acked-by: Marc Zyngier 
---
 arch/arm/include/asm/kvm_host.h   |  2 ++
 arch/arm64/include/asm/kvm_host.h |  2 ++
 virt/kvm/arm/arm.c|  8 ++--
 virt/kvm/arm/psci.c   | 11 ++-
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 50e89869178a..b1cfae222441 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -49,6 +49,8 @@
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_IRQ_PENDING   KVM_ARCH_REQ(1)
 #define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
+#define KVM_REQ_VCPU_OFF \
+   KVM_ARCH_REQ_FLAGS(3, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index da3fc7324d68..d43b13421987 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -49,6 +49,8 @@
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_IRQ_PENDING   KVM_ARCH_REQ(1)
 #define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
+#define KVM_REQ_VCPU_OFF \
+   KVM_ARCH_REQ_FLAGS(3, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 9c486fad3f9f..785076176814 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -404,8 +404,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 
 static void vcpu_power_off(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.power_off = true;
-   kvm_make_request(KVM_REQ_SLEEP, vcpu);
+   kvm_make_request(KVM_REQ_VCPU_OFF, vcpu);
kvm_vcpu_kick(vcpu);
 }
 
@@ -646,6 +645,11 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
vcpu_req_sleep(vcpu);
 
+   if (kvm_check_request(KVM_REQ_VCPU_OFF, vcpu)) {
+   vcpu->arch.power_off = true;
+   vcpu_req_sleep(vcpu);
+   }
+
if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
kvm_reset_vcpu(vcpu);
 
diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
index b9cff1d4b06d..20255319e193 100644
--- a/virt/kvm/arm/psci.c
+++ b/virt/kvm/arm/psci.c
@@ -97,9 +97,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu 
*vcpu)
 
 static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.power_off = true;
-   kvm_make_request(KVM_REQ_SLEEP, vcpu);
-   kvm_vcpu_kick(vcpu);
+   kvm_make_request(KVM_REQ_VCPU_OFF, vcpu);
 }
 
 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
@@ -198,9 +196,6 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct 
kvm_vcpu *vcpu)
 
 static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 {
-   int i;
-   struct kvm_vcpu *tmp;
-
/*
 * The KVM ABI specifies that a system event exit may call KVM_RUN
 * again and may perform shutdown/reboot at a later time that when the
@@ -210,9 +205,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, 
u32 type)
 * after this call is handled and before the VCPUs have been
 * re-initialized.
 */
-   kvm_for_each_vcpu(i, tmp, vcpu->kvm)
-   tmp->arch.power_off = true;
-   kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
+   kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_VCPU_OFF);
 
memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
vcpu->run->system_event.type = type;
-- 
2.18.0



[PATCH 4/5] KVM: arm/arm64: Implement PSCI ON_PENDING when turning on VCPUs

2019-01-25 Thread Christoffer Dall
We are currently not implementing the PSCI spec completely, as we do not
handle the situation where two VCPUs are attempting to turn on a
third VCPU at the same time.  The PSCI implementation should make sure
that only one requesting VCPU wins the race and that the other receives
PSCI_RET_ON_PENDING.

Implement this by changing the VCPU power state to a tristate enum and
ensuring that only a single VCPU can turn on another VCPU at a given time using
a cmpxchg operation.
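
The psci.c hunk is cut short below; the gate itself is roughly the following
sketch (not the verbatim patch):

	enum vcpu_power_state old;

	/*
	 * Only one caller may move the target VCPU from OFF to ON_PENDING.
	 * A concurrent caller loses the cmpxchg and reports the pending or
	 * already-on state back to the guest instead of resetting the VCPU
	 * a second time.
	 */
	old = cmpxchg(&vcpu->arch.power_state,
		      KVM_ARM_VCPU_OFF, KVM_ARM_VCPU_ON_PENDING);
	if (old == KVM_ARM_VCPU_ON_PENDING)
		return PSCI_RET_ON_PENDING;
	if (old == KVM_ARM_VCPU_ON)
		return PSCI_RET_ALREADY_ON;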

Signed-off-by: Christoffer Dall 
Acked-by: Marc Zyngier 
---
 arch/arm/include/asm/kvm_host.h   | 10 --
 arch/arm64/include/asm/kvm_host.h | 10 --
 virt/kvm/arm/arm.c| 24 +++-
 virt/kvm/arm/psci.c   | 21 ++---
 4 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index b1cfae222441..4dc47fea1ac8 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -157,6 +157,12 @@ struct vcpu_reset_state {
 bool reset;
 };
 
+enum vcpu_power_state {
+   KVM_ARM_VCPU_OFF,
+   KVM_ARM_VCPU_ON_PENDING,
+   KVM_ARM_VCPU_ON,
+};
+
 struct kvm_vcpu_arch {
struct kvm_cpu_context ctxt;
 
@@ -184,8 +190,8 @@ struct kvm_vcpu_arch {
 * here.
 */
 
-   /* vcpu power-off state */
-   bool power_off;
+   /* vcpu power state */
+   enum vcpu_power_state power_state;
 
 /* Don't run the guest (internal implementation need) */
bool pause;
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index d43b13421987..0647a409657b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -218,6 +218,12 @@ struct vcpu_reset_state {
 bool reset;
 };
 
+enum vcpu_power_state {
+   KVM_ARM_VCPU_OFF,
+   KVM_ARM_VCPU_ON_PENDING,
+   KVM_ARM_VCPU_ON,
+};
+
 struct kvm_vcpu_arch {
struct kvm_cpu_context ctxt;
 
@@ -285,8 +291,8 @@ struct kvm_vcpu_arch {
u32 mdscr_el1;
} guest_debug_preserved;
 
-   /* vcpu power-off state */
-   bool power_off;
+   /* vcpu power state */
+   enum vcpu_power_state power_state;
 
/* Don't run the guest (internal implementation need) */
bool pause;
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 785076176814..1e3195155860 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -411,7 +411,7 @@ static void vcpu_power_off(struct kvm_vcpu *vcpu)
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
 {
-   if (vcpu->arch.power_off)
+   if (vcpu->arch.power_state != KVM_ARM_VCPU_ON)
mp_state->mp_state = KVM_MP_STATE_STOPPED;
else
mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
@@ -426,7 +426,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 
switch (mp_state->mp_state) {
case KVM_MP_STATE_RUNNABLE:
-   vcpu->arch.power_off = false;
+   vcpu->arch.power_state = KVM_ARM_VCPU_ON;
break;
case KVM_MP_STATE_STOPPED:
vcpu_power_off(vcpu);
@@ -448,8 +448,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
bool irq_lines = *vcpu_hcr(v) & (HCR_VI | HCR_VF);
-   return ((irq_lines || kvm_vgic_vcpu_pending_irq(v))
-   && !v->arch.power_off && !v->arch.pause);
+   return (irq_lines || kvm_vgic_vcpu_pending_irq(v)) &&
+   v->arch.power_state == KVM_ARM_VCPU_ON &&
+   !v->arch.pause;
 }
 
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
@@ -614,14 +615,19 @@ void kvm_arm_resume_guest(struct kvm *kvm)
}
 }
 
+static bool vcpu_sleeping(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.power_state != KVM_ARM_VCPU_ON ||
+   vcpu->arch.pause;
+}
+
 static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
 {
struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
 
-   swait_event_interruptible_exclusive(*wq, ((!vcpu->arch.power_off) &&
-  (!vcpu->arch.pause)));
+   swait_event_interruptible_exclusive(*wq, !vcpu_sleeping(vcpu));
 
-   if (vcpu->arch.power_off || vcpu->arch.pause) {
+   if (vcpu_sleeping(vcpu)) {
/* Awaken to handle a signal, request we sleep again later. */
kvm_make_request(KVM_REQ_SLEEP, vcpu);
}
@@ -646,7 +652,7 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
vcpu_req_sleep(vcpu);
 
if (kvm_check_request(KVM_REQ_VCPU_OFF, vcpu)) {
-   vcpu->arch.power_off = true;
+   vcpu->arch.power

[PATCH 5/5] arm/arm64: KVM: Don't panic on failure to properly reset system registers

2019-01-25 Thread Christoffer Dall
From: Marc Zyngier 

Failing to properly reset system registers is pretty bad. But not
quite as bad as bringing the whole machine down... So warn loudly,
but slightly more gracefully.

Signed-off-by: Marc Zyngier 
Acked-by: Christoffer Dall 
---
 arch/arm/kvm/coproc.c | 4 ++--
 arch/arm64/kvm/sys_regs.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 222c1635bc7a..e8bd288fd5be 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -1450,6 +1450,6 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
reset_coproc_regs(vcpu, table, num);
 
for (num = 1; num < NR_CP15_REGS; num++)
-   if (vcpu_cp15(vcpu, num) == 0x42424242)
-   panic("Didn't reset vcpu_cp15(vcpu, %zi)", num);
+   WARN(vcpu_cp15(vcpu, num) == 0x42424242,
+"Didn't reset vcpu_cp15(vcpu, %zi)", num);
 }
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 86096774abcd..4f067545c7d2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2609,6 +2609,6 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
reset_sys_reg_descs(vcpu, table, num);
 
for (num = 1; num < NR_SYS_REGS; num++)
-   if (__vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
-   panic("Didn't reset __vcpu_sys_reg(%zi)", num);
+   WARN(__vcpu_sys_reg(vcpu, num) == 0x4242424242424242,
+"Didn't reset __vcpu_sys_reg(%zi)\n", num);
 }
-- 
2.18.0



[PATCH 1/5] KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded

2019-01-25 Thread Christoffer Dall
Resetting the VCPU state modifies the system register state in memory,
but this may interact with vcpu_load/vcpu_put if running with preemption
disabled, which in turn may lead to corrupted system register state.

Address this by disabling preemption and doing put/load if required
around the reset logic.

Signed-off-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/reset.c | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index b72a3dd56204..f21a2a575939 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -105,16 +105,33 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, 
long ext)
  * This function finds the right table above and sets the registers on
  * the virtual CPU struct to their architecturally defined reset
  * values.
+ *
+ * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
+ * ioctl or as part of handling a request issued by another VCPU in the PSCI
+ * handling code.  In the first case, the VCPU will not be loaded, and in the
+ * second case the VCPU will be loaded.  Because this function operates purely
+ * on the memory-backed values of system registers, we want to do a full put if
+ * we were loaded (handling a request) and load the values back at the end of
+ * the function.  Otherwise we leave the state alone.  In both cases, we
+ * disable preemption around the vcpu reset as we would otherwise race with
+ * preempt notifiers which also call put/load.
  */
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 {
const struct kvm_regs *cpu_reset;
+   int ret = -EINVAL;
+   bool loaded;
+
+   preempt_disable();
+   loaded = (vcpu->cpu != -1);
+   if (loaded)
+   kvm_arch_vcpu_put(vcpu);
 
switch (vcpu->arch.target) {
default:
if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
if (!cpu_has_32bit_el1())
-   return -EINVAL;
+   goto out;
cpu_reset = &default_regs_reset32;
} else {
cpu_reset = &default_regs_reset;
@@ -137,7 +154,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
 
/* Reset timer */
-   return kvm_timer_vcpu_reset(vcpu);
+   ret = kvm_timer_vcpu_reset(vcpu);
+out:
+   if (loaded)
+   kvm_arch_vcpu_load(vcpu, smp_processor_id());
+   preempt_enable();
+   return ret;
 }
 
 void kvm_set_ipa_limit(void)
-- 
2.18.0



[PATCH 0/5] KVM: arm/arm64: Fix VCPU power management problems

2019-01-25 Thread Christoffer Dall
This series fixes a number of issues:

 - When powering on and resetting VCPUs, we can be preempted in the
   middle which can lead to guest system register corruption.

 - We were missing support for PSCI ON_PENDING when multiple VCPUs try
   to turn on a target VCPU at the same time.

 - Powering off a VCPU could race with powering on the same VCPU.

 - We unnecessarily panicked if we found a non-initialized guest system
   register.

The main approach to fixing all these problems is by using VCPU
requests.

See the individual patches for more details.

Christoffer Dall (3):
  KVM: arm/arm64: Reset the VCPU without preemption and vcpu state
loaded
  KVM: arm/arm64: Require VCPU threads to turn them self off
  KVM: arm/arm64: Implement PSCI ON_PENDING when turning on VCPUs

Marc Zyngier (2):
  arm/arm64: KVM: Allow a VCPU to fully reset itself
  arm/arm64: KVM: Don't panic on failure to properly reset system
registers

 arch/arm/include/asm/kvm_host.h   | 22 ++-
 arch/arm/kvm/coproc.c |  4 +-
 arch/arm/kvm/reset.c  | 24 
 arch/arm64/include/asm/kvm_host.h | 23 ++-
 arch/arm64/kvm/reset.c| 50 +++-
 arch/arm64/kvm/sys_regs.c |  4 +-
 virt/kvm/arm/arm.c| 40 ++-
 virt/kvm/arm/psci.c   | 64 +++
 8 files changed, 177 insertions(+), 54 deletions(-)

-- 
2.18.0



[PATCH 05/14] arm/arm64: KVM: Statically configure the host's view of MPIDR

2019-01-24 Thread Christoffer Dall
From: Marc Zyngier 

We currently eagerly save/restore MPIDR. It turns out to be
slightly pointless:
- On the host, this value is known as soon as we're scheduled on a
  physical CPU
- In the guest, this value cannot change, as it is set by KVM
  (and this is a read-only register)

The result of the above is that we can perfectly avoid the eager
saving of MPIDR_EL1, and only keep the restore. We just have
to setup the host contexts appropriately at boot time.
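
The underlying observation, restated as a toy (standalone C, illustrative only,
not the kernel code): a value that cannot change at run time only needs to be
written into the saved context once, at init, rather than re-read on every save.

#include <stdio.h>

struct ctx { unsigned long mpidr; unsigned long other; };

/* done once per CPU at boot */
static void init_ctx(struct ctx *c, unsigned long cpu_id)
{
    c->mpidr = cpu_id;      /* immutable, so never saved again */
}

/* hot path: only save what can actually change */
static void save_ctx(struct ctx *c, unsigned long live_other)
{
    c->other = live_other;
}

int main(void)
{
    struct ctx c;

    init_ctx(&c, 0x80000000UL);
    save_ctx(&c, 123);
    printf("%#lx %lu\n", c.mpidr, c.other);
    return 0;
}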

Signed-off-by: Marc Zyngier 
Acked-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_host.h   | 8 
 arch/arm/kvm/hyp/cp15-sr.c| 1 -
 arch/arm64/include/asm/kvm_host.h | 8 
 arch/arm64/kvm/hyp/sysreg-sr.c| 1 -
 virt/kvm/arm/arm.c| 1 +
 5 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 4b6193f2f0f6..43e343e00fb8 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
@@ -147,6 +148,13 @@ struct kvm_cpu_context {
 
 typedef struct kvm_cpu_context kvm_cpu_context_t;
 
+static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt,
+int cpu)
+{
+   /* The host's MPIDR is immutable, so let's set it up at boot time */
+   cpu_ctxt->cp15[c0_MPIDR] = cpu_logical_map(cpu);
+}
+
 struct kvm_vcpu_arch {
struct kvm_cpu_context ctxt;
 
diff --git a/arch/arm/kvm/hyp/cp15-sr.c b/arch/arm/kvm/hyp/cp15-sr.c
index c4782812714c..8bf895ec6e04 100644
--- a/arch/arm/kvm/hyp/cp15-sr.c
+++ b/arch/arm/kvm/hyp/cp15-sr.c
@@ -27,7 +27,6 @@ static u64 *cp15_64(struct kvm_cpu_context *ctxt, int idx)
 
 void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
 {
-   ctxt->cp15[c0_MPIDR]= read_sysreg(VMPIDR);
ctxt->cp15[c0_CSSELR]   = read_sysreg(CSSELR);
ctxt->cp15[c1_SCTLR]= read_sysreg(SCTLR);
ctxt->cp15[c1_CPACR]= read_sysreg(CPACR);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8b7702bdb219..f497bb31031f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
@@ -418,6 +419,13 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
 DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);
 
+static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt,
+int cpu)
+{
+   /* The host's MPIDR is immutable, so let's set it up at boot time */
+   cpu_ctxt->sys_regs[MPIDR_EL1] = cpu_logical_map(cpu);
+}
+
 void __kvm_enable_ssbs(void);
 
 static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 68d6f7c3b237..2498f86defcb 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -52,7 +52,6 @@ static void __hyp_text __sysreg_save_user_state(struct 
kvm_cpu_context *ctxt)
 
 static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 {
-   ctxt->sys_regs[MPIDR_EL1]   = read_sysreg(vmpidr_el2);
ctxt->sys_regs[CSSELR_EL1]  = read_sysreg(csselr_el1);
ctxt->sys_regs[SCTLR_EL1]   = read_sysreg_el1(sctlr);
ctxt->sys_regs[ACTLR_EL1]   = read_sysreg(actlr_el1);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 4d55f98f97f7..3dd240ea9e76 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1561,6 +1561,7 @@ static int init_hyp_mode(void)
kvm_cpu_context_t *cpu_ctxt;
 
cpu_ctxt = per_cpu_ptr(&kvm_host_cpu_state, cpu);
+   kvm_init_host_cpu_context(cpu_ctxt, cpu);
err = create_hyp_mappings(cpu_ctxt, cpu_ctxt + 1, PAGE_HYP);
 
if (err) {
-- 
2.18.0



[PATCH 01/14] arm/arm64: KVM: Introduce kvm_call_hyp_ret()

2019-01-24 Thread Christoffer Dall
From: Marc Zyngier 

Until now, we haven't differentiated between HYP calls that
have a return value and those that don't. As we're about to
change this, introduce kvm_call_hyp_ret(), and change all
call sites that actually make use of a return value.

Signed-off-by: Marc Zyngier 
Acked-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_host.h   | 3 +++
 arch/arm64/include/asm/kvm_host.h | 1 +
 arch/arm64/kvm/debug.c| 2 +-
 virt/kvm/arm/arm.c| 2 +-
 virt/kvm/arm/vgic/vgic-v3.c   | 4 ++--
 5 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ca56537b61bc..023c9f2b1eea 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -214,7 +214,10 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+
 unsigned long kvm_call_hyp(void *hypfn, ...);
+#define kvm_call_hyp_ret(f, ...) kvm_call_hyp(f, ##__VA_ARGS__)
+
 void force_vm_exit(const cpumask_t *mask);
 int __kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
  struct kvm_vcpu_events *events);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7732d0ba4e60..e54cb7c88a4e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -371,6 +371,7 @@ void kvm_arm_resume_guest(struct kvm *kvm);
 
 u64 __kvm_call_hyp(void *hypfn, ...);
 #define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
+#define kvm_call_hyp_ret(f, ...) kvm_call_hyp(f, ##__VA_ARGS__)
 
 void force_vm_exit(const cpumask_t *mask);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index f39801e4136c..fd917d6d12af 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -76,7 +76,7 @@ static void restore_guest_debug_regs(struct kvm_vcpu *vcpu)
 
 void kvm_arm_init_debug(void)
 {
-   __this_cpu_write(mdcr_el2, kvm_call_hyp(__kvm_get_mdcr_el2));
+   __this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
 }
 
 /**
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 9e350fd34504..4d55f98f97f7 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -765,7 +765,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
ret = kvm_vcpu_run_vhe(vcpu);
kvm_arm_vhe_guest_exit();
} else {
-   ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);
+   ret = kvm_call_hyp_ret(__kvm_vcpu_run_nvhe, vcpu);
}
 
vcpu->mode = OUTSIDE_GUEST_MODE;
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 9c0dd234ebe8..67f98151c88d 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -589,7 +589,7 @@ early_param("kvm-arm.vgic_v4_enable", early_gicv4_enable);
  */
 int vgic_v3_probe(const struct gic_kvm_info *info)
 {
-   u32 ich_vtr_el2 = kvm_call_hyp(__vgic_v3_get_ich_vtr_el2);
+   u32 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_ich_vtr_el2);
int ret;
 
/*
@@ -679,7 +679,7 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
if (likely(cpu_if->vgic_sre))
-   cpu_if->vgic_vmcr = kvm_call_hyp(__vgic_v3_read_vmcr);
+   cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
 
kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
 
-- 
2.18.0



[PATCH 12/14] KVM: arm/arm64: arch_timer: Assign the phys timer on VHE systems

2019-01-24 Thread Christoffer Dall
VHE systems don't have to emulate the physical timer; we can simply
assign the EL1 physical timer directly to the VM, as the host always
uses the EL2 timers.

In order to minimize the amount of cruft, AArch32 gets definitions for
the physical timer too, but it should generally be unused on this
architecture.

Co-written with Marc Zyngier 

Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_hyp.h |   4 +
 include/kvm/arm_arch_timer.h   |   6 +
 virt/kvm/arm/arch_timer.c  | 206 ++---
 3 files changed, 171 insertions(+), 45 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index e93a0cac9add..87bcd18df8d5 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -40,6 +40,7 @@
 #define TTBR1  __ACCESS_CP15_64(1, c2)
 #define VTTBR  __ACCESS_CP15_64(6, c2)
 #define PAR__ACCESS_CP15_64(0, c7)
+#define CNTP_CVAL  __ACCESS_CP15_64(2, c14)
 #define CNTV_CVAL  __ACCESS_CP15_64(3, c14)
 #define CNTVOFF__ACCESS_CP15_64(4, c14)
 
@@ -85,6 +86,7 @@
 #define TID_PRIV   __ACCESS_CP15(c13, 0, c0, 4)
 #define HTPIDR __ACCESS_CP15(c13, 4, c0, 2)
 #define CNTKCTL__ACCESS_CP15(c14, 0, c1, 0)
+#define CNTP_CTL   __ACCESS_CP15(c14, 0, c2, 1)
 #define CNTV_CTL   __ACCESS_CP15(c14, 0, c3, 1)
 #define CNTHCTL__ACCESS_CP15(c14, 4, c1, 0)
 
@@ -94,6 +96,8 @@
 #define read_sysreg_el0(r) read_sysreg(r##_el0)
 #define write_sysreg_el0(v, r) write_sysreg(v, r##_el0)
 
+#define cntp_ctl_el0   CNTP_CTL
+#define cntp_cval_el0  CNTP_CVAL
 #define cntv_ctl_el0   CNTV_CTL
 #define cntv_cval_el0  CNTV_CVAL
 #define cntvoff_el2CNTVOFF
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index d40fe57a2d0d..722e0481f310 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -50,6 +50,10 @@ struct arch_timer_context {
 
/* Emulated Timer (may be unused) */
struct hrtimer  hrtimer;
+
+   /* Duplicated state from arch_timer.c for convenience */
+   u32 host_timer_irq;
+   u32 host_timer_irq_flags;
 };
 
 enum loaded_timer_state {
@@ -107,6 +111,8 @@ bool kvm_arch_timer_get_input_level(int vintid);
 #define vcpu_vtimer(v) (&(v)->arch.timer_cpu.timers[TIMER_VTIMER])
 #define vcpu_ptimer(v) (&(v)->arch.timer_cpu.timers[TIMER_PTIMER])
 
+#define arch_timer_ctx_index(ctx)  ((ctx) - vcpu_timer((ctx)->vcpu)->timers)
+
 u64 kvm_arm_timer_read_sysreg(struct kvm_vcpu *vcpu,
  enum kvm_arch_timers tmr,
  enum kvm_arch_timer_regs treg);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 8b0eca5fbad1..eed8f48fbf9b 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -35,7 +35,9 @@
 
 static struct timecounter *timecounter;
 static unsigned int host_vtimer_irq;
+static unsigned int host_ptimer_irq;
 static u32 host_vtimer_irq_flags;
+static u32 host_ptimer_irq_flags;
 
 static DEFINE_STATIC_KEY_FALSE(has_gic_active_state);
 
@@ -86,20 +88,24 @@ static void soft_timer_cancel(struct hrtimer *hrt)
 static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
 {
struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
-   struct arch_timer_context *vtimer;
+   struct arch_timer_context *ctx;
 
/*
 * We may see a timer interrupt after vcpu_put() has been called which
 * sets the CPU's vcpu pointer to NULL, because even though the timer
-* has been disabled in vtimer_save_state(), the hardware interrupt
+* has been disabled in timer_save_state(), the hardware interrupt
 * signal may not have been retired from the interrupt controller yet.
 */
if (!vcpu)
return IRQ_HANDLED;
 
-   vtimer = vcpu_vtimer(vcpu);
-   if (kvm_timer_should_fire(vtimer))
-   kvm_timer_update_irq(vcpu, true, vtimer);
+   if (irq == host_vtimer_irq)
+   ctx = vcpu_vtimer(vcpu);
+   else
+   ctx = vcpu_ptimer(vcpu);
+
+   if (kvm_timer_should_fire(ctx))
+   kvm_timer_update_irq(vcpu, true, ctx);
 
if (userspace_irqchip(vcpu->kvm) &&
!static_branch_unlikely(&has_gic_active_state))
@@ -208,13 +214,25 @@ static enum hrtimer_restart kvm_phys_timer_expire(struct 
hrtimer *hrt)
 static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 {
struct arch_timer_cpu *timer = vcpu_timer(timer_ctx->vcpu);
+   enum kvm_arch_timers index = arch_timer_ctx_index(timer_ctx);
u64 cval, now;
 
if (timer->loaded == TIMER_EL1_LOADED) {
-

[PATCH 03/14] arm64: KVM: Drop VHE-specific HYP call stub

2019-01-24 Thread Christoffer Dall
From: Marc Zyngier 

We now call VHE code directly, without going through any central
dispatching function. Let's drop that code.

Signed-off-by: Marc Zyngier 
Acked-by: Christoffer Dall 
---
 arch/arm64/kvm/hyp.S   |  3 ---
 arch/arm64/kvm/hyp/hyp-entry.S | 12 
 2 files changed, 15 deletions(-)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 952f6cb9cf72..2845aa680841 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -40,9 +40,6 @@
  * arch/arm64/kernel/hyp_stub.S.
  */
 ENTRY(__kvm_call_hyp)
-alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
hvc #0
ret
-alternative_else_nop_endif
-   b   __vhe_hyp_call
 ENDPROC(__kvm_call_hyp)
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 73c1b483ec39..2b1e686772bf 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -43,18 +43,6 @@
ldr lr, [sp], #16
 .endm
 
-ENTRY(__vhe_hyp_call)
-   do_el2_call
-   /*
-* We used to rely on having an exception return to get
-* an implicit isb. In the E2H case, we don't have it anymore.
-* rather than changing all the leaf functions, just do it here
-* before returning to the rest of the kernel.
-*/
-   isb
-   ret
-ENDPROC(__vhe_hyp_call)
-
 el1_sync:  // Guest trapped into EL2
 
mrs x0, esr_el2
-- 
2.18.0



[PATCH 13/14] KVM: arm/arm64: Rework the timer code to use a timer_map

2019-01-24 Thread Christoffer Dall
We are currently emulating two timers in two different ways.  When we
add support for nested virtualization in the future, we are going to be
emulating either two timers in two different ways, or four timers in a
single way.

We need a unified data structure to keep track of how we map virtual
state to physical state and we need to cleanup some of the timer code to
operate more independently on a struct arch_timer_context instead of
trying to consider the global state of the VCPU and recomputing all
state.
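
A minimal standalone model of the timer_map idea (the struct names mirror the
patch, the rest is made up for illustration): each slot is either a pointer into
the per-vcpu context array or NULL, so callers can test which timers are backed
by hardware without re-deriving the VHE/non-VHE rules everywhere.

#include <stdio.h>
#include <stdbool.h>

struct timer_ctx { const char *name; };

struct timer_map {
    struct timer_ctx *direct_vtimer;
    struct timer_ctx *direct_ptimer;    /* NULL when emulated */
    struct timer_ctx *emul_ptimer;      /* NULL when in hardware */
};

static struct timer_ctx vtimer = { "vtimer" };
static struct timer_ctx ptimer = { "ptimer" };

static void get_map(struct timer_map *map, bool vhe)
{
    map->direct_vtimer = &vtimer;
    map->direct_ptimer = vhe ? &ptimer : NULL;
    map->emul_ptimer   = vhe ? NULL : &ptimer;
}

int main(void)
{
    struct timer_map map;

    get_map(&map, false);               /* non-VHE case */
    if (map.emul_ptimer)
        printf("emulating %s\n", map.emul_ptimer->name);
    return 0;
}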

Co-written with Marc Zyngier 

Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 include/kvm/arm_arch_timer.h |  22 +--
 virt/kvm/arm/arch_timer.c| 295 +++
 virt/kvm/arm/trace.h | 105 +
 3 files changed, 276 insertions(+), 146 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 722e0481f310..05a18dd265b5 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -51,14 +51,22 @@ struct arch_timer_context {
/* Emulated Timer (may be unused) */
struct hrtimer  hrtimer;
 
+   /*
+* We have multiple paths which can save/restore the timer state onto
+* the hardware, so we need some way of keeping track of where the
+* latest state is.
+*/
+   boolloaded;
+
/* Duplicated state from arch_timer.c for convenience */
u32 host_timer_irq;
u32 host_timer_irq_flags;
 };
 
-enum loaded_timer_state {
-   TIMER_NOT_LOADED,
-   TIMER_EL1_LOADED,
+struct timer_map {
+   struct arch_timer_context *direct_vtimer;
+   struct arch_timer_context *direct_ptimer;
+   struct arch_timer_context *emul_ptimer;
 };
 
 struct arch_timer_cpu {
@@ -69,14 +77,6 @@ struct arch_timer_cpu {
 
/* Is the timer enabled */
boolenabled;
-
-   /*
-* We have multiple paths which can save/restore the timer state
-* onto the hardware, and for nested virt the EL1 hardware timers can
-* contain state from either the VM's EL1 timers or EL2 timers, so we
-* need some way of keeping track of where the latest state is.
-*/
-   enum loaded_timer_state loaded;
 };
 
 int kvm_timer_hyp_init(bool);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index eed8f48fbf9b..03d29f607355 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -68,6 +68,21 @@ u64 kvm_phys_timer_read(void)
return timecounter->cc->read(timecounter->cc);
 }
 
+static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
+{
+   if (has_vhe()) {
+   map->direct_vtimer = vcpu_vtimer(vcpu);
+   map->direct_ptimer = vcpu_ptimer(vcpu);
+   map->emul_ptimer = NULL;
+   } else {
+   map->direct_vtimer = vcpu_vtimer(vcpu);
+   map->direct_ptimer = NULL;
+   map->emul_ptimer = vcpu_ptimer(vcpu);
+   }
+
+   trace_kvm_get_timer_map(vcpu->vcpu_id, map);
+}
+
 static inline bool userspace_irqchip(struct kvm *kvm)
 {
return static_branch_unlikely(&userspace_irqchip_in_use) &&
@@ -89,6 +104,7 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void 
*dev_id)
 {
struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
struct arch_timer_context *ctx;
+   struct timer_map map;
 
/*
 * We may see a timer interrupt after vcpu_put() has been called which
@@ -99,10 +115,12 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void 
*dev_id)
if (!vcpu)
return IRQ_HANDLED;
 
+   get_timer_map(vcpu, &map);
+
if (irq == host_vtimer_irq)
-   ctx = vcpu_vtimer(vcpu);
+   ctx = map.direct_vtimer;
else
-   ctx = vcpu_ptimer(vcpu);
+   ctx = map.direct_ptimer;
 
if (kvm_timer_should_fire(ctx))
kvm_timer_update_irq(vcpu, true, ctx);
@@ -136,7 +154,9 @@ static u64 kvm_timer_compute_delta(struct 
arch_timer_context *timer_ctx)
 
 static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx)
 {
-   return !(timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
+   WARN_ON(timer_ctx && timer_ctx->loaded);
+   return timer_ctx &&
+  !(timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
(timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_ENABLE);
 }
 
@@ -146,21 +166,22 @@ static bool kvm_timer_irq_can_fire(struct 
arch_timer_context *timer_ctx)
  */
 static u64 kvm_timer_earliest_exp(struct kvm_vcpu *vcpu)
 {
-   u64 min_virt = ULLONG_MAX, min_phys = ULLONG_MAX;
-   struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
-   struct arch_timer_context *ptimer = vcpu_p

[PATCH 06/14] KVM: arm/arm64: Factor out VMID into struct kvm_vmid

2019-01-24 Thread Christoffer Dall
In preparation for nested virtualization where we are going to have more
than a single VMID per VM, let's factor out the VMID data into a
separate VMID data structure and change the VMID allocator to operate on
this new structure instead of using a struct kvm.

This also means that update_vttbr now becomes update_vmid, and that the
vttbr itself is generated on the fly based on the stage 2 page table
base address and the vmid.

We cache the physical address of the pgd when allocating the pgd to
avoid doing the calculation on every entry to the guest and to avoid
calling into potentially non-hyp-mapped code from hyp/EL2.

If we wanted to merge the VMID allocator with the arm64 ASID allocator
at some point in the future, it should actually become easier to do that
after this patch.

Note that to avoid mapping the kvm_vmid_bits variable into hyp, we
simply forego the masking of the vmid value in kvm_get_vttbr and rely on
update_vmid to always assign a valid vmid value (within the supported
range).
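
To make the "generated on the fly" part concrete, here is a standalone sketch of
the composition (illustrative only; the 48-bit VMID shift is an assumption for
the example): the VTTBR is just the cached stage 2 PGD physical address OR'ed
with the VMID field, so nothing beyond a shift and an OR is done per entry, and
no masking is needed as long as the allocator only hands out in-range VMIDs.

#include <stdint.h>
#include <stdio.h>

#define VMID_SHIFT 48   /* assumed value, for illustration */

struct kvm_vmid { uint64_t gen; uint32_t vmid; };

static uint64_t make_vttbr(uint64_t pgd_phys, const struct kvm_vmid *vmid)
{
    return pgd_phys | ((uint64_t)vmid->vmid << VMID_SHIFT);
}

int main(void)
{
    struct kvm_vmid vmid = { .gen = 1, .vmid = 5 };

    printf("%#llx\n",
           (unsigned long long)make_vttbr(0x40001000ULL, &vmid));
    return 0;
}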

Signed-off-by: Christoffer Dall 
Reviewed-by: Marc Zyngier 
---
 arch/arm/include/asm/kvm_host.h   | 13 ---
 arch/arm/include/asm/kvm_mmu.h| 11 ++
 arch/arm/kvm/hyp/switch.c |  2 +-
 arch/arm/kvm/hyp/tlb.c|  4 +--
 arch/arm64/include/asm/kvm_host.h |  9 +++--
 arch/arm64/include/asm/kvm_hyp.h  |  3 +-
 arch/arm64/include/asm/kvm_mmu.h  | 11 ++
 virt/kvm/arm/arm.c| 57 +++
 virt/kvm/arm/mmu.c|  2 ++
 9 files changed, 63 insertions(+), 49 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 43e343e00fb8..8073267dc4a0 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -57,10 +57,13 @@ int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 
-struct kvm_arch {
-   /* VTTBR value associated with below pgd and vmid */
-   u64vttbr;
+struct kvm_vmid {
+   /* The VMID generation used for the virt. memory system */
+   u64vmid_gen;
+   u32vmid;
+};
 
+struct kvm_arch {
/* The last vcpu id that ran on each physical CPU */
int __percpu *last_vcpu_ran;
 
@@ -70,11 +73,11 @@ struct kvm_arch {
 */
 
/* The VMID generation used for the virt. memory system */
-   u64vmid_gen;
-   u32vmid;
+   struct kvm_vmid vmid;
 
/* Stage-2 page table */
pgd_t *pgd;
+   phys_addr_t pgd_phys;
 
/* Interrupt controller */
struct vgic_distvgic;
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 3a875fc1b63c..fadbd9ad3a90 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -426,6 +426,17 @@ static inline bool kvm_cpu_has_cnp(void)
return false;
 }
 
+static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
+{
+   struct kvm_vmid *vmid = &kvm->arch.vmid;
+   u64 vmid_field, baddr;
+   u64 cnp = kvm_cpu_has_cnp() ? VTTBR_CNP_BIT : 0;
+
+   baddr = kvm->arch.pgd_phys;
+   vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;
+   return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index acf1c37fa49c..3b058a5d7c5f 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -77,7 +77,7 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu 
*vcpu)
 static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
 {
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
-   write_sysreg(kvm->arch.vttbr, VTTBR);
+   write_sysreg(kvm_get_vttbr(kvm), VTTBR);
write_sysreg(vcpu->arch.midr, VPIDR);
 }
 
diff --git a/arch/arm/kvm/hyp/tlb.c b/arch/arm/kvm/hyp/tlb.c
index c0edd450e104..8e4afba73635 100644
--- a/arch/arm/kvm/hyp/tlb.c
+++ b/arch/arm/kvm/hyp/tlb.c
@@ -41,7 +41,7 @@ void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
 
/* Switch to requested VMID */
kvm = kern_hyp_va(kvm);
-   write_sysreg(kvm->arch.vttbr, VTTBR);
+   write_sysreg(kvm_get_vttbr(kvm), VTTBR);
isb();
 
write_sysreg(0, TLBIALLIS);
@@ -61,7 +61,7 @@ void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu 
*vcpu)
struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
 
/* Switch to requested VMID */
-   write_sysreg(kvm->arch.vttbr, VTTBR);
+   write_sysreg(kvm_get_vttbr(kvm), VTTBR);
isb();
 
write_sysreg(0, TLBIALL);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f497bb31031f..444dd1cb1958 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -57,16 +57,19 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 int kvm_arch_vm_ioctl_check_e

[PATCH 02/14] arm64: KVM: Allow for direct call of HYP functions when using VHE

2019-01-24 Thread Christoffer Dall
From: Marc Zyngier 

When running VHE, there is no need to jump via some stub to perform
a "HYP" function call, as there is a single address space.

Let's thus change kvm_call_hyp() and co to perform a direct call
in this case. Although this results in a bit of code expansion,
it allows the compiler to check for type compatibility, something
that we are missing so far.
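
A compiler-only sketch of the return-value variant (GNU C statement expression
and typeof, as the kernel uses; all helper names here are invented): typeof()
lets the macro return whatever type the called function returns, so the direct
path keeps full type checking.

#include <stdio.h>
#include <stdbool.h>

static bool use_direct_call = true;     /* stand-in for has_vhe() */

static unsigned long trampoline_call(void *fn)  /* stand-in for __kvm_call_hyp */
{
    unsigned long (*f)(void) = (unsigned long (*)(void))fn;
    return f();
}

#define call_ret(f)                                     \
    ({                                                  \
        typeof(f()) __ret;                              \
        if (use_direct_call)                            \
            __ret = f();                                \
        else                                            \
            __ret = trampoline_call((void *)f);         \
        __ret;                                          \
    })

static unsigned long get_mdcr(void) { return 0x1f; }

int main(void)
{
    printf("%#lx\n", call_ret(get_mdcr));
    return 0;
}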

Signed-off-by: Marc Zyngier 
Acked-by: Christoffer Dall 
---
 arch/arm64/include/asm/kvm_host.h | 32 +--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e54cb7c88a4e..8b7702bdb219 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -370,8 +370,36 @@ void kvm_arm_halt_guest(struct kvm *kvm);
 void kvm_arm_resume_guest(struct kvm *kvm);
 
 u64 __kvm_call_hyp(void *hypfn, ...);
-#define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
-#define kvm_call_hyp_ret(f, ...) kvm_call_hyp(f, ##__VA_ARGS__)
+
+/*
+ * The couple of isb() below are there to guarantee the same behaviour
+ * on VHE as on !VHE, where the eret to EL1 acts as a context
+ * synchronization event.
+ */
+#define kvm_call_hyp(f, ...)   \
+   do {\
+   if (has_vhe()) {\
+   f(__VA_ARGS__); \
+   isb();  \
+   } else {\
+   __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__); \
+   }   \
+   } while(0)
+
+#define kvm_call_hyp_ret(f, ...)   \
+   ({  \
+   typeof(f(__VA_ARGS__)) ret; \
+   \
+   if (has_vhe()) {\
+   ret = f(__VA_ARGS__);   \
+   isb();  \
+   } else {\
+   ret = __kvm_call_hyp(kvm_ksym_ref(f),   \
+##__VA_ARGS__);\
+   }   \
+   \
+   ret;\
+   })
 
 void force_vm_exit(const cpumask_t *mask);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
-- 
2.18.0



[PATCH 14/14] KVM: arm/arm64: Move kvm_is_write_fault to header file

2019-01-24 Thread Christoffer Dall
From: Christoffer Dall 

Move this little function to the header files for arm/arm64 so other
code can make use of it directly.

Signed-off-by: Christoffer Dall 
Acked-by: Marc Zyngier 
---
 arch/arm/include/asm/kvm_emulate.h   | 8 
 arch/arm64/include/asm/kvm_emulate.h | 8 
 virt/kvm/arm/mmu.c   | 8 
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 77121b713bef..8927cae7c966 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -265,6 +265,14 @@ static inline bool kvm_vcpu_dabt_isextabt(struct kvm_vcpu 
*vcpu)
}
 }
 
+static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
+{
+   if (kvm_vcpu_trap_is_iabt(vcpu))
+   return false;
+
+   return kvm_vcpu_dabt_iswrite(vcpu);
+}
+
 static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
 {
return kvm_vcpu_get_hsr(vcpu) & HSR_HVC_IMM_MASK;
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 506386a3edde..a0d1ce9ae12b 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -331,6 +331,14 @@ static inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu 
*vcpu)
return ESR_ELx_SYS64_ISS_RT(esr);
 }
 
+static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
+{
+   if (kvm_vcpu_trap_is_iabt(vcpu))
+   return false;
+
+   return kvm_vcpu_dabt_iswrite(vcpu);
+}
+
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index bffcbc423f4c..c17010bc33a9 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1398,14 +1398,6 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, 
phys_addr_t *ipap)
return false;
 }
 
-static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
-{
-   if (kvm_vcpu_trap_is_iabt(vcpu))
-   return false;
-
-   return kvm_vcpu_dabt_iswrite(vcpu);
-}
-
 /**
  * stage2_wp_ptes - write protect PMD range
  * @pmd:   pointer to pmd entry
-- 
2.18.0



[PATCH 08/14] KVM: arm64: Fix ICH_ELRSR_EL2 sysreg naming

2019-01-24 Thread Christoffer Dall
From: Marc Zyngier 

We previously incorrectly named the define for this system register.

Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/arch_gicv3.h | 4 ++--
 arch/arm64/include/asm/sysreg.h   | 2 +-
 virt/kvm/arm/hyp/vgic-v3-sr.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/arch_gicv3.h b/arch/arm/include/asm/arch_gicv3.h
index 0bd530702118..bdc87700def2 100644
--- a/arch/arm/include/asm/arch_gicv3.h
+++ b/arch/arm/include/asm/arch_gicv3.h
@@ -54,7 +54,7 @@
 #define ICH_VTR__ACCESS_CP15(c12, 4, c11, 1)
 #define ICH_MISR   __ACCESS_CP15(c12, 4, c11, 2)
 #define ICH_EISR   __ACCESS_CP15(c12, 4, c11, 3)
-#define ICH_ELSR   __ACCESS_CP15(c12, 4, c11, 5)
+#define ICH_ELRSR  __ACCESS_CP15(c12, 4, c11, 5)
 #define ICH_VMCR   __ACCESS_CP15(c12, 4, c11, 7)
 
 #define __LR0(x)   __ACCESS_CP15(c12, 4, c12, x)
@@ -151,7 +151,7 @@ CPUIF_MAP(ICH_HCR, ICH_HCR_EL2)
 CPUIF_MAP(ICH_VTR, ICH_VTR_EL2)
 CPUIF_MAP(ICH_MISR, ICH_MISR_EL2)
 CPUIF_MAP(ICH_EISR, ICH_EISR_EL2)
-CPUIF_MAP(ICH_ELSR, ICH_ELSR_EL2)
+CPUIF_MAP(ICH_ELRSR, ICH_ELRSR_EL2)
 CPUIF_MAP(ICH_VMCR, ICH_VMCR_EL2)
 CPUIF_MAP(ICH_AP0R3, ICH_AP0R3_EL2)
 CPUIF_MAP(ICH_AP0R2, ICH_AP0R2_EL2)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 72dc4c011014..3e5650903d6d 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -426,7 +426,7 @@
 #define SYS_ICH_VTR_EL2sys_reg(3, 4, 12, 11, 1)
 #define SYS_ICH_MISR_EL2   sys_reg(3, 4, 12, 11, 2)
 #define SYS_ICH_EISR_EL2   sys_reg(3, 4, 12, 11, 3)
-#define SYS_ICH_ELSR_EL2   sys_reg(3, 4, 12, 11, 5)
+#define SYS_ICH_ELRSR_EL2  sys_reg(3, 4, 12, 11, 5)
 #define SYS_ICH_VMCR_EL2   sys_reg(3, 4, 12, 11, 7)
 
 #define __SYS__LR0_EL2(x)  sys_reg(3, 4, 12, 12, x)
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index 9652c453480f..264d92da3240 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -226,7 +226,7 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
int i;
u32 elrsr;
 
-   elrsr = read_gicreg(ICH_ELSR_EL2);
+   elrsr = read_gicreg(ICH_ELRSR_EL2);
 
write_gicreg(cpu_if->vgic_hcr & ~ICH_HCR_EN, ICH_HCR_EL2);
 
-- 
2.18.0



[PATCH 11/14] KVM: arm/arm64: timer: Rework data structures for multiple timers

2019-01-24 Thread Christoffer Dall
Prepare for having 4 timer data structures (2 for now).

Change loaded to an enum so that we know not just whether *some* state
is loaded on the CPU, but also *which* state is loaded.

Move loaded to the cpu data structure and not the individual timer
structure, in preparation for assigning the EL1 phys timer as well.
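
A toy version of the reorganised layout (user-space C, names loosely following
the patch, illustrative only): per-timer state lives in an indexed array, while
the single loaded flag sits at the arch_timer_cpu level because it describes
where the whole set of timers currently lives, not one individual timer.

#include <stdio.h>

enum kvm_arch_timers { TIMER_VTIMER, TIMER_PTIMER, NR_KVM_TIMERS };
enum loaded_timer_state { TIMER_NOT_LOADED, TIMER_EL1_LOADED };

struct timer_ctx { unsigned long long cval; unsigned int ctl; };

struct timer_cpu {
    struct timer_ctx timers[NR_KVM_TIMERS];
    enum loaded_timer_state loaded;
};

#define get_timer(cpu, t)   (&(cpu)->timers[(t)])

int main(void)
{
    struct timer_cpu cpu = { .loaded = TIMER_NOT_LOADED };

    get_timer(&cpu, TIMER_PTIMER)->cval = 100;
    printf("%llu %d\n", get_timer(&cpu, TIMER_PTIMER)->cval, cpu.loaded);
    return 0;
}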

Signed-off-by: Christoffer Dall 
Acked-by: Marc Zyngier 
---
 include/kvm/arm_arch_timer.h | 44 ++-
 virt/kvm/arm/arch_timer.c| 58 +++-
 2 files changed, 54 insertions(+), 48 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index d26b7fde9935..d40fe57a2d0d 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -36,6 +36,8 @@ enum kvm_arch_timer_regs {
 };
 
 struct arch_timer_context {
+   struct kvm_vcpu *vcpu;
+
/* Registers: control register, timer value */
u32 cnt_ctl;
u64 cnt_cval;
@@ -43,32 +45,34 @@ struct arch_timer_context {
/* Timer IRQ */
struct kvm_irq_levelirq;
 
-   /*
-* We have multiple paths which can save/restore the timer state
-* onto the hardware, so we need some way of keeping track of
-* where the latest state is.
-*
-* loaded == true:  State is loaded on the hardware registers.
-* loaded == false: State is stored in memory.
-*/
-   boolloaded;
-
/* Virtual offset */
-   u64 cntvoff;
+   u64 cntvoff;
+
+   /* Emulated Timer (may be unused) */
+   struct hrtimer  hrtimer;
+};
+
+enum loaded_timer_state {
+   TIMER_NOT_LOADED,
+   TIMER_EL1_LOADED,
 };
 
 struct arch_timer_cpu {
-   struct arch_timer_context   vtimer;
-   struct arch_timer_context   ptimer;
+   struct arch_timer_context timers[NR_KVM_TIMERS];
 
/* Background timer used when the guest is not running */
struct hrtimer  bg_timer;
 
-   /* Physical timer emulation */
-   struct hrtimer  phys_timer;
-
/* Is the timer enabled */
boolenabled;
+
+   /*
+* We have multiple paths which can save/restore the timer state
+* onto the hardware, and for nested virt the EL1 hardware timers can
+* contain state from either the VM's EL1 timers or EL2 timers, so we
+* need some way of keeping track of where the latest state is.
+*/
+   enum loaded_timer_state loaded;
 };
 
 int kvm_timer_hyp_init(bool);
@@ -98,10 +102,10 @@ void kvm_timer_init_vhe(void);
 
 bool kvm_arch_timer_get_input_level(int vintid);
 
-#define vcpu_vtimer(v) (&(v)->arch.timer_cpu.vtimer)
-#define vcpu_ptimer(v) (&(v)->arch.timer_cpu.ptimer)
-#define vcpu_get_timer(v,t)\
-   (t == TIMER_VTIMER ? vcpu_vtimer(v) : vcpu_ptimer(v))
+#define vcpu_timer(v)  (&(v)->arch.timer_cpu)
+#define vcpu_get_timer(v,t)(&vcpu_timer(v)->timers[(t)])
+#define vcpu_vtimer(v) (&(v)->arch.timer_cpu.timers[TIMER_VTIMER])
+#define vcpu_ptimer(v) (&(v)->arch.timer_cpu.timers[TIMER_PTIMER])
 
 u64 kvm_arm_timer_read_sysreg(struct kvm_vcpu *vcpu,
  enum kvm_arch_timers tmr,
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 9502bb91776b..8b0eca5fbad1 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -184,13 +184,11 @@ static enum hrtimer_restart kvm_bg_timer_expire(struct 
hrtimer *hrt)
 static enum hrtimer_restart kvm_phys_timer_expire(struct hrtimer *hrt)
 {
struct arch_timer_context *ptimer;
-   struct arch_timer_cpu *timer;
struct kvm_vcpu *vcpu;
u64 ns;
 
-   timer = container_of(hrt, struct arch_timer_cpu, phys_timer);
-   vcpu = container_of(timer, struct kvm_vcpu, arch.timer_cpu);
-   ptimer = vcpu_ptimer(vcpu);
+   ptimer = container_of(hrt, struct arch_timer_context, hrtimer);
+   vcpu = ptimer->vcpu;
 
/*
 * Check that the timer has really expired from the guest's
@@ -209,9 +207,10 @@ static enum hrtimer_restart kvm_phys_timer_expire(struct 
hrtimer *hrt)
 
 static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 {
+   struct arch_timer_cpu *timer = vcpu_timer(timer_ctx->vcpu);
u64 cval, now;
 
-   if (timer_ctx->loaded) {
+   if (timer->loaded == TIMER_EL1_LOADED) {
u32 cnt_ctl;
 
/* Only the virtual timer can be loaded so far */
@@ -280,7 +279,6 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, 
bool new_level,
 /* Schedule the background timer for the emulated timer. */
 static void phys_timer_emulate(struct kvm_vcpu *vcpu)
 {
-   struct arch_timer_cpu *timer = 

[PATCH 10/14] KVM: arm/arm64: consolidate arch timer trap handlers

2019-01-24 Thread Christoffer Dall
From: Andre Przywara 

At the moment we have separate system register emulation handlers for
each timer register. Actually they are quite similar, and we rely on
kvm_arm_timer_[gs]et_reg() for the actual emulation anyway, so let's
just merge all of those handlers into one function, which just marshals
the arguments and then hands off to a set of common accessors.
This makes extending the emulation to include EL2 timers much easier.
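
The shape of the consolidation, as a standalone sketch (made-up names,
illustrative only): every trap lands in one handler that is parameterised by
(timer, register), so adding another timer later means adding an enum value
rather than another family of per-register handlers.

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

enum tmr { TMR_VIRT, TMR_PHYS, NR_TMRS };
enum tmr_reg { REG_CTL, REG_CVAL, NR_TMR_REGS };

static uint64_t regs[NR_TMRS][NR_TMR_REGS];

/* one accessor pair replacing one handler per register */
static uint64_t timer_read(enum tmr t, enum tmr_reg r)
{
    return regs[t][r];
}

static void timer_write(enum tmr t, enum tmr_reg r, uint64_t val)
{
    regs[t][r] = val;
}

/* single trap handler that only marshals its arguments */
static void handle_trap(enum tmr t, enum tmr_reg r, bool is_write, uint64_t *val)
{
    if (is_write)
        timer_write(t, r, *val);
    else
        *val = timer_read(t, r);
}

int main(void)
{
    uint64_t v = 123;

    handle_trap(TMR_PHYS, REG_CVAL, true, &v);
    handle_trap(TMR_PHYS, REG_CVAL, false, &v);
    printf("%llu\n", (unsigned long long)v);
    return 0;
}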

Signed-off-by: Andre Przywara 
[Fixed 32-bit VM breakage and reduced to reworking existing code]
Signed-off-by: Christoffer Dall 
[Fixed 32bit host, general cleanup]
Signed-off-by: Marc Zyngier 
---
 arch/arm/kvm/coproc.c   |  23 +++---
 arch/arm64/include/asm/sysreg.h |   4 +
 arch/arm64/kvm/sys_regs.c   |  80 +++-
 include/kvm/arm_arch_timer.h|  23 ++
 virt/kvm/arm/arch_timer.c   | 129 +++-
 5 files changed, 196 insertions(+), 63 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 222c1635bc7a..51863364f8d1 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -293,15 +293,16 @@ static bool access_cntp_tval(struct kvm_vcpu *vcpu,
 const struct coproc_params *p,
 const struct coproc_reg *r)
 {
-   u64 now = kvm_phys_timer_read();
-   u64 val;
+   u32 val;
 
if (p->is_write) {
val = *vcpu_reg(vcpu, p->Rt1);
-   kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL, val + now);
+   kvm_arm_timer_write_sysreg(vcpu,
+  TIMER_PTIMER, TIMER_REG_TVAL, val);
} else {
-   val = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL);
-   *vcpu_reg(vcpu, p->Rt1) = val - now;
+   val = kvm_arm_timer_read_sysreg(vcpu,
+   TIMER_PTIMER, TIMER_REG_TVAL);
+   *vcpu_reg(vcpu, p->Rt1) = val;
}
 
return true;
@@ -315,9 +316,11 @@ static bool access_cntp_ctl(struct kvm_vcpu *vcpu,
 
if (p->is_write) {
val = *vcpu_reg(vcpu, p->Rt1);
-   kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CTL, val);
+   kvm_arm_timer_write_sysreg(vcpu,
+  TIMER_PTIMER, TIMER_REG_CTL, val);
} else {
-   val = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CTL);
+   val = kvm_arm_timer_read_sysreg(vcpu,
+   TIMER_PTIMER, TIMER_REG_CTL);
*vcpu_reg(vcpu, p->Rt1) = val;
}
 
@@ -333,9 +336,11 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
if (p->is_write) {
val = (u64)*vcpu_reg(vcpu, p->Rt2) << 32;
val |= *vcpu_reg(vcpu, p->Rt1);
-   kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL, val);
+   kvm_arm_timer_write_sysreg(vcpu,
+  TIMER_PTIMER, TIMER_REG_CVAL, val);
} else {
-   val = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL);
+   val = kvm_arm_timer_read_sysreg(vcpu,
+   TIMER_PTIMER, TIMER_REG_CVAL);
*vcpu_reg(vcpu, p->Rt1) = val;
*vcpu_reg(vcpu, p->Rt2) = val >> 32;
}
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 3e5650903d6d..6482e8bcf1b8 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -392,6 +392,10 @@
 #define SYS_CNTP_CTL_EL0   sys_reg(3, 3, 14, 2, 1)
 #define SYS_CNTP_CVAL_EL0  sys_reg(3, 3, 14, 2, 2)
 
+#define SYS_AARCH32_CNTP_TVAL  sys_reg(0, 0, 14, 2, 0)
+#define SYS_AARCH32_CNTP_CTL   sys_reg(0, 0, 14, 2, 1)
+#define SYS_AARCH32_CNTP_CVAL  sys_reg(0, 2, 0, 14, 0)
+
 #define __PMEV_op2(n)  ((n) & 0x7)
 #define __CNTR_CRm(n)  (0x8 | (((n) >> 3) & 0x3))
 #define SYS_PMEVCNTRn_EL0(n)   sys_reg(3, 3, 14, __CNTR_CRm(n), 
__PMEV_op2(n))
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1a5bea4285e4..65ea63366c67 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -990,44 +990,51 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, 
struct sys_reg_params *p,
{ SYS_DESC(SYS_PMEVTYPERn_EL0(n)),  
\
  access_pmu_evtyper, reset_unknown, (PMEVTYPER0_EL0 + n), }
 
-static bool access_cntp_tval(struct kvm_vcpu *vcpu,
-   struct sys_reg_params *p,
-   const struct sys_reg_desc *r)
+static bool access_arch_timer(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
 {
-   u64 now = kvm_phys_timer_read(

[PATCH 07/14] KVM: arm/arm64: Simplify bg_timer programming

2019-01-24 Thread Christoffer Dall
Instead of calling into kvm_timer_[un]schedule from the main kvm
blocking path, test if the VCPU is on the wait queue from the load/put
path and perform the background timer setup/cancel in this path.

This has the distinct advantage that we no longer race between load/put
and schedule/unschedule, and programming and canceling of the bg_timer
always happens when the timer state is not loaded.

Note that we must now remove the checks in kvm_timer_blocking that do
not schedule a background timer if one of the timers can fire, because
we no longer have a guarantee that kvm_vcpu_check_block() will be called
before kvm_timer_blocking.
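
A condensed restatement of the new flow as a toy (not the actual call sites):
the background timer is only armed from the put path, and only when the vcpu is
actually sitting on its waitqueue, and it is cancelled again from the load path;
that ordering is what removes the race with the old schedule/unschedule hooks.

#include <stdio.h>
#include <stdbool.h>

struct vcpu { bool on_waitqueue; bool bg_timer_armed; };

static void timer_vcpu_put(struct vcpu *v)
{
    if (v->on_waitqueue)            /* about to block: arm the wakeup timer */
        v->bg_timer_armed = true;
}

static void timer_vcpu_load(struct vcpu *v)
{
    v->bg_timer_armed = false;      /* running again: cancel it */
}

int main(void)
{
    struct vcpu v = { .on_waitqueue = true };

    timer_vcpu_put(&v);
    printf("armed=%d\n", v.bg_timer_armed);
    timer_vcpu_load(&v);
    printf("armed=%d\n", v.bg_timer_armed);
    return 0;
}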

Reported-by: Andre Przywara 
Signed-off-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 
---
 include/kvm/arm_arch_timer.h |  3 ---
 virt/kvm/arm/arch_timer.c| 35 ++-
 virt/kvm/arm/arm.c   |  2 --
 3 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 33771352dcd6..d6e6a45d1d24 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -76,9 +76,6 @@ int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct 
kvm_device_attr *attr);
 
 bool kvm_timer_is_pending(struct kvm_vcpu *vcpu);
 
-void kvm_timer_schedule(struct kvm_vcpu *vcpu);
-void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
-
 u64 kvm_phys_timer_read(void);
 
 void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index b07ac4614e1c..4986028d9829 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -349,22 +349,12 @@ static void vtimer_save_state(struct kvm_vcpu *vcpu)
  * thread is removed from its waitqueue and made runnable when there's a timer
  * interrupt to handle.
  */
-void kvm_timer_schedule(struct kvm_vcpu *vcpu)
+static void kvm_timer_blocking(struct kvm_vcpu *vcpu)
 {
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
-   vtimer_save_state(vcpu);
-
-   /*
-* No need to schedule a background timer if any guest timer has
-* already expired, because kvm_vcpu_block will return before putting
-* the thread to sleep.
-*/
-   if (kvm_timer_should_fire(vtimer) || kvm_timer_should_fire(ptimer))
-   return;
-
/*
 * If both timers are not capable of raising interrupts (disabled or
 * masked), then there's no more work for us to do.
@@ -373,12 +363,19 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
return;
 
/*
-* The guest timers have not yet expired, schedule a background timer.
+* At least one guest time will expire. Schedule a background timer.
 * Set the earliest expiration time among the guest timers.
 */
soft_timer_start(&timer->bg_timer, kvm_timer_earliest_exp(vcpu));
 }
 
+static void kvm_timer_unblocking(struct kvm_vcpu *vcpu)
+{
+   struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+   soft_timer_cancel(&timer->bg_timer);
+}
+
 static void vtimer_restore_state(struct kvm_vcpu *vcpu)
 {
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
@@ -401,15 +398,6 @@ static void vtimer_restore_state(struct kvm_vcpu *vcpu)
local_irq_restore(flags);
 }
 
-void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
-{
-   struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-
-   vtimer_restore_state(vcpu);
-
-   soft_timer_cancel(&timer->bg_timer);
-}
-
 static void set_cntvoff(u64 cntvoff)
 {
u32 low = lower_32_bits(cntvoff);
@@ -485,6 +473,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
/* Set the background timer for the physical timer emulation. */
phys_timer_emulate(vcpu);
 
+   kvm_timer_unblocking(vcpu);
+
/* If the timer fired while we weren't running, inject it now */
if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
@@ -527,6 +517,9 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 */
soft_timer_cancel(&timer->phys_timer);
 
+   if (swait_active(kvm_arch_vcpu_wq(vcpu)))
+   kvm_timer_blocking(vcpu);
+
/*
 * The kernel may decide to run userspace after calling vcpu_put, so
 * we reset cntvoff to 0 to ensure a consistent read between user
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index b77db673bb03..9fbdb9e1c51f 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -335,13 +335,11 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
 {
-   kvm_timer_schedule(vcpu);
kvm_vgic_v4_enable_doorbell(vcpu);
 }
 
 void kvm_arc

[PATCH 09/14] KVM: arm64: Reuse sys_reg() macro when searching the trap table

2019-01-24 Thread Christoffer Dall
From: Marc Zyngier 

Instead of having an open-coded macro, reuse the sys_reg() macro
that does the exact same thing.
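
For context, the point of a single encoding macro is that both the search key
and the table entries collapse an (Op0, Op1, CRn, CRm, Op2) tuple into one
integer that can be compared and bsearch'ed. A standalone sketch using the shift
values of the old reg_to_match_value() (the real sys_reg() packs the same
fields, just at different bit positions; everything else here is invented):

#include <stdio.h>
#include <stdlib.h>

struct reg_desc { unsigned op0, op1, crn, crm, op2; const char *name; };

#define ENC(op0, op1, crn, crm, op2) \
    (((op0) << 14) | ((op1) << 11) | ((crn) << 7) | ((crm) << 3) | (op2))

#define reg_to_encoding(r) ENC((r)->op0, (r)->op1, (r)->crn, (r)->crm, (r)->op2)

static int match_reg(const void *key, const void *elt)
{
    unsigned long pval = (unsigned long)key;
    unsigned long rval = reg_to_encoding((const struct reg_desc *)elt);

    return (pval > rval) - (pval < rval);
}

int main(void)
{
    /* table must be sorted by encoding for bsearch() */
    static const struct reg_desc table[] = {
        { 3, 0, 1, 0, 0, "SCTLR_EL1" },
        { 3, 3, 14, 2, 1, "CNTP_CTL_EL0" },
    };
    unsigned long key = ENC(3, 3, 14, 2, 1);
    const struct reg_desc *r;

    r = bsearch((void *)key, table, 2, sizeof(table[0]), match_reg);
    printf("%s\n", r ? r->name : "not found");
    return 0;
}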

Signed-off-by: Marc Zyngier 
Acked-by: Christoffer Dall 
---
 arch/arm64/kvm/sys_regs.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e3e37228ae4e..1a5bea4285e4 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -965,6 +965,10 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
return true;
 }
 
+#define reg_to_encoding(x) \
+   sys_reg((u32)(x)->Op0, (u32)(x)->Op1,   \
+   (u32)(x)->CRn, (u32)(x)->CRm, (u32)(x)->Op2);
+
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
{ SYS_DESC(SYS_DBGBVRn_EL1(n)), \
@@ -1820,30 +1824,19 @@ static const struct sys_reg_desc 
*get_target_table(unsigned target,
}
 }
 
-#define reg_to_match_value(x)  \
-   ({  \
-   unsigned long val;  \
-   val  = (x)->Op0 << 14;  \
-   val |= (x)->Op1 << 11;  \
-   val |= (x)->CRn << 7;   \
-   val |= (x)->CRm << 3;   \
-   val |= (x)->Op2;\
-   val;\
-})
-
 static int match_sys_reg(const void *key, const void *elt)
 {
const unsigned long pval = (unsigned long)key;
const struct sys_reg_desc *r = elt;
 
-   return pval - reg_to_match_value(r);
+   return pval - reg_to_encoding(r);
 }
 
 static const struct sys_reg_desc *find_reg(const struct sys_reg_params *params,
 const struct sys_reg_desc table[],
 unsigned int num)
 {
-   unsigned long pval = reg_to_match_value(params);
+   unsigned long pval = reg_to_encoding(params);
 
return bsearch((void *)pval, table, num, sizeof(table[0]), match_sys_reg);
 }
-- 
2.18.0



[PATCH 04/14] ARM: KVM: Teach some form of type-safety to kvm_call_hyp

2019-01-24 Thread Christoffer Dall
From: Marc Zyngier 

Just like on arm64, and for the same reasons, kvm_call_hyp removes
any form of type safety when calling into HYP. But we can still
try to tell the compiler what we're trying to achieve.

Here, we can add code that would do the function call if it wasn't
guarded by an always-false predicate. Hopefully, the compiler is
dumb enough to do the type checking and clever enough to not emit
the corresponding code...
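
The "always-false predicate" trick in isolation (standalone sketch, not the
kernel macro; names are invented): the dead branch gives the compiler a properly
typed call to check, while the live branch still goes through the untyped
variadic trampoline.

#include <stdio.h>

/* stand-in for the untyped trampoline */
static unsigned long untyped_call(void *fn, ...)
{
    (void)fn;
    return 0;
}

static void flush_ctx(unsigned int id)
{
    printf("flush %u\n", id);
}

#define checked_call(f, ...)                                    \
    do {                                                        \
        if (0)                                                  \
            f(__VA_ARGS__);     /* type-checked, never run */   \
        else                                                    \
            untyped_call((void *)f, ##__VA_ARGS__);             \
    } while (0)

int main(void)
{
    checked_call(flush_ctx, 1u);
    /* checked_call(flush_ctx, "oops");  -- would fail to compile */
    return 0;
}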

Signed-off-by: Marc Zyngier 
Acked-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_host.h | 31 ---
 arch/arm/kvm/hyp/hyp-entry.S|  2 +-
 arch/arm/kvm/interrupts.S   |  4 ++--
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 023c9f2b1eea..4b6193f2f0f6 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -215,8 +215,33 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 
__user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 
-unsigned long kvm_call_hyp(void *hypfn, ...);
-#define kvm_call_hyp_ret(f, ...) kvm_call_hyp(f, ##__VA_ARGS__)
+unsigned long __kvm_call_hyp(void *hypfn, ...);
+
+/*
+ * The has_vhe() part doesn't get emitted, but is used for type-checking.
+ */
+#define kvm_call_hyp(f, ...)   \
+   do {\
+   if (has_vhe()) {\
+   f(__VA_ARGS__); \
+   } else {\
+   __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__); \
+   }   \
+   } while(0)
+
+#define kvm_call_hyp_ret(f, ...)   \
+   ({  \
+   typeof(f(__VA_ARGS__)) ret; \
+   \
+   if (has_vhe()) {\
+   ret = f(__VA_ARGS__);   \
+   } else {\
+   ret = __kvm_call_hyp(kvm_ksym_ref(f),   \
+##__VA_ARGS__);\
+   }   \
+   \
+   ret;\
+   })
 
 void force_vm_exit(const cpumask_t *mask);
 int __kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
@@ -268,7 +293,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 * compliant with the PCS!).
 */
 
-   kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
+   __kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
 }
 
 static inline void __cpu_init_stage2(void)
diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
index aa3f9a9837ac..6ed3cf23fe89 100644
--- a/arch/arm/kvm/hyp/hyp-entry.S
+++ b/arch/arm/kvm/hyp/hyp-entry.S
@@ -176,7 +176,7 @@ THUMB(  orr lr, lr, #PSR_T_BIT  )
msr spsr_cxsf, lr
ldr lr, =panic
msr ELR_hyp, lr
-   ldr lr, =kvm_call_hyp
+   ldr lr, =__kvm_call_hyp
clrex
eret
 ENDPROC(__hyp_do_panic)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 80a1d6cd261c..a08e6419ebe9 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -42,7 +42,7 @@
  *   r12: caller save
  *   rest:callee save
  */
-ENTRY(kvm_call_hyp)
+ENTRY(__kvm_call_hyp)
hvc #0
bx  lr
-ENDPROC(kvm_call_hyp)
+ENDPROC(__kvm_call_hyp)
-- 
2.18.0



[PATCH 00/14] KVM: arm/arm64: Various rework in preparation of nested virt support

2019-01-24 Thread Christoffer Dall
This series contains a somewhat random set of reworks and improvements to the
KVM/Arm code in preparation for nested virtualization support.

We plan to merge these as early as v5.1.

The series relies on an additional patch which exposes the physical EL1 timer's
IRQ number to KVM:
  "clocksource/arm_arch_timer: Store physical timer IRQ number for KVM on VHE"
  https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1908965.html


Andre Przywara (1):
  KVM: arm/arm64: consolidate arch timer trap handlers

Christoffer Dall (6):
  KVM: arm/arm64: Factor out VMID into struct kvm_vmid
  KVM: arm/arm64: Simplify bg_timer programming
  KVM: arm/arm64: timer: Rework data structures for multiple timers
  KVM: arm/arm64: arch_timer: Assign the phys timer on VHE systems
  KVM: arm/arm64: Rework the timer code to use a timer_map
  KVM: arm/arm64: Move kvm_is_write_fault to header file

Marc Zyngier (7):
  arm/arm64: KVM: Introduce kvm_call_hyp_ret()
  arm64: KVM: Allow for direct call of HYP functions when using VHE
  arm64: KVM: Drop VHE-specific HYP call stub
  ARM: KVM: Teach some form of type-safety to kvm_call_hyp
  arm/arm64: KVM: Statically configure the host's view of MPIDR
  KVM: arm64: Fix ICH_ELRSR_EL2 sysreg naming
  KVM: arm64: Reuse sys_reg() macro when searching the trap table

 arch/arm/include/asm/arch_gicv3.h|   4 +-
 arch/arm/include/asm/kvm_emulate.h   |   8 +
 arch/arm/include/asm/kvm_host.h  |  53 ++-
 arch/arm/include/asm/kvm_hyp.h   |   4 +
 arch/arm/include/asm/kvm_mmu.h   |  11 +
 arch/arm/kvm/coproc.c|  23 +-
 arch/arm/kvm/hyp/cp15-sr.c   |   1 -
 arch/arm/kvm/hyp/hyp-entry.S |   2 +-
 arch/arm/kvm/hyp/switch.c|   2 +-
 arch/arm/kvm/hyp/tlb.c   |   4 +-
 arch/arm/kvm/interrupts.S|   4 +-
 arch/arm64/include/asm/kvm_emulate.h |   8 +
 arch/arm64/include/asm/kvm_host.h|  48 ++-
 arch/arm64/include/asm/kvm_hyp.h |   3 +-
 arch/arm64/include/asm/kvm_mmu.h |  11 +
 arch/arm64/include/asm/sysreg.h  |   6 +-
 arch/arm64/kvm/debug.c   |   2 +-
 arch/arm64/kvm/hyp.S |   3 -
 arch/arm64/kvm/hyp/hyp-entry.S   |  12 -
 arch/arm64/kvm/hyp/sysreg-sr.c   |   1 -
 arch/arm64/kvm/sys_regs.c|  99 +++--
 include/kvm/arm_arch_timer.h |  68 +++-
 virt/kvm/arm/arch_timer.c| 583 +++
 virt/kvm/arm/arm.c   |  62 +--
 virt/kvm/arm/hyp/vgic-v3-sr.c|   2 +-
 virt/kvm/arm/mmu.c   |  10 +-
 virt/kvm/arm/trace.h | 105 +
 virt/kvm/arm/vgic/vgic-v3.c  |   4 +-
 28 files changed, 799 insertions(+), 344 deletions(-)

-- 
2.18.0



[PATCH] clocksource/arm_arch_timer: Store physical timer IRQ number for KVM on VHE

2019-01-24 Thread Christoffer Dall
From: Andre Przywara 

A host running in VHE mode gets the EL2 physical timer as its time
source (accessed using the EL1 sysreg accessors, which get re-directed
to the EL2 sysregs by VHE).

The EL1 physical timer remains unused by the host kernel, allowing us to
pass it on directly to a KVM guest, which saves us from emulating this
timer for the guest on VHE systems.

Store the EL1 Physical Timer's IRQ number in
struct arch_timer_kvm_info on VHE systems to allow KVM to use it.

Signed-off-by: Andre Przywara 
Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
Patches in preparation for nested virtualization on KVM/Arm depend on this
change, so we would like to merge this via the kvmarm tree or have a stable
branch including this patch.

Please let us know your preference.  Thanks.

 drivers/clocksource/arm_arch_timer.c | 11 +--
 include/clocksource/arm_arch_timer.h |  1 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 9a7d4dc00b6e..b9243e2328b4 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -1206,6 +1206,13 @@ static enum arch_timer_ppi_nr __init 
arch_timer_select_ppi(void)
return ARCH_TIMER_PHYS_SECURE_PPI;
 }
 
+static void __init arch_timer_populate_kvm_info(void)
+{
+   arch_timer_kvm_info.virtual_irq = arch_timer_ppi[ARCH_TIMER_VIRT_PPI];
+   if (is_kernel_in_hyp_mode())
+   arch_timer_kvm_info.physical_irq = arch_timer_ppi[ARCH_TIMER_PHYS_NONSECURE_PPI];
+}
+
 static int __init arch_timer_of_init(struct device_node *np)
 {
int i, ret;
@@ -1220,7 +1227,7 @@ static int __init arch_timer_of_init(struct device_node 
*np)
for (i = ARCH_TIMER_PHYS_SECURE_PPI; i < ARCH_TIMER_MAX_TIMER_PPI; i++)
arch_timer_ppi[i] = irq_of_parse_and_map(np, i);
 
-   arch_timer_kvm_info.virtual_irq = arch_timer_ppi[ARCH_TIMER_VIRT_PPI];
+   arch_timer_populate_kvm_info();
 
rate = arch_timer_get_cntfrq();
arch_timer_of_configure_rate(rate, np);
@@ -1550,7 +1557,7 @@ static int __init arch_timer_acpi_init(struct 
acpi_table_header *table)
arch_timer_ppi[ARCH_TIMER_HYP_PPI] =
acpi_gtdt_map_ppi(ARCH_TIMER_HYP_PPI);
 
-   arch_timer_kvm_info.virtual_irq = arch_timer_ppi[ARCH_TIMER_VIRT_PPI];
+   arch_timer_populate_kvm_info();
 
/*
 * When probing via ACPI, we have no mechanism to override the sysreg
diff --git a/include/clocksource/arm_arch_timer.h b/include/clocksource/arm_arch_timer.h
index 349e5957c949..702967d996bb 100644
--- a/include/clocksource/arm_arch_timer.h
+++ b/include/clocksource/arm_arch_timer.h
@@ -74,6 +74,7 @@ enum arch_timer_spi_nr {
 struct arch_timer_kvm_info {
struct timecounter timecounter;
int virtual_irq;
+   int physical_irq;
 };
 
 struct arch_timer_mem_frame {
-- 
2.18.0



[PATCH] KVM: arm/arm64: vgic: Always initialize the group of private IRQs

2019-01-10 Thread Christoffer Dall
We currently initialize the group of private IRQs during
kvm_vgic_vcpu_init, and the value of the group depends on the GIC model
we are emulating.  However, CPUs created before creating (and
initializing) the VGIC might end up with the wrong group if the VGIC
is created as GICv3 later.

Since we have no enforced ordering of creating the VGIC and creating
VCPUs, we can end up with part of the VCPUs being properly initialized and
the remaining incorrectly initialized.  That also means that we have no
single place to do the per-cpu data structure initialization which
depends on knowing the emulated GIC model (which is only the group
field).

This patch removes the incorrect comment from kvm_vgic_vcpu_init and
initializes the group of all previously created VCPUs' private
interrupts in vgic_init in addition to the existing initialization in
kvm_vgic_vcpu_init.

Signed-off-by: Christoffer Dall 
---
I tested this by modifying kvmtool to create the vgic in the middle of creating
the VCPUs and then looking in /sys/kernel/debug/kvm//vgic-state.  Prior to this
patch, with GICv3 the first VCPU's private interrupts had group 0 and the
secondary VCPU's had group 1.  Following this patch, both VCPUs have group 1
for GICv3 and group 0 for GICv2.

diff --git a/arm/kvm.c b/arm/kvm.c
index b824f63..c6c5fbc 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -82,10 +82,6 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 
madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
MADV_HUGEPAGE);
-
-   /* Create the virtual GIC. */
-   if (gic__create(kvm, kvm->cfg.arch.irqchip))
-   die("Failed to create virtual GIC");
 }
 
 #define FDT_ALIGN  SZ_2M
diff --git a/kvm-cpu.c b/kvm-cpu.c
index cc8385f..7a2fde0 100644
--- a/kvm-cpu.c
+++ b/kvm-cpu.c
@@ -253,6 +253,7 @@ panic_kvm:
 int kvm_cpu__init(struct kvm *kvm)
 {
int max_cpus, recommended_cpus, i;
+   bool gic_created = false;
 
max_cpus = kvm__max_cpus(kvm);
recommended_cpus = kvm__recommended_cpus(kvm);
@@ -281,6 +282,12 @@ int kvm_cpu__init(struct kvm *kvm)
}
 
for (i = 0; i < kvm->nrcpus; i++) {
+   if (i == 1) {
+   /* Create the virtual GIC. */
+   if (gic__create(kvm, kvm->cfg.arch.irqchip))
+   die("Failed to create virtual GIC");
+   gic_created = true;
+   }
kvm->cpus[i] = kvm_cpu__arch_init(kvm, i);
if (!kvm->cpus[i]) {
pr_warning("unable to initialize KVM VCPU");
@@ -288,6 +295,10 @@ int kvm_cpu__init(struct kvm *kvm)
}
}
 
+   /* Create the virtual GIC. */
+   if (!gic_created && gic__create(kvm, kvm->cfg.arch.irqchip))
+   die("Failed to create virtual GIC");
+
return 0;
 
 fail_alloc:

 virt/kvm/arm/vgic/vgic-init.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index c0c0b88af1d5..f935adc50626 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -231,13 +231,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
irq->config = VGIC_CONFIG_LEVEL;
}
 
-   /*
-* GICv3 can only be created via the KVM_DEVICE_CREATE API and
-* so we always know the emulation type at this point as it's
-* either explicitly configured as GICv3, or explicitly
-* configured as GICv2, or not configured yet which also
-* implies GICv2.
-*/
if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
irq->group = 1;
else
@@ -298,6 +291,19 @@ int vgic_init(struct kvm *kvm)
if (ret)
goto out;
 
+   /* Initialize groups on CPUs created before the VGIC type was known */
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+   for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
+   struct vgic_irq *irq = &vgic_cpu->private_irqs[i];
+   if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
+   irq->group = 1;
+   else
+   irq->group = 0;
+   }
+   }
+
if (vgic_has_its(kvm)) {
ret = vgic_v4_init(kvm);
if (ret)
-- 
2.18.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [kvmarm:kvm-arm64/nv-wip-v5.0-rc1 4/75] arch/arm/kvm/../../../virt/kvm/arm/arch_timer.c:700:7: error: 'SYS_CNTP_TVAL_EL0' undeclared

2019-01-09 Thread Christoffer Dall
On Wed, Jan 09, 2019 at 10:09:51AM +, Marc Zyngier wrote:
> On 09/01/2019 09:13, André Przywara wrote:
> > On 09/01/2019 04:40, kbuild test robot wrote:
> > 
> > Marc, Christoffer,
> > 
> >> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
> >> kvm-arm64/nv-wip-v5.0-rc1
> >> head:   688c386ca096f2c1f2eee386697586c88df5d5bc
> >> commit: 2b1265c58a873d917e99ac762e243c1274481dbf [4/75] KVM: arm/arm64: 
> >> consolidate arch timer trap handlers
> >> config: arm-axm55xx_defconfig (attached as .config)
> >> compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
> >> reproduce:
> >> wget 
> >> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross 
> >> -O ~/bin/make.cross
> >> chmod +x ~/bin/make.cross
> >> git checkout 2b1265c58a873d917e99ac762e243c1274481dbf
> >> # save the attached .config to linux build tree
> >> GCC_VERSION=7.2.0 make.cross ARCH=arm 
> >>
> >> All errors (new ones prefixed by >>):
> > 
> > I was looking at this yesterday: It's a bit nasty, don't know a good
> > solution beside bringing back this part of my original timer rework series.
> > The problem is that those symbols contains the Aarch64 specific
> > (instruction) encoding of the timer registers, plus we need the AArch32
> > encodings for 32-on-64 guests.
> 
> Why? There is exactly one timer that needs trapping for AArch32 (the EL1
> physical timer). All we need is:
> 
> - the SYS_AARCH32_CNTP_* encodings on 32bit
> - some CPP magic to prevent the compilation from breaking
> 
> > 
> > That's why I used the generic UAPI encoding for the registers, because
> > we only need *some* identification for them, it doesn't need to be
> > something defined by the architecture.
> 
> I disagree. By doing so, you're conflating userspace access and
> trapping, which has proved to be a bad idea in the past. For example,
> you'd end-up having both CVAL and TVAL in UAPI, which is not something
> I'm keen to have. On the other hand, the trapping function do need to be
> able to handle these.
> 
I think it probably makes sense to have the sysreg encoding stuff in the
arch-specific files, and have an indirection in sys_regs.c and coproc.c
which 'translates' from a system register to some generic arch timer
define identifying a 'timer'.

I think the major breakage in the previous design was to use the same
*functions* for uaccess and for sysregs traps, but I don't think it was
necessarily a problem to use the same definitions to identify a timer.

That of course leaves the problem of how to identify a register within a
timer.  Could we do something as braindead as:


diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index ce441dda412c..1d7bd452e5fd 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -32,6 +32,13 @@ enum kvm_arch_timers {
NR_KVM_TIMERS
 };
 
+enum arch_timer_reg {
+   ARCH_TIMER_REG_CTL,
+   ARCH_TIMER_REG_CVAL,
+   ARCH_TIMER_REG_TVAL,
+   ARCH_TIMER_REG_CNT
+};
+
 struct arch_timer_context {
struct kvm_vcpu *vcpu;
 
@@ -86,8 +93,13 @@ bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
 void kvm_timer_update_run(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu);
 
-u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
-int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
+u64 kvm_arm_timer_get_reg(struct kvm_vcpu *,
+ enum kvm_arch_timers timer,
+ enum arch_timer_reg reg);
+int kvm_arm_timer_set_reg(struct kvm_vcpu *,
+ enum kvm_arch_timers timer,
+ enum arch_timer_reg reg,
+ u64 value);
 
 u64 kvm_arm_timer_read_sysreg(struct kvm_vcpu *vcpu, u32 sr);
 void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu, u32 sr, u64 val);
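
As a rough sketch of such an indirection in sys_regs.c (the handler name, the
reg_to_encoding() helper and the TIMER_PTIMER constant are assumptions here,
not existing code), the trap handler would just translate the trapped encoding
into the generic (timer, reg) pair and call the accessors above:

static bool access_el1_phys_timer(struct kvm_vcpu *vcpu,
				  struct sys_reg_params *p,
				  const struct sys_reg_desc *r)
{
	enum arch_timer_reg treg;

	/* Translate the trapped sysreg into a generic timer register id */
	switch (reg_to_encoding(r)) {		/* hypothetical helper */
	case SYS_CNTP_CVAL_EL0:
		treg = ARCH_TIMER_REG_CVAL;
		break;
	case SYS_CNTP_TVAL_EL0:
		treg = ARCH_TIMER_REG_TVAL;
		break;
	default:
		treg = ARCH_TIMER_REG_CTL;
		break;
	}

	if (p->is_write)
		kvm_arm_timer_set_reg(vcpu, TIMER_PTIMER, treg, p->regval);
	else
		p->regval = kvm_arm_timer_get_reg(vcpu, TIMER_PTIMER, treg);

	return true;
}

The AArch32 coproc.c side would do the same translation from the
SYS_AARCH32_CNTP_* encodings, so identifying a timer and a register within it
stays generic while the encodings stay arch-specific.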


Thanks,

Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v8 4/5] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

2019-01-08 Thread Christoffer Dall
On Tue, Jan 08, 2019 at 12:12:13PM +, Marc Zyngier wrote:
> On 08/01/2019 12:03, Christoffer Dall wrote:
> > On Tue, Jan 08, 2019 at 11:50:59AM +, Marc Zyngier wrote:
> >> On Tue, 08 Jan 2019 11:25:13 +,
> >> Andrew Murray  wrote:
> >>
> >> Hi Andrew,
> >>
> >>> My only doubt about this is as follows. If, on a KVM host you run this:
> >>>
> >>> perf stat -e cycles:H lkvm run ...
> >>>
> >>> then on the VHE host the cycles reported represents the entire non-guest 
> >>> cycles
> >>> associated with running the guest.
> >>>
> >>> On a !VHE, the cycles reported exclude EL2 (with or without this patch) 
> >>> and
> >>> thus you don't get a representation of all the non-guest cycles 
> >>> associated with
> >>> the guest. However without this patch you could at least still run:
> >>>
> >>> perf stat -e cycles:H -e cycles:h lkvm run ...
> >>>
> >>> and then add the two cycle counts together to get something comparative 
> >>> with
> >>> the VHE host.
> >>>
> >>> If the above patch represents the desired semantics, then perhaps we must 
> >>> count
> >>> both EL1 and *EL2* for !exclude_kernel on !VHE. In fact I think we should 
> >>> do
> >>> this anyway and remove a little complexity from armv8pmu_set_event_filter.
> >>> Thoughts?
> >>
> >> I'm not sure we should hide the architectural differences between VHE
> >> and !VHE. If you're trying to measure what is happening at in the
> >> hypervisor, you can't reason about it while ignoring the dual nature
> >> of !VHE.
> >>
> > 
> > How do you define hypervisor here?  Is that just the code that runs at
> > EL2 or also parts of KVM that runs at EL1?
> 
> I define it as "not a guest". Whatever is used to support a guest is the
> hypervisor.
> 
> > It remains unclear to me why you'd want to measure a subset of KVM,
> > which happens to run in EL2, in your host (and hypervisor-enabled)
> > kernel, and you are even precluded from measuring a comparable portion
> > of your implementation on other Arm systems (VHE).
> 
> Because I'm not trying to compare apples (VHE) and oranges (!VHE). My
> use-case for perf is to measure the impact of a change on a given
> implementation, and the more I can narrow the impact of that change, the
> better (specially when !VHE precludes the use of other techniques such
> as sampling).
> 

Fair enough.  I don't know if that's the only use case for perf we
should consider though.

> > Admittedly, I'm not an expert in using perf, but I find this EL1/EL2
> > distinction out of place as it relates to exclude_kernel, exclude_user,
> > and exclude_hv.  Will we have a fourth Arm-specific flag which takes the
> > place of exclude_hv on PowerPC, which excludes an underlying hypervisor
> > when running a guest, should we ever support counting that in the
> > future?
> In all honestly, exclude_hv doesn't make much sense to me on a VHE
> system, unless you define an arbitrary cutting point where things are on
> one side or the other. As for a fourth flag, I have no idea.
> 

I think this all boils down to how these flags are interpreted and
represented to a user via tooling.  If these flags must be considered
in complete isolation on a particular system and architecture, then
fine, we can define them however we want, giving us a little more
insight into where things happen on a !VHE system.

If we care about these flags representing similar semantics to other
architectures, then I contend that we are abusing the exclude_hv flag
today, and exclude_hv should only ever have an effect when set within
a guest, not in a host.
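
To make the comparison with other architectures concrete, this is roughly
what the :G and :H modifiers boil down to at the perf_event_attr level
(field names are from the perf UAPI; the event choice is just an example):

#include <linux/perf_event.h>

/* cycles:G -- count only while a guest is running */
struct perf_event_attr guest_only = {
	.type          = PERF_TYPE_HARDWARE,
	.config        = PERF_COUNT_HW_CPU_CYCLES,
	.exclude_host  = 1,
};

/* cycles:H -- count only while the host is running */
struct perf_event_attr host_only = {
	.type          = PERF_TYPE_HARDWARE,
	.config        = PERF_COUNT_HW_CPU_CYCLES,
	.exclude_guest = 1,
};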


Thanks,

Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v8 4/5] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

2019-01-08 Thread Christoffer Dall
On Tue, Jan 08, 2019 at 11:25:13AM +, Andrew Murray wrote:
> On Tue, Jan 08, 2019 at 11:18:43AM +0100, Christoffer Dall wrote:
> > On Fri, Jan 04, 2019 at 03:32:06PM +, Will Deacon wrote:
> > > On Tue, Dec 18, 2018 at 01:02:26PM +0100, Christoffer Dall wrote:
> > > > On Wed, Dec 12, 2018 at 10:29:32AM +, Andrew Murray wrote:
> > > > > Add support for the :G and :H attributes in perf by handling the
> > > > > exclude_host/exclude_guest event attributes.
> > > > > 
> > > > > We notify KVM of counters that we wish to be enabled or disabled on
> > > > > guest entry/exit and thus defer from starting or stopping :G events
> > > > > as per the events exclude_host attribute.
> > > > > 
> > > > > With both VHE and non-VHE we switch the counters between host/guest
> > > > > at EL2. We are able to eliminate counters counting host events on
> > > > > the boundaries of guest entry/exit when using :G by filtering out
> > > > > EL2 for exclude_host. However when using :H unless exclude_hv is set
> > > > > on non-VHE then there is a small blackout window at the guest
> > > > > entry/exit where host events are not captured.
> > > > > 
> > > > > Signed-off-by: Andrew Murray 
> > > > > ---
> > > > >  arch/arm64/kernel/perf_event.c | 51 
> > > > > --
> > > > >  1 file changed, 44 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm64/kernel/perf_event.c 
> > > > > b/arch/arm64/kernel/perf_event.c
> > > > > index de564ae..4a3c73d 100644
> > > > > --- a/arch/arm64/kernel/perf_event.c
> > > > > +++ b/arch/arm64/kernel/perf_event.c
> > > > > @@ -26,6 +26,7 @@
> > > > >  
> > > > >  #include 
> > > > >  #include 
> > > > > +#include 
> > > > >  #include 
> > > > >  #include 
> > > > >  #include 
> > > > > @@ -647,11 +648,26 @@ static inline int armv8pmu_enable_counter(int 
> > > > > idx)
> > > > >  
> > > > >  static inline void armv8pmu_enable_event_counter(struct perf_event 
> > > > > *event)
> > > > >  {
> > > > > + struct perf_event_attr *attr = &event->attr;
> > > > >   int idx = event->hw.idx;
> > > > > + int flags = 0;
> > > > > + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > > > >  
> > > > > - armv8pmu_enable_counter(idx);
> > > > >   if (armv8pmu_event_is_chained(event))
> > > > > - armv8pmu_enable_counter(idx - 1);
> > > > > + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > > > > +
> > > > > + if (!attr->exclude_host)
> > > > > + flags |= KVM_PMU_EVENTS_HOST;
> > > > > + if (!attr->exclude_guest)
> > > > > + flags |= KVM_PMU_EVENTS_GUEST;
> > > > > +
> > > > > + kvm_set_pmu_events(counter_bits, flags);
> > > > > +
> > > > > + if (!attr->exclude_host) {
> > > > > + armv8pmu_enable_counter(idx);
> > > > > + if (armv8pmu_event_is_chained(event))
> > > > > + armv8pmu_enable_counter(idx - 1);
> > > > > + }
> > > > >  }
> > > > >  
> > > > >  static inline int armv8pmu_disable_counter(int idx)
> > > > > @@ -664,11 +680,20 @@ static inline int armv8pmu_disable_counter(int 
> > > > > idx)
> > > > >  static inline void armv8pmu_disable_event_counter(struct perf_event 
> > > > > *event)
> > > > >  {
> > > > >   struct hw_perf_event *hwc = &event->hw;
> > > > > + struct perf_event_attr *attr = &event->attr;
> > > > >   int idx = hwc->idx;
> > > > > + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > > > >  
> > > > >   if (armv8pmu_event_is_chained(event))
> > > > > - armv8pmu_disable_counter(idx - 1);
> > > > > - armv8pmu_disable_counter(idx);
> > > > > + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > > > > 

Re: [PATCH v8 4/5] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

2019-01-08 Thread Christoffer Dall
On Tue, Jan 08, 2019 at 11:50:59AM +, Marc Zyngier wrote:
> On Tue, 08 Jan 2019 11:25:13 +,
> Andrew Murray  wrote:
> 
> Hi Andrew,
> 
> > My only doubt about this is as follows. If, on a KVM host you run this:
> > 
> > perf stat -e cycles:H lkvm run ...
> > 
> > then on the VHE host the cycles reported represents the entire non-guest 
> > cycles
> > associated with running the guest.
> > 
> > On a !VHE, the cycles reported exclude EL2 (with or without this patch) and
> > thus you don't get a representation of all the non-guest cycles associated 
> > with
> > the guest. However without this patch you could at least still run:
> > 
> > perf stat -e cycles:H -e cycles:h lkvm run ...
> > 
> > and then add the two cycle counts together to get something comparative with
> > the VHE host.
> > 
> > If the above patch represents the desired semantics, then perhaps we must 
> > count
> > both EL1 and *EL2* for !exclude_kernel on !VHE. In fact I think we should do
> > this anyway and remove a little complexity from armv8pmu_set_event_filter.
> > Thoughts?
> 
> I'm not sure we should hide the architectural differences between VHE
> and !VHE. If you're trying to measure what is happening at in the
> hypervisor, you can't reason about it while ignoring the dual nature
> of !VHE.
> 

How do you define hypervisor here?  Is that just the code that runs at
EL2 or also parts of KVM that runs at EL1?

It remains unclear to me why you'd want to measure a subset of KVM,
which happens to run in EL2, in your host (and hypervisor-enabled)
kernel, and you are even precluded from measuring a comparable portion
of your implementation on other Arm systems (VHE).

Admittedly, I'm not an expert in using perf, but I find this EL1/EL2
distinction out of place as it relates to exclude_kernel, exclude_user,
and exclude_hv.  Will we have a fourth Arm-specific flag which takes the
place of exclude_hv on PowerPC, which excludes an underlying hypervisor
when running a guest, should we ever support counting that in the
future?


Thanks,

Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 00/12] arm64: Paravirtualized time support

2019-01-08 Thread Christoffer Dall
On Mon, Dec 10, 2018 at 11:40:47AM +, Mark Rutland wrote:
> On Wed, Nov 28, 2018 at 02:45:15PM +, Steven Price wrote:
> > This series add support for paravirtualized time for Arm64 guests and
> > KVM hosts following the specification in Arm's document DEN 0057A:
> > 
> > https://developer.arm.com/docs/den0057/a
> > 
> > It implements support for Live Physical Time (LPT) which provides the
> > guest with a method to derive a stable counter of time during which the
> > guest is executing even when the guest is being migrated between hosts
> > with different physical counter frequencies.
> > 
> > It also implements support for stolen time, allowing the guest to
> > identify time when it is forcibly not executing.
> 
> I know that stolen time reporting is important, and I think that we
> definitely want to pick up that part of the spec (once it is published
> in some non-draft form).
> 
> However, I am very concerned with the pv-freq part of LPT, and I'd like
> to avoid that if at all possible. I say that because:
> 
> * By design, it breaks architectural guarantees from the PoV of SW in
>   the guest.
> 
>   A VM may host multiple SW agents serially (e.g. when booting, or
>   across kexec), or concurrently (e.g. Linux w/ EFI runtime services),
>   and the host has no way to tell whether all software in the guest will
>   function correctly. Due to this, it's not possible to have a guest
>   opt-in to the architecturally-broken timekeeping.

Is this necessarily true?

As I understood the intention of the spec, there would be no change to
behavior of the timers as exposed by the hypervisor unless a software
agent specifically opts in to LPT and pv-freq.

In a scenario with Linux and UEFI running, they must clearly agree on
using functionality that changes the underlying behavior.  For
kdump/kexec scenarios, the OS would have to tear down the functionality
to work across migration after loading a secondary SW agent, which
probably needs adding to the spec.

> 
>   Existing guests will not work correctly once pv-freq is in use, and if
>   configured without pv-freq (or if the guest fails to discover pv-freq
>   for any reason), the administrator may encounter anything between
>   subtle breakage or fatally incorrect timekeeping.
> 
>   There's plenty of SW agents other than Linux which runs in a guest,
>   which would need to be updated to handle pv-freq, e.g. GRUB, *BSD,
>   iPXE.
> 
>   Given this, I think that this is going to lead to subtle breakage in
>   real-world scenarios. 

I think we'd definitely need to limit the exposure of pv-freq to Linux
and (if necessary) UEFI runtime services.  Do you see scenarios where
this would not be possible?


[...]

> 
> I understand that LPT is supposed to account for time lost during the
> migration. Can we account for this without pv-freq? e.g. is it possible
> to account for this in the same way as stolen time?
> 

I think we can indeed account for lost time during migration or host
system suspend by simply adjusting CNTVOFF_EL2 (as Steve points out, KVM
already supports this, but QEMU doesn't make use of that today -- there
were some patches attempting to address that recently).
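
For reference, a minimal userspace sketch of that existing mechanism
(setting KVM_REG_ARM_TIMER_CNT via KVM_SET_ONE_REG on an arm64 vcpu fd,
which makes KVM recompute CNTVOFF_EL2); the function name and the policy for
what value to restore are assumptions, not QEMU code:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Restore the guest's view of CNTVCT after a migration or host suspend */
static int restore_virtual_counter(int vcpu_fd, uint64_t saved_cntvct)
{
	struct kvm_one_reg reg = {
		.id   = KVM_REG_ARM_TIMER_CNT,
		.addr = (uint64_t)&saved_cntvct,
	};

	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}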


Thanks,

Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v8 4/5] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

2019-01-08 Thread Christoffer Dall
On Fri, Jan 04, 2019 at 03:32:06PM +, Will Deacon wrote:
> On Tue, Dec 18, 2018 at 01:02:26PM +0100, Christoffer Dall wrote:
> > On Wed, Dec 12, 2018 at 10:29:32AM +, Andrew Murray wrote:
> > > Add support for the :G and :H attributes in perf by handling the
> > > exclude_host/exclude_guest event attributes.
> > > 
> > > We notify KVM of counters that we wish to be enabled or disabled on
> > > guest entry/exit and thus defer from starting or stopping :G events
> > > as per the events exclude_host attribute.
> > > 
> > > With both VHE and non-VHE we switch the counters between host/guest
> > > at EL2. We are able to eliminate counters counting host events on
> > > the boundaries of guest entry/exit when using :G by filtering out
> > > EL2 for exclude_host. However when using :H unless exclude_hv is set
> > > on non-VHE then there is a small blackout window at the guest
> > > entry/exit where host events are not captured.
> > > 
> > > Signed-off-by: Andrew Murray 
> > > ---
> > >  arch/arm64/kernel/perf_event.c | 51 
> > > --
> > >  1 file changed, 44 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/perf_event.c 
> > > b/arch/arm64/kernel/perf_event.c
> > > index de564ae..4a3c73d 100644
> > > --- a/arch/arm64/kernel/perf_event.c
> > > +++ b/arch/arm64/kernel/perf_event.c
> > > @@ -26,6 +26,7 @@
> > >  
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -647,11 +648,26 @@ static inline int armv8pmu_enable_counter(int idx)
> > >  
> > >  static inline void armv8pmu_enable_event_counter(struct perf_event 
> > > *event)
> > >  {
> > > + struct perf_event_attr *attr = &event->attr;
> > >   int idx = event->hw.idx;
> > > + int flags = 0;
> > > + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > >  
> > > - armv8pmu_enable_counter(idx);
> > >   if (armv8pmu_event_is_chained(event))
> > > - armv8pmu_enable_counter(idx - 1);
> > > + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > > +
> > > + if (!attr->exclude_host)
> > > + flags |= KVM_PMU_EVENTS_HOST;
> > > + if (!attr->exclude_guest)
> > > + flags |= KVM_PMU_EVENTS_GUEST;
> > > +
> > > + kvm_set_pmu_events(counter_bits, flags);
> > > +
> > > + if (!attr->exclude_host) {
> > > + armv8pmu_enable_counter(idx);
> > > + if (armv8pmu_event_is_chained(event))
> > > + armv8pmu_enable_counter(idx - 1);
> > > + }
> > >  }
> > >  
> > >  static inline int armv8pmu_disable_counter(int idx)
> > > @@ -664,11 +680,20 @@ static inline int armv8pmu_disable_counter(int idx)
> > >  static inline void armv8pmu_disable_event_counter(struct perf_event 
> > > *event)
> > >  {
> > >   struct hw_perf_event *hwc = &event->hw;
> > > + struct perf_event_attr *attr = &event->attr;
> > >   int idx = hwc->idx;
> > > + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > >  
> > >   if (armv8pmu_event_is_chained(event))
> > > - armv8pmu_disable_counter(idx - 1);
> > > - armv8pmu_disable_counter(idx);
> > > + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > > +
> > > + kvm_clr_pmu_events(counter_bits);
> > > +
> > > + if (!attr->exclude_host) {
> > > + if (armv8pmu_event_is_chained(event))
> > > + armv8pmu_disable_counter(idx - 1);
> > > + armv8pmu_disable_counter(idx);
> > > + }
> > >  }
> > >  
> > >  static inline int armv8pmu_enable_intens(int idx)
> > > @@ -943,16 +968,25 @@ static int armv8pmu_set_event_filter(struct 
> > > hw_perf_event *event,
> > >* Therefore we ignore exclude_hv in this configuration, since
> > >* there's no hypervisor to sample anyway. This is consistent
> > >* with other architectures (x86 and Power).
> > > +  *
> > > +  * To eliminate counting host events on the boundaries of
> > > +  * guest entry/exit we ensure EL2 is not included in hyp mode
> > > +  * with !exclude_host.
> > >*/
> > >   if (is_kernel_in_hyp_mode()) {
> > > - if (!attr->exclud

Re: [PATCH v8 4/5] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

2018-12-18 Thread Christoffer Dall
On Tue, Dec 18, 2018 at 04:27:05PM +, Andrew Murray wrote:
> On Tue, Dec 18, 2018 at 03:38:33PM +0100, Christoffer Dall wrote:
> > On Tue, Dec 18, 2018 at 01:25:32PM +, Andrew Murray wrote:
> > > On Tue, Dec 18, 2018 at 01:02:26PM +0100, Christoffer Dall wrote:
> > > > On Wed, Dec 12, 2018 at 10:29:32AM +, Andrew Murray wrote:
> > > > > Add support for the :G and :H attributes in perf by handling the
> > > > > exclude_host/exclude_guest event attributes.
> > > > > 
> > > > > We notify KVM of counters that we wish to be enabled or disabled on
> > > > > guest entry/exit and thus defer from starting or stopping :G events
> > > > > as per the events exclude_host attribute.
> > > > > 
> > > > > With both VHE and non-VHE we switch the counters between host/guest
> > > > > at EL2. We are able to eliminate counters counting host events on
> > > > > the boundaries of guest entry/exit when using :G by filtering out
> > > > > EL2 for exclude_host. However when using :H unless exclude_hv is set
> > > > > on non-VHE then there is a small blackout window at the guest
> > > > > entry/exit where host events are not captured.
> > > > > 
> > > > > Signed-off-by: Andrew Murray 
> > > > > ---
> > > > >  arch/arm64/kernel/perf_event.c | 51 
> > > > > --
> > > > >  1 file changed, 44 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm64/kernel/perf_event.c 
> > > > > b/arch/arm64/kernel/perf_event.c
> > > > > index de564ae..4a3c73d 100644
> > > > > --- a/arch/arm64/kernel/perf_event.c
> > > > > +++ b/arch/arm64/kernel/perf_event.c
> > > > > @@ -26,6 +26,7 @@
> > > > >  
> > > > >  #include 
> > > > >  #include 
> > > > > +#include 
> > > > >  #include 
> > > > >  #include 
> > > > >  #include 
> > > > > @@ -647,11 +648,26 @@ static inline int armv8pmu_enable_counter(int 
> > > > > idx)
> > > > >  
> > > > >  static inline void armv8pmu_enable_event_counter(struct perf_event 
> > > > > *event)
> > > > >  {
> > > > > + struct perf_event_attr *attr = &event->attr;
> > > > >   int idx = event->hw.idx;
> > > > > + int flags = 0;
> > > > > + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > > > >  
> > > > > - armv8pmu_enable_counter(idx);
> > > > >   if (armv8pmu_event_is_chained(event))
> > > > > - armv8pmu_enable_counter(idx - 1);
> > > > > + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > > > > +
> > > > > + if (!attr->exclude_host)
> > > > > + flags |= KVM_PMU_EVENTS_HOST;
> > > > > + if (!attr->exclude_guest)
> > > > > + flags |= KVM_PMU_EVENTS_GUEST;
> > > > > +
> > > > > + kvm_set_pmu_events(counter_bits, flags);
> > > > > +
> > > > > + if (!attr->exclude_host) {
> > > > > + armv8pmu_enable_counter(idx);
> > > > > + if (armv8pmu_event_is_chained(event))
> > > > > + armv8pmu_enable_counter(idx - 1);
> > > > > + }
> > > > >  }
> > > > >  
> > > > >  static inline int armv8pmu_disable_counter(int idx)
> > > > > @@ -664,11 +680,20 @@ static inline int armv8pmu_disable_counter(int 
> > > > > idx)
> > > > >  static inline void armv8pmu_disable_event_counter(struct perf_event 
> > > > > *event)
> > > > >  {
> > > > >   struct hw_perf_event *hwc = &event->hw;
> > > > > + struct perf_event_attr *attr = &event->attr;
> > > > >   int idx = hwc->idx;
> > > > > + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > > > >  
> > > > >   if (armv8pmu_event_is_chained(event))
> > > > > - armv8pmu_disable_counter(idx - 1);
> > > > > - armv8pmu_disable_counter(idx);
> > > > > + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > > > > 

Re: [PATCH v8 4/5] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

2018-12-18 Thread Christoffer Dall
On Tue, Dec 18, 2018 at 01:25:32PM +, Andrew Murray wrote:
> On Tue, Dec 18, 2018 at 01:02:26PM +0100, Christoffer Dall wrote:
> > On Wed, Dec 12, 2018 at 10:29:32AM +, Andrew Murray wrote:
> > > Add support for the :G and :H attributes in perf by handling the
> > > exclude_host/exclude_guest event attributes.
> > > 
> > > We notify KVM of counters that we wish to be enabled or disabled on
> > > guest entry/exit and thus defer from starting or stopping :G events
> > > as per the events exclude_host attribute.
> > > 
> > > With both VHE and non-VHE we switch the counters between host/guest
> > > at EL2. We are able to eliminate counters counting host events on
> > > the boundaries of guest entry/exit when using :G by filtering out
> > > EL2 for exclude_host. However when using :H unless exclude_hv is set
> > > on non-VHE then there is a small blackout window at the guest
> > > entry/exit where host events are not captured.
> > > 
> > > Signed-off-by: Andrew Murray 
> > > ---
> > >  arch/arm64/kernel/perf_event.c | 51 
> > > --
> > >  1 file changed, 44 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/perf_event.c 
> > > b/arch/arm64/kernel/perf_event.c
> > > index de564ae..4a3c73d 100644
> > > --- a/arch/arm64/kernel/perf_event.c
> > > +++ b/arch/arm64/kernel/perf_event.c
> > > @@ -26,6 +26,7 @@
> > >  
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -647,11 +648,26 @@ static inline int armv8pmu_enable_counter(int idx)
> > >  
> > >  static inline void armv8pmu_enable_event_counter(struct perf_event 
> > > *event)
> > >  {
> > > + struct perf_event_attr *attr = &event->attr;
> > >   int idx = event->hw.idx;
> > > + int flags = 0;
> > > + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > >  
> > > - armv8pmu_enable_counter(idx);
> > >   if (armv8pmu_event_is_chained(event))
> > > - armv8pmu_enable_counter(idx - 1);
> > > + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > > +
> > > + if (!attr->exclude_host)
> > > + flags |= KVM_PMU_EVENTS_HOST;
> > > + if (!attr->exclude_guest)
> > > + flags |= KVM_PMU_EVENTS_GUEST;
> > > +
> > > + kvm_set_pmu_events(counter_bits, flags);
> > > +
> > > + if (!attr->exclude_host) {
> > > + armv8pmu_enable_counter(idx);
> > > + if (armv8pmu_event_is_chained(event))
> > > + armv8pmu_enable_counter(idx - 1);
> > > + }
> > >  }
> > >  
> > >  static inline int armv8pmu_disable_counter(int idx)
> > > @@ -664,11 +680,20 @@ static inline int armv8pmu_disable_counter(int idx)
> > >  static inline void armv8pmu_disable_event_counter(struct perf_event 
> > > *event)
> > >  {
> > >   struct hw_perf_event *hwc = &event->hw;
> > > + struct perf_event_attr *attr = &event->attr;
> > >   int idx = hwc->idx;
> > > + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > >  
> > >   if (armv8pmu_event_is_chained(event))
> > > - armv8pmu_disable_counter(idx - 1);
> > > - armv8pmu_disable_counter(idx);
> > > + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > > +
> > > + kvm_clr_pmu_events(counter_bits);
> > > +
> > > + if (!attr->exclude_host) {
> > > + if (armv8pmu_event_is_chained(event))
> > > + armv8pmu_disable_counter(idx - 1);
> > > + armv8pmu_disable_counter(idx);
> > > + }
> > >  }
> > >  
> > >  static inline int armv8pmu_enable_intens(int idx)
> > > @@ -943,16 +968,25 @@ static int armv8pmu_set_event_filter(struct 
> > > hw_perf_event *event,
> > >* Therefore we ignore exclude_hv in this configuration, since
> > >* there's no hypervisor to sample anyway. This is consistent
> > >* with other architectures (x86 and Power).
> > > +  *
> > > +  * To eliminate counting host events on the boundaries of
> > > +  * guest entry/exit we ensure EL2 is not included in hyp mode
> > > +  * with !exclude_host.
> > >*/
> > >   if (is_kernel_in_hyp_mode()) {
> > > - if (!attr->exclude_ke

Re: [PATCH v8 4/5] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

2018-12-18 Thread Christoffer Dall
On Wed, Dec 12, 2018 at 10:29:32AM +, Andrew Murray wrote:
> Add support for the :G and :H attributes in perf by handling the
> exclude_host/exclude_guest event attributes.
> 
> We notify KVM of counters that we wish to be enabled or disabled on
> guest entry/exit and thus defer from starting or stopping :G events
> as per the events exclude_host attribute.
> 
> With both VHE and non-VHE we switch the counters between host/guest
> at EL2. We are able to eliminate counters counting host events on
> the boundaries of guest entry/exit when using :G by filtering out
> EL2 for exclude_host. However when using :H unless exclude_hv is set
> on non-VHE then there is a small blackout window at the guest
> entry/exit where host events are not captured.
> 
> Signed-off-by: Andrew Murray 
> ---
>  arch/arm64/kernel/perf_event.c | 51 
> --
>  1 file changed, 44 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index de564ae..4a3c73d 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -26,6 +26,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -647,11 +648,26 @@ static inline int armv8pmu_enable_counter(int idx)
>  
>  static inline void armv8pmu_enable_event_counter(struct perf_event *event)
>  {
> + struct perf_event_attr *attr = &event->attr;
>   int idx = event->hw.idx;
> + int flags = 0;
> + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
>  
> - armv8pmu_enable_counter(idx);
>   if (armv8pmu_event_is_chained(event))
> - armv8pmu_enable_counter(idx - 1);
> + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> +
> + if (!attr->exclude_host)
> + flags |= KVM_PMU_EVENTS_HOST;
> + if (!attr->exclude_guest)
> + flags |= KVM_PMU_EVENTS_GUEST;
> +
> + kvm_set_pmu_events(counter_bits, flags);
> +
> + if (!attr->exclude_host) {
> + armv8pmu_enable_counter(idx);
> + if (armv8pmu_event_is_chained(event))
> + armv8pmu_enable_counter(idx - 1);
> + }
>  }
>  
>  static inline int armv8pmu_disable_counter(int idx)
> @@ -664,11 +680,20 @@ static inline int armv8pmu_disable_counter(int idx)
>  static inline void armv8pmu_disable_event_counter(struct perf_event *event)
>  {
>   struct hw_perf_event *hwc = &event->hw;
> + struct perf_event_attr *attr = &event->attr;
>   int idx = hwc->idx;
> + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
>  
>   if (armv8pmu_event_is_chained(event))
> - armv8pmu_disable_counter(idx - 1);
> - armv8pmu_disable_counter(idx);
> + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> +
> + kvm_clr_pmu_events(counter_bits);
> +
> + if (!attr->exclude_host) {
> + if (armv8pmu_event_is_chained(event))
> + armv8pmu_disable_counter(idx - 1);
> + armv8pmu_disable_counter(idx);
> + }
>  }
>  
>  static inline int armv8pmu_enable_intens(int idx)
> @@ -943,16 +968,25 @@ static int armv8pmu_set_event_filter(struct 
> hw_perf_event *event,
>* Therefore we ignore exclude_hv in this configuration, since
>* there's no hypervisor to sample anyway. This is consistent
>* with other architectures (x86 and Power).
> +  *
> +  * To eliminate counting host events on the boundaries of
> +  * guest entry/exit we ensure EL2 is not included in hyp mode
> +  * with !exclude_host.
>*/
>   if (is_kernel_in_hyp_mode()) {
> - if (!attr->exclude_kernel)
> + if (!attr->exclude_kernel && !attr->exclude_host)
>   config_base |= ARMV8_PMU_INCLUDE_EL2;
>   } else {
> - if (attr->exclude_kernel)
> - config_base |= ARMV8_PMU_EXCLUDE_EL1;
>   if (!attr->exclude_hv)
>   config_base |= ARMV8_PMU_INCLUDE_EL2;

I'm not sure about the current use of exclude_hv here.  The comment says
it's consistent with other architectures, but I can't find an example to
confirm this, and I don't think we have a comparable thing to the split
of the hypervisor between EL1 and EL2 we have on non-VHE.

Joerg told me the semantics were designed to be:

exclude_hv: When running as a guest, stop counting events when
the HV runs.

exclude_host: When Linux runs as a HV itself, only count events
  while a guest is running.

exclude_guest: When Linux runs as a HV, only count events when
   running in host mode.

(But tools/perf/design.txt does not really confirm this).

On arm64 that would mean:

exclude_hv: As a host, no effect.
As a guest, set the counter to include EL2 for a
hypervisor to emulate.

exclude_host: As a guest, has no effect.
   

Re: [PATCH v8 2/5] arm64: KVM: encapsulate kvm_cpu_context in kvm_host_data

2018-12-18 Thread Christoffer Dall
On Wed, Dec 12, 2018 at 10:29:30AM +, Andrew Murray wrote:
> The virt/arm core allocates a kvm_cpu_context_t percpu, at present this is
> a typedef to kvm_cpu_context and is used to store host cpu context. The
> kvm_cpu_context structure is also used elsewhere to hold vcpu context.
> In order to use the percpu to hold additional future host information we
> encapsulate kvm_cpu_context in a new structure and rename the typedef and
> percpu to match.
> 
> Signed-off-by: Andrew Murray 
> ---
>  arch/arm/include/asm/kvm_host.h   |  8 ++--
>  arch/arm64/include/asm/kvm_asm.h  |  4 ++--
>  arch/arm64/include/asm/kvm_host.h | 15 ++-
>  arch/arm64/kernel/asm-offsets.c   |  2 +-
>  virt/kvm/arm/arm.c| 10 ++
>  5 files changed, 25 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 79906ce..71645ba 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -145,7 +145,11 @@ struct kvm_cpu_context {
>   u32 cp15[NR_CP15_REGS];
>  };
>  
> -typedef struct kvm_cpu_context kvm_cpu_context_t;
> +struct kvm_host_data {
> + struct kvm_cpu_context host_ctxt;
> +};
> +
> +typedef struct kvm_host_data kvm_host_data_t;
>  
>  struct kvm_vcpu_arch {
>   struct kvm_cpu_context ctxt;
> @@ -163,7 +167,7 @@ struct kvm_vcpu_arch {
>   struct kvm_vcpu_fault_info fault;
>  
>   /* Host FP context */
> - kvm_cpu_context_t *host_cpu_context;
> + struct kvm_cpu_context *host_cpu_context;
>  
>   /* VGIC state */
>   struct vgic_cpu vgic_cpu;
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index 102b5a5..6a9bfd4 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -102,12 +102,12 @@ extern u32 __init_stage2_translation(void);
>  .endm
>  
>  .macro get_host_ctxt reg, tmp
> - hyp_adr_this_cpu \reg, kvm_host_cpu_state, \tmp
> + hyp_adr_this_cpu \reg, kvm_host_data, \tmp
>  .endm
>  
>  .macro get_vcpu_ptr vcpu, ctxt
>   get_host_ctxt \ctxt, \vcpu
> - ldr \vcpu, [\ctxt, #HOST_CONTEXT_VCPU]
> + ldr \vcpu, [\ctxt, #HOST_DATA_VCPU]
>   kern_hyp_va \vcpu
>  .endm
>  
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 1550192..1d3ca91 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -205,7 +205,12 @@ struct kvm_cpu_context {
>   struct kvm_vcpu *__hyp_running_vcpu;
>  };
>  
> -typedef struct kvm_cpu_context kvm_cpu_context_t;
> +struct kvm_host_data {
> + struct kvm_cpu_context host_ctxt;
> + struct kvm_pmu_events pmu_events;
> +};
> +
> +typedef struct kvm_host_data kvm_host_data_t;
>  
>  struct kvm_vcpu_arch {
>   struct kvm_cpu_context ctxt;
> @@ -241,7 +246,7 @@ struct kvm_vcpu_arch {
>   struct kvm_guest_debug_arch external_debug_state;
>  
>   /* Pointer to host CPU context */
> - kvm_cpu_context_t *host_cpu_context;
> + struct kvm_cpu_context *host_cpu_context;
>  
>   struct thread_info *host_thread_info;   /* hyp VA */
>   struct user_fpsimd_state *host_fpsimd_state;/* hyp VA */
> @@ -387,7 +392,7 @@ void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>  
> -DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);
> +DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data);
>  
>  void __kvm_enable_ssbs(void);
>  
> @@ -400,8 +405,8 @@ static inline void __cpu_init_hyp_mode(phys_addr_t 
> pgd_ptr,
>* kernel's mapping to the linear mapping, and store it in tpidr_el2
>* so that we can use adr_l to access per-cpu variables in EL2.
>*/
> - u64 tpidr_el2 = ((u64)this_cpu_ptr(&kvm_host_cpu_state) -
> -  (u64)kvm_ksym_ref(kvm_host_cpu_state));
> + u64 tpidr_el2 = ((u64)this_cpu_ptr(&kvm_host_data) -
> +  (u64)kvm_ksym_ref(kvm_host_data));
>  
>   /*
>* Call initialization code, and switch to the full blown HYP code.
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 323aeb5..cb968ff 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -142,7 +142,7 @@ int main(void)
>DEFINE(CPU_FP_REGS,offsetof(struct kvm_regs, fp_regs));
>DEFINE(VCPU_FPEXC32_EL2,   offsetof(struct kvm_vcpu, 
> arch.ctxt.sys_regs[FPEXC32_EL2]));
>DEFINE(VCPU_HOST_CONTEXT,  offsetof(struct kvm_vcpu, 
> arch.host_cpu_context));
> -  DEFINE(HOST_CONTEXT_VCPU,  offsetof(struct kvm_cpu_context, 
> __hyp_running_vcpu));
> +  DEFINE(HOST_DATA_VCPU, offsetof(struct kvm_host_data, 
> host_ctxt.__hyp_running_vcpu));
>  #endif
>  #ifdef CONFIG_CPU_PM
>DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx));
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 150c8a6..c031ddf

Re: [PATCH v10 0/8] kvm: arm64: Support PUD hugepage at stage 2

2018-12-18 Thread Christoffer Dall
On Tue, Dec 11, 2018 at 05:10:33PM +, Suzuki K Poulose wrote:
> This series is an update to the PUD hugepage support previously posted
> at [0]. This patchset adds support for PUD hugepages at stage 2 a
> feature that is useful on cores that have support for large sized TLB
> mappings (e.g., 1GB for 4K granule).
> 
> The patches are based on v4.20-rc4
> 
> The patches have been tested on AMD Seattle system with the following
> hugepage sizes - 2M and 1G.
> 
> Right now the PUD hugepage for stage2 is only supported if the stage2
> has 4 levels. i.e, with an IPA size of minimum 44bits with 4K pages.
> This could be relaxed to stage2 with 3 levels, with the stage1 PUD huge
> page mapped in the entry level of the stage2 (i.e, pgd). I have not
> added the change here to keep this version stable w.r.t the previous
> version. I could post a patch later after further discussions in the
> list.
> 

For the series:

Reviewed-by: Christoffer Dall 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 2/5] arm64: KVM: encapsulate kvm_cpu_context in kvm_host_data

2018-12-11 Thread Christoffer Dall
On Tue, Dec 11, 2018 at 01:11:33PM +, Andrew Murray wrote:
> On Tue, Dec 11, 2018 at 01:29:51PM +0100, Christoffer Dall wrote:
> > On Tue, Dec 11, 2018 at 12:13:37PM +, Andrew Murray wrote:
> > > The virt/arm core allocates a percpu structure as per the 
> > > kvm_cpu_context_t
> > > type, at present this is typedef'd to kvm_cpu_context and used to store
> > > host cpu context. The kvm_cpu_context structure is also used elsewhere to
> > > hold vcpu context. In order to use the percpu to hold additional future
> > > host information we encapsulate kvm_cpu_context in a new structure.
> > > 
> > > Signed-off-by: Andrew Murray 
> > > ---
> > >  arch/arm64/include/asm/kvm_host.h | 8 ++--
> > >  arch/arm64/kernel/asm-offsets.c   | 3 ++-
> > >  virt/kvm/arm/arm.c| 4 +++-
> > >  3 files changed, 11 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > > b/arch/arm64/include/asm/kvm_host.h
> > > index 1550192..bcf9d60 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -205,7 +205,11 @@ struct kvm_cpu_context {
> > >   struct kvm_vcpu *__hyp_running_vcpu;
> > >  };
> > >  
> > > -typedef struct kvm_cpu_context kvm_cpu_context_t;
> > > +struct kvm_host_data {
> > > + struct kvm_cpu_context __kvm_cpu_state;
> > > +};
> > > +
> > > +typedef struct kvm_host_data kvm_cpu_context_t;
> > 
> > Now I'm confused based on the conversation on the last version.
> > 
> > I think it's bizarre to use the typedef to rename things in this way.
> > 
> > Can you please make this:
> > 
> >struct kvm_cpu_context;
> >typedef struct kvm_cpu_context kvm_cpu_context_t;
> > 
> >struct kvm_host_data;
> >typedef struct kvm_host_data kvm_host_data_t;
> > 
> > And change the code with the fallout from that.
> 
> I guess I was trying to avoid similar naming issues on arm32. If we
> make the above changes (and thus the DEFINE_PER_CPU in virt/kvm/arm/arm.c)
> then we need to change arm (arch/arm/include/asm/kvm_host.h) such that:
> 
> typedef struct kvm_cpu_context kvm_cpu_context_t;
> 
> becomes:
> 
> typedef struct kvm_cpu_context kvm_host_data_t;
> 
> though I guess this may be acceptable?

I'd prefer it if you just introduce a struct kvm_host_data on the 32-bit
side only containing a struct kvm_cpu_context (if you wanted to support
perf on the 32-bit side you would also add additional fields to it,
similar to arm64).  That avoids the confusing typedef and you get the
symmmetry on both architectures allowing you to use shared code, which
is what we want at the end of the day.

So, on the 32-bit side, change this to:

struct kvm_host_data {
struct kvm_cpu_context host_ctxt;
};
typedef struct kvm_host_data kvm_host_data_t;


And use the same naming on both 32-bit and 64-bit arm, consistently.
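
As a usage sketch of what that symmetry buys (assuming the per-cpu variable
is also renamed to kvm_host_data as on the arm64 side; the function name is
illustrative):

DEFINE_PER_CPU(kvm_host_data_t, kvm_host_data);

void example_vcpu_load(struct kvm_vcpu *vcpu)
{
	struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);

	/* Shared virt/kvm/arm code reaches the host context the same way on both */
	vcpu->arch.host_cpu_context = &host_data->host_ctxt;
}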


Thanks,

Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 2/5] arm64: KVM: encapsulate kvm_cpu_context in kvm_host_data

2018-12-11 Thread Christoffer Dall
On Tue, Dec 11, 2018 at 12:13:37PM +, Andrew Murray wrote:
> The virt/arm core allocates a percpu structure as per the kvm_cpu_context_t
> type, at present this is typedef'd to kvm_cpu_context and used to store
> host cpu context. The kvm_cpu_context structure is also used elsewhere to
> hold vcpu context. In order to use the percpu to hold additional future
> host information we encapsulate kvm_cpu_context in a new structure.
> 
> Signed-off-by: Andrew Murray 
> ---
>  arch/arm64/include/asm/kvm_host.h | 8 ++--
>  arch/arm64/kernel/asm-offsets.c   | 3 ++-
>  virt/kvm/arm/arm.c| 4 +++-
>  3 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 1550192..bcf9d60 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -205,7 +205,11 @@ struct kvm_cpu_context {
>   struct kvm_vcpu *__hyp_running_vcpu;
>  };
>  
> -typedef struct kvm_cpu_context kvm_cpu_context_t;
> +struct kvm_host_data {
> + struct kvm_cpu_context __kvm_cpu_state;
> +};
> +
> +typedef struct kvm_host_data kvm_cpu_context_t;

Now I'm confused based on the conversation on the last version.

I think it's bizarre to use the typedef to rename things in this way.

Can you please make this:

   struct kvm_cpu_context;
   typedef struct kvm_cpu_context kvm_cpu_context_t;

   struct kvm_host_data;
   typedef struct kvm_host_data kvm_host_data_t;

And change the code with the fallout from that.


>  
>  struct kvm_vcpu_arch {
>   struct kvm_cpu_context ctxt;
> @@ -241,7 +245,7 @@ struct kvm_vcpu_arch {
>   struct kvm_guest_debug_arch external_debug_state;
>  
>   /* Pointer to host CPU context */
> - kvm_cpu_context_t *host_cpu_context;
> + struct kvm_cpu_context *host_cpu_context;
>  
>   struct thread_info *host_thread_info;   /* hyp VA */
>   struct user_fpsimd_state *host_fpsimd_state;/* hyp VA */
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 323aeb5..da34022 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -142,7 +142,8 @@ int main(void)
>DEFINE(CPU_FP_REGS,offsetof(struct kvm_regs, fp_regs));
>DEFINE(VCPU_FPEXC32_EL2,   offsetof(struct kvm_vcpu, 
> arch.ctxt.sys_regs[FPEXC32_EL2]));
>DEFINE(VCPU_HOST_CONTEXT,  offsetof(struct kvm_vcpu, 
> arch.host_cpu_context));
> -  DEFINE(HOST_CONTEXT_VCPU,  offsetof(struct kvm_cpu_context, 
> __hyp_running_vcpu));
> +  DEFINE(HOST_CONTEXT_VCPU,  offsetof(struct kvm_cpu_context, 
> __hyp_running_vcpu)
> + + offsetof(struct kvm_host_data, 
> __kvm_cpu_state));

This should be HOST_DATA_VCPU, then.


Thanks,

Christoffer

>  #endif
>  #ifdef CONFIG_CPU_PM
>DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx));
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 150c8a6..4f2e534 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -361,8 +361,10 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>   int *last_ran;
> + kvm_cpu_context_t *cpu_ctxt;
>  
>   last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);
> + cpu_ctxt = this_cpu_ptr(&kvm_host_cpu_state);
>  
>   /*
>* We might get preempted before the vCPU actually runs, but
> @@ -374,7 +376,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>   }
>  
>   vcpu->cpu = cpu;
> - vcpu->arch.host_cpu_context = this_cpu_ptr(&kvm_host_cpu_state);
> + vcpu->arch.host_cpu_context = &cpu_ctxt->__kvm_cpu_state;
>  
>   kvm_arm_set_running_vcpu(vcpu);
>   kvm_vgic_load(vcpu);
> -- 
> 2.7.4
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less

2018-12-11 Thread Christoffer Dall
We recently addressed a VMID generation race by introducing a read/write
lock around accesses and updates to the vmid generation values.

However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
does so without taking the read lock.

As far as I can tell, this can lead to the same kind of race:

  VM 0, VCPU 0                  VM 0, VCPU 1
  ------------                  ------------
  update_vttbr (vmid 254)
                                update_vttbr (vmid 1) // roll over
                                read_lock(kvm_vmid_lock);
                                force_vm_exit()
  local_irq_disable
  need_new_vmid_gen == false //because vmid gen matches

  enter_guest (vmid 254)
                                kvm_arch.vttbr = <pgd>:<vmid 1>
                                read_unlock(kvm_vmid_lock);

                                enter_guest (vmid 1)

This results in running two VCPUs in the same VM with different VMIDs
and (even worse) other VCPUs from other VMs could now allocate the clashing
VMID 254 from the new generation as long as VCPU 0 has not exited.

Attempt to solve this by making sure vttbr is updated before another CPU
can observe the updated VMID generation.
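
For reference, the ordering this relies on pairs up as follows (names as in
the patch below, layout illustrative):

  need_new_vmid_gen():                      update_vttbr():
    gen = atomic64_read(&kvm_vmid_gen);       kvm->arch.vttbr = new pgd/vmid;
    smp_rmb();                                smp_wmb();
    READ_ONCE(kvm->arch.vmid_gen) == gen?     WRITE_ONCE(kvm->arch.vmid_gen, gen);

A reader that observes the updated kvm->arch.vmid_gen is then also guaranteed
to observe the vttbr written before it, so it can no longer enter the guest
with a stale VMID from the new generation.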

Cc: sta...@vger.kernel.org
Fixes: f0cf47d939d0 "KVM: arm/arm64: Close VMID generation race"
Reviewed-by: Julien Thierry 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/arm.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 23774970c9df..abcd29db2d7a 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -66,7 +66,7 @@ static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_arm_running_vcpu);
 static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u32 kvm_next_vmid;
 static unsigned int kvm_vmid_bits __read_mostly;
-static DEFINE_RWLOCK(kvm_vmid_lock);
+static DEFINE_SPINLOCK(kvm_vmid_lock);
 
 static bool vgic_present;
 
@@ -484,7 +484,9 @@ void force_vm_exit(const cpumask_t *mask)
  */
 static bool need_new_vmid_gen(struct kvm *kvm)
 {
-   return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
+   u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
+   smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
+   return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);
 }
 
 /**
@@ -499,16 +501,11 @@ static void update_vttbr(struct kvm *kvm)
 {
phys_addr_t pgd_phys;
u64 vmid, cnp = kvm_cpu_has_cnp() ? VTTBR_CNP_BIT : 0;
-   bool new_gen;
 
-   read_lock(&kvm_vmid_lock);
-   new_gen = need_new_vmid_gen(kvm);
-   read_unlock(&kvm_vmid_lock);
-
-   if (!new_gen)
+   if (!need_new_vmid_gen(kvm))
return;
 
-   write_lock(&kvm_vmid_lock);
+   spin_lock(&kvm_vmid_lock);
 
/*
 * We need to re-check the vmid_gen here to ensure that if another vcpu
@@ -516,7 +513,7 @@ static void update_vttbr(struct kvm *kvm)
 * use the same vmid.
 */
if (!need_new_vmid_gen(kvm)) {
-   write_unlock(&kvm_vmid_lock);
+   spin_unlock(&kvm_vmid_lock);
return;
}
 
@@ -539,7 +536,6 @@ static void update_vttbr(struct kvm *kvm)
kvm_call_hyp(__kvm_flush_vm_context);
}
 
-   kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
kvm->arch.vmid = kvm_next_vmid;
kvm_next_vmid++;
kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
@@ -550,7 +546,10 @@ static void update_vttbr(struct kvm *kvm)
vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & 
VTTBR_VMID_MASK(kvm_vmid_bits);
kvm->arch.vttbr = kvm_phys_to_vttbr(pgd_phys) | vmid | cnp;
 
-   write_unlock(&kvm_vmid_lock);
+   smp_wmb();
+   WRITE_ONCE(kvm->arch.vmid_gen, atomic64_read(&kvm_vmid_gen));
+
+   spin_unlock(&kvm_vmid_lock);
 }
 
 static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
-- 
2.18.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] KVM: arm/arm64: vgic-v2: Set active_source to 0 when restoring state

2018-12-11 Thread Christoffer Dall
When restoring the active state from userspace, we don't know which CPU
was the source for the active state, and this is not architecturally
exposed in any of the register state.

Set the active_source to 0 in this case.  In the future, we can expand
on this and expose the source as additional information to
userspace for GICv2 if anyone cares.

Cc: sta...@vger.kernel.org
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic/vgic-mmio.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index f56ff1cf52ec..2b450d49a046 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -338,11 +338,26 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
vgic_hw_irq_change_active(vcpu, irq, active, !requester_vcpu);
} else {
u32 model = vcpu->kvm->arch.vgic.vgic_model;
+   u8 active_source;
 
irq->active = active;
+
+   /*
+* The GICv2 architecture indicates that the source CPUID for
+* an SGI should be provided during an EOI which implies that
+* the active state is stored somewhere, but at the same time
+* this state is not architecturally exposed anywhere and we
+* have no way of knowing the right source.
+*
+* This may lead to a VCPU not being able to receive
+* additional instances of a particular SGI after migration
+* for a GICv2 VM on some GIC implementations.  Oh well.
+*/
+   active_source = (requester_vcpu) ? requester_vcpu->vcpu_id : 0;
+
if (model == KVM_DEV_TYPE_ARM_VGIC_V2 &&
active && vgic_irq_is_sgi(irq->intid))
-   irq->active_source = requester_vcpu->vcpu_id;
+   irq->active_source = active_source;
}
 
if (irq->active)
-- 
2.18.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: KVM arm realtime performance optimization

2018-12-11 Thread Christoffer Dall
On Tue, Dec 11, 2018 at 10:00:35AM +0100, Steven Miao (Arm Technology China) 
wrote:
> Hi Christopher,
> 
> > -Original Message-
> > From: Christoffer Dall 
> > Sent: Monday, December 10, 2018 9:19 PM
> > To: Steven Miao (Arm Technology China) 
> > Cc: kvmarm@lists.cs.columbia.edu
> > Subject: Re: KVM arm realtime performance optimization
> > 
> > On Mon, Dec 10, 2018 at 05:36:09AM +, Steven Miao (Arm Technology
> > China) wrote:
> > >
> > > From: kvmarm-boun...@lists.cs.columbia.edu
> > >  On Behalf Of Steven Miao (Arm
> > > Technology China)
> > > Sent: Thursday, December 6, 2018 3:05 PM
> > > To: kvmarm@lists.cs.columbia.edu
> > > Subject: KVM arm realtime performance optimization
> > >
> > > Hi Everyone,
> > >
> > > I'm currently testing KVM arm realtime performance on a hikey960 board.
> > My test benchmark is cyclictest to measure thread wake up latency both on
> > Host linux OS and KVM Guest linux OS.
> > >
> > > Host OS:
> > >
> > > hikey960:/mnt/debian/usr/src/linux#  cyclictest -p 99 -t 4 -m -n -a
> > > 0-3 -l 10 # /dev/cpu_dma_latency set to 0us
> > > WARN: Running on unknown kernel version...YMMV
> > > policy: fifo: loadavg: 0.00 0.00 0.00 1/165 3270
> > >
> > > T: 0 ( 3266) P:99 I:1000 C: 10 Min:  4 Act:   15 Avg:   15 Max:   
> > >   139
> > > T: 1 ( 3267) P:99 I:1500 C:  66736 Min:  4 Act:   15 Avg:   15 Max:   
> > >   239
> > > T: 2 ( 3268) P:99 I:2000 C:  50051 Min:  4 Act:   19 Avg:   15 Max:   
> > >43
> > > T: 3 ( 3269) P:99 I:2500 C:  40039 Min:  5 Act:   15 Avg:   16 Max:   
> > >74
> > >
> > > Guest OS:
> > > root@genericarmv8:~# cyclictest -p 99 -t 4 -m -n -a 0-3 -l 10 #
> > > /dev/cpu_dma_latency set to 0us
> > > WARN: Running on unknown kernel version...YMMV
> > > policy: fifo: loadavg: 0.13 0.05 0.01 1/70 293
> > >
> > > T: 0 (  290) P:99 I:1000 C: 10 Min:  7 Act:   44 Avg:   85 Max:   
> > > 16111
> > > T: 1 (  291) P:99 I:1500 C:  5 Min:  7 Act:   81 Avg:   90 Max:   
> > > 15306
> > > T: 2 (  292) P:99 I:2000 C:  49995 Min:  7 Act:   88 Avg:   87 Max:   
> > > 16703
> > > T: 3 (  293) P:99 I:2500 C:  39992 Min:  8 Act:   72 Avg:   97 Max:   
> > > 14976
> > >
> > >
> > > RT performance on KVM guest OS is poor compared to that on host OS. The
> > average wake up latency is about 6 - 7 times on Guest OS vs on Host OS.
> > > I've tried some configurations to improve RT in KVM, like:
> > > 1 Can be combined with CPU isolation
> > > 2 Host OS and Guest OS use RT preempt kernel
> > > 3 Host CPU avoid frequency change
> > > 4 Configure NO_HZ_FULL for Guest OS
> > >
> > > There could be a little improvement after apply above configuration, but
> > the RT performance is still very poor.
> > >
> > > 5 Guest OS use idle poll instead of WFI to avoid trap and switch out
> > >
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index 2dc0f84..53aef78 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > @@ -83,7 +83,7 @@ void arch_cpu_idle(void)
> > >  * tricks
> > >  */
> > > trace_cpu_idle_rcuidle(1, smp_processor_id());
> > > -   cpu_do_idle();
> > > +   cpu_relax();
> > > local_irq_enable();
> > > trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());  }
> > >
> > > root@genericarmv8:~# cyclictest -p 99 -t 4 -m -n  -l 10 #
> > > /dev/cpu_dma_latency set to 0us
> > > WARN: Running on unknown kernel version...YMMV
> > > policy: fifo: loadavg: 0.07 0.03 0.00 1/99 328
> > >
> > > T: 0 (  325) P:99 I:1000 C: 10 Min:  3 Act:6 Avg:   13 Max:   
> > >  4999
> > > T: 1 (  326) P:99 I:1500 C:  66659 Min:  5 Act:7 Avg:   14 Max:   
> > >  3449
> > > T: 2 (  327) P:99 I:2000 C:  49989 Min:  4 Act:7 Avg:9 Max:   
> > > 11471
> > > T: 3 (  328) P:99 I:2500 C:  39986 Min:  4 Act:   14 Avg:   14 Max:   
> > > 11253
> > >
> > > The method 5 can improve Guest OS RT performance a lot, the average
> > thread wake up latency on Guest OS is almost same as its on Host OS, but the
> > Max wake up latency is still very poor.
> > >
&

Re: [PATCH v2 4/4] KVM: arm/arm64: vgic: Make vgic_cpu->ap_list_lock a raw_spinlock

2018-12-11 Thread Christoffer Dall
On Mon, Nov 26, 2018 at 06:26:47PM +, Julien Thierry wrote:
> vgic_cpu->ap_list_lock must always be taken with interrupts disabled as
> it is used in interrupt context.
> 
> For configurations such as PREEMPT_RT_FULL, this means that it should
> be a raw_spinlock since RT spinlocks are interruptible.
> 
> Signed-off-by: Julien Thierry 
> Cc: Christoffer Dall 
> Cc: Marc Zyngier 


Acked-by: Christoffer Dall 


Re: [PATCH v2 1/4] KVM: arm/arm64: vgic: Do not cond_resched_lock() with IRQs disabled

2018-12-11 Thread Christoffer Dall
On Mon, Nov 26, 2018 at 06:26:44PM +, Julien Thierry wrote:
> To change the active state of an MMIO, halt is requested for all vcpus of
> the affected guest before modifying the IRQ state. This is done by calling
> cond_resched_lock() in vgic_mmio_change_active(). However interrupts are
> disabled at this point and we cannot reschedule a vcpu.
> 
> Solve this by waiting for all vcpus to be halted after emitting the halt
> request.
> 
> Signed-off-by: Julien Thierry 
> Suggested-by: Marc Zyngier 
> Cc: Christoffer Dall 
> Cc: Marc Zyngier 
> Cc: sta...@vger.kernel.org
> ---
>  virt/kvm/arm/vgic/vgic-mmio.c | 36 ++--
>  1 file changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index f56ff1c..5c76a92 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -313,27 +313,6 @@ static void vgic_mmio_change_active(struct kvm_vcpu 
> *vcpu, struct vgic_irq *irq,
>  
>   spin_lock_irqsave(&irq->irq_lock, flags);
>  
> - /*
> -  * If this virtual IRQ was written into a list register, we
> -  * have to make sure the CPU that runs the VCPU thread has
> -  * synced back the LR state to the struct vgic_irq.
> -  *
> -  * As long as the conditions below are true, we know the VCPU thread
> -  * may be on its way back from the guest (we kicked the VCPU thread in
> -  * vgic_change_active_prepare)  and still has to sync back this IRQ,
> -  * so we release and re-acquire the spin_lock to let the other thread
> -  * sync back the IRQ.
> -  *
> -  * When accessing VGIC state from user space, requester_vcpu is
> -  * NULL, which is fine, because we guarantee that no VCPUs are running
> -  * when accessing VGIC state from user space so irq->vcpu->cpu is
> -  * always -1.
> -  */
> - while (irq->vcpu && /* IRQ may have state in an LR somewhere */
> -irq->vcpu != requester_vcpu && /* Current thread is not the VCPU 
> thread */
> -irq->vcpu->cpu != -1) /* VCPU thread is running */
> - cond_resched_lock(&irq->irq_lock);
> -
>   if (irq->hw) {
>   vgic_hw_irq_change_active(vcpu, irq, active, !requester_vcpu);
>   } else {
> @@ -368,8 +347,21 @@ static void vgic_mmio_change_active(struct kvm_vcpu 
> *vcpu, struct vgic_irq *irq,
>   */
>  static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>  {
> - if (intid > VGIC_NR_PRIVATE_IRQS)
> + if (intid > VGIC_NR_PRIVATE_IRQS) {
> + struct kvm_vcpu *tmp;
> + int i;
> +
>   kvm_arm_halt_guest(vcpu->kvm);
> +
> + /* Wait for each vcpu to be halted */
> + kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> + if (tmp == vcpu)
> + continue;
> +
> + while (tmp->cpu != -1)
> + cond_resched();
> + }

I'm actually thinking we don't need this loop at all after the request
rework which causes:

 1. kvm_arm_halt_guest() to use kvm_make_all_cpus_request(kvm, KVM_REQ_SLEEP), and
 2. KVM_REQ_SLEEP uses REQ_WAIT, and
 3. REQ_WAIT requires the VCPU to respond to IPIs before returning, and
 4. a VCPU thread can only respond when it enables interrupts, and
 5. enabling interrupts when running a VCPU only happens after syncing
the VGIC hwstate.

Does that make sense?

It would be good if someone can validate this, but if it holds, this
patch just becomes a nice deletion of the logic in
vgic_mmio_change_active().
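
Untested sketch of what I mean, assuming the request/IPI handshake above
really does guarantee that all other VCPUs are out of the guest with
their VGIC state synced back: the prepare side then needs nothing beyond
the halt request, and the extra wait loop goes away entirely:

static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
{
	if (intid > VGIC_NR_PRIVATE_IRQS)
		kvm_arm_halt_guest(vcpu->kvm);
}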


Thanks,

Christoffer


Re: [PATCH 1/2] kvm/arm: skip MMIO insn after emulation

2018-12-11 Thread Christoffer Dall
On Fri, Nov 09, 2018 at 03:07:10PM +, Mark Rutland wrote:
> When we emulate an MMIO instruction, we advance the CPU state within
> decode_hsr(), before emulating the instruction effects.
> 
> Having this logic in decode_hsr() is opaque, and advancing the state
> before emulation is problematic. It gets in the way of applying
> consistent single-step logic, and it prevents us from being able to fail
> an MMIO instruction with a synchronous exception.
> 
> Clean this up by only advancing the CPU state *after* the effects of the
> instruction are emulated.
> 
> Signed-off-by: Mark Rutland 
> Cc: Alex Bennée 
> Cc: Christoffer Dall 
> Cc: Marc Zyngier 
> Cc: Peter Maydell 
> ---
>  virt/kvm/arm/mmio.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> index dac7ceb1a677..08443a15e6be 100644
> --- a/virt/kvm/arm/mmio.c
> +++ b/virt/kvm/arm/mmio.c
> @@ -117,6 +117,12 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>   vcpu_set_reg(vcpu, vcpu->arch.mmio_decode.rt, data);
>   }
>  
> + /*
> +  * The MMIO instruction is emulated and should not be re-executed
> +  * in the guest.
> +  */
> + kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> +
>   return 0;
>  }
>  
> @@ -144,11 +150,6 @@ static int decode_hsr(struct kvm_vcpu *vcpu, bool 
> *is_write, int *len)
>   vcpu->arch.mmio_decode.sign_extend = sign_extend;
>   vcpu->arch.mmio_decode.rt = rt;
>  
> - /*
> -  * The MMIO instruction is emulated and should not be re-executed
> -  * in the guest.
> -  */
> - kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
>   return 0;
>  }
>  
> -- 
> 2.11.0
> 
Reviewed-by: Christoffer Dall 


Re: [PATCH 2/2] kvm/arm: consistently advance singlestep when emulating instructions

2018-12-11 Thread Christoffer Dall
On Fri, Nov 09, 2018 at 03:07:11PM +, Mark Rutland wrote:
> When we emulate a guest instruction, we don't advance the hardware
> singlestep state machine, and thus the guest will receive a software
> step exception after a next instruction which is not emulated by the
> host.
> 
> We bodge around this in an ad-hoc fashion. Sometimes we explicitly check
> whether userspace requested a single step, and fake a debug exception
> from within the kernel. Other times, we advance the HW singlestep state
> rely on the HW to generate the exception for us. Thus, the observed step
> behaviour differs for host and guest.
> 
> Let's make this simpler and consistent by always advancing the HW
> singlestep state machine when we skip an instruction. Thus we can rely
> on the hardware to generate the singlestep exception for us, and never
> need to explicitly check for an active-pending step, nor do we need to
> fake a debug exception from the guest.
> 
> Signed-off-by: Mark Rutland 
> Cc: Alex Bennée 
> Cc: Christoffer Dall 
> Cc: Marc Zyngier 
> Cc: Peter Maydell 
> ---
>  arch/arm/include/asm/kvm_host.h  |  5 
>  arch/arm64/include/asm/kvm_emulate.h | 35 --
>  arch/arm64/include/asm/kvm_host.h|  1 -
>  arch/arm64/kvm/debug.c   | 21 
>  arch/arm64/kvm/handle_exit.c | 14 +--
>  arch/arm64/kvm/hyp/switch.c  | 43 
> +++-
>  arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c | 12 ++---
>  virt/kvm/arm/arm.c   |  2 --
>  virt/kvm/arm/hyp/vgic-v3-sr.c|  6 -
>  9 files changed, 46 insertions(+), 93 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 5ca5d9af0c26..c5634c6ffcea 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -296,11 +296,6 @@ static inline void kvm_arm_init_debug(void) {}
>  static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
> -static inline bool kvm_arm_handle_step_debug(struct kvm_vcpu *vcpu,
> -  struct kvm_run *run)
> -{
> - return false;
> -}
>  
>  int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
>  struct kvm_device_attr *attr);
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 21247870def7..506386a3edde 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -24,6 +24,7 @@
>  
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -147,14 +148,6 @@ static inline bool kvm_condition_valid(const struct 
> kvm_vcpu *vcpu)
>   return true;
>  }
>  
> -static inline void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
> -{
> - if (vcpu_mode_is_32bit(vcpu))
> - kvm_skip_instr32(vcpu, is_wide_instr);
> - else
> - *vcpu_pc(vcpu) += 4;
> -}
> -
>  static inline void vcpu_set_thumb(struct kvm_vcpu *vcpu)
>  {
>   *vcpu_cpsr(vcpu) |= PSR_AA32_T_BIT;
> @@ -424,4 +417,30 @@ static inline unsigned long 
> vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>   return data;/* Leave LE untouched */
>  }
>  
> +static inline void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
> +{
> + if (vcpu_mode_is_32bit(vcpu))
> + kvm_skip_instr32(vcpu, is_wide_instr);
> + else
> + *vcpu_pc(vcpu) += 4;
> +
> + /* advance the singlestep state machine */
> + *vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
> +}
> +
> +/*
> + * Skip an instruction which has been emulated at hyp while most guest 
> sysregs
> + * are live.
> + */
> +static inline void __hyp_text __kvm_skip_instr(struct kvm_vcpu *vcpu)
> +{
> + *vcpu_pc(vcpu) = read_sysreg_el2(elr);
> + vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(spsr);
> +
> + kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> +
> + write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, spsr);
> + write_sysreg_el2(*vcpu_pc(vcpu), elr);
> +}
> +
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 52fbc823ff8c..7a5035f9c5c3 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -445,7 +445,6 @@ void kvm_arm_init_debug(void);
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
&g

Re: [PATCH v6 2/4] arm64: KVM: add accessors to track guest/host only counters

2018-12-10 Thread Christoffer Dall
On Mon, Dec 10, 2018 at 11:46:56PM +, Andrew Murray wrote:
> On Mon, Dec 10, 2018 at 11:26:34AM +0100, Christoffer Dall wrote:
> > On Mon, Dec 10, 2018 at 09:45:57AM +, Andrew Murray wrote:
> > > In order to efficiently enable/disable guest/host only perf counters
> > > at guest entry/exit we add bitfields to kvm_cpu_context for guest and
> > > host events as well as accessors for updating them.
> > > 
> > > Signed-off-by: Andrew Murray 
> > > ---
> > >  arch/arm64/include/asm/kvm_host.h | 24 
> > >  1 file changed, 24 insertions(+)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > > b/arch/arm64/include/asm/kvm_host.h
> > > index 1550192..800c87b 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -203,6 +203,8 @@ struct kvm_cpu_context {
> > >   };
> > >  
> > >   struct kvm_vcpu *__hyp_running_vcpu;
> > > + u32 events_host;
> > > + u32 events_guest;
> > 
> > This is confusing to me.
> > 
> > These values appear only to be used for the host instance, which makes
> > me wonder why we add them to kvm_cpu_context, which is also used for the
> > guest state?  Should we not instead consider moving them to their own
> 
> Indeed they are only used for the host instance (i.e. not arch.ctxt). I
> hadn't realised the structure was used in other contexts. So it isn't
> optimal to have additional fields that aren't always used.
> 
> 
> > data structure and add a per-cpu data structure or something more fancy
> > like having a new data structure, kvm_percpu_host_data, which contains
> > the kvm_cpu_context and the events flags?
> 
> Using a percpu for the guest/host events was an approach I took prior to
> sharing on the list - an additional hypervisor mapping is needed (such
> that the percpu can be accessed from the hyp switch code) and I assume
> there to be a very marginal additional amount of work resulting from it
> switching between host/guest.
> 
> I also considered using the unused ctxt.sys_regs[PMCNTENSET_EL0] register
> though this feels like a hack and would probably involve a system register
> read in the switch code to read the current PMU state.

Yeah, that doesn't sound overly appealing.

> 
> I attempted the fancy approach (see below) - the struct and variable
> naming isn't ideal (virt/arm/arm.c defines kvm_cpu_context_t and of course
> is shared with arm32). Also I updated asm-offsets.c to reflect the now
> nested struct (for use with get_vcpu_ptr) - it's not strictly necessary
> but adds robustness.
> 
> What are your thoughts on the best approach?

The naming and typedef is the biggest issue with the patch below, but
that can be easily fixed.

The fact that you keep the context pointer on the vcpu structure makes
this much less painful that it could have been, so I think this is
acceptable.

I can also live with mapping the additional data structure if necessary.

> 
> > 
> > I don't know much about perf, but doesn't this design also imply that
> > you can only set these modifiers at a per-cpu level, and not attach
> > the modifiers to a task/vcpu or vm ?  Is that by design?
> 
> You can set these modifiers on a task and the perf core will take care of
> disabling the perf_event when the task is scheduled in/out.

I now remember how this works, thanks for the nudge in the right
direction.
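
(For the record, a minimal, untested userspace sketch of what the :G/:H
modifiers boil down to - they just set the exclude_host/exclude_guest
bits in the event attribute, so the event follows the task and the perf
core plus this series handle the switching:)

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <string.h>
#include <unistd.h>

/* Open a guest-only cycle counter on task 'pid', roughly what
 * 'perf stat -e cycles:G -p <pid>' sets up under the hood. */
static int open_guest_cycles(pid_t pid)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.exclude_host = 1;		/* :G - don't count while in the host */
	/* attr.exclude_guest = 1;	   :H - the opposite */

	return syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
}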

> 
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 800c87b..1f4a78a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -203,11 +203,19 @@ struct kvm_cpu_context {
> };
> 
> struct kvm_vcpu *__hyp_running_vcpu;
> +};
> +
> +struct kvm_pmu_events {
> u32 events_host;
> u32 events_guest;
>  };
> 
> -typedef struct kvm_cpu_context kvm_cpu_context_t;
> +struct kvm_host_data {
> +   struct kvm_cpu_context __kvm_cpu_state;
> +   struct kvm_pmu_events pmu_events;
> +};
> +
> +typedef struct kvm_host_data kvm_cpu_context_t;
> 

This looks really strange.  I think we can just keep the old typedef, and
if you like you can introduce a kvm_host_data_t...
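
I.e. something along these lines (just a sketch, naming up for grabs):

struct kvm_pmu_events {
	u32 events_host;
	u32 events_guest;
};

struct kvm_host_data {
	struct kvm_cpu_context host_ctxt;
	struct kvm_pmu_events pmu_events;
};

typedef struct kvm_host_data kvm_host_data_t;

/* and the existing typedef stays as it is */
typedef struct kvm_cpu_context kvm_cpu_context_t;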

>  struct kvm_vcpu_arch {
> struct kvm_cpu_context ctxt;
> @@ -243,7 +251,7 @@ struct kvm_vcpu_arch {
> struct kvm_guest_debug_arch external_debug_state;
> 
> /* Pointer to host CPU context */
> -   kvm_cpu_context_t *host_cpu_context;
> +   struct kvm_cpu_context *host_cpu_context;
> 
> struct th

Re: KVM arm realtime performance optimization

2018-12-10 Thread Christoffer Dall
On Mon, Dec 10, 2018 at 05:36:09AM +, Steven Miao (Arm Technology China) 
wrote:
> 
> From: kvmarm-boun...@lists.cs.columbia.edu 
>  On Behalf Of Steven Miao (Arm 
> Technology China)
> Sent: Thursday, December 6, 2018 3:05 PM
> To: kvmarm@lists.cs.columbia.edu
> Subject: KVM arm realtime performance optimization
> 
> Hi Everyone,
> 
> I'm currently testing KVM arm realtime performance on a hikey960 board. My 
> test benchmark is cyclictest to measure thread wake up latency both on Host 
> linux OS and KVM Guest linux OS.
> 
> Host OS:
> 
> hikey960:/mnt/debian/usr/src/linux#  cyclictest -p 99 -t 4 -m -n -a 0-3 -l 
> 10
> # /dev/cpu_dma_latency set to 0us
> WARN: Running on unknown kernel version...YMMV
> policy: fifo: loadavg: 0.00 0.00 0.00 1/165 3270
> 
> T: 0 ( 3266) P:99 I:1000 C: 10 Min:  4 Act:   15 Avg:   15 Max: 
> 139
> T: 1 ( 3267) P:99 I:1500 C:  66736 Min:  4 Act:   15 Avg:   15 Max: 
> 239
> T: 2 ( 3268) P:99 I:2000 C:  50051 Min:  4 Act:   19 Avg:   15 Max:  
> 43
> T: 3 ( 3269) P:99 I:2500 C:  40039 Min:  5 Act:   15 Avg:   16 Max:  
> 74
> 
> Guest OS:
> root@genericarmv8:~# cyclictest -p 99 -t 4 -m -n -a 0-3 -l 10
> # /dev/cpu_dma_latency set to 0us
> WARN: Running on unknown kernel version...YMMV
> policy: fifo: loadavg: 0.13 0.05 0.01 1/70 293
> 
> T: 0 (  290) P:99 I:1000 C: 10 Min:  7 Act:   44 Avg:   85 Max:   
> 16111
> T: 1 (  291) P:99 I:1500 C:  5 Min:  7 Act:   81 Avg:   90 Max:   
> 15306
> T: 2 (  292) P:99 I:2000 C:  49995 Min:  7 Act:   88 Avg:   87 Max:   
> 16703
> T: 3 (  293) P:99 I:2500 C:  39992 Min:  8 Act:   72 Avg:   97 Max:   
> 14976
> 
> 
> RT performance on KVM guest OS is poor compared to that on host OS. The 
> average wake-up latency on the Guest OS is about 6-7 times that on the Host OS.
> I've tried some configurations to improve RT in KVM, like:
> 1 Can be combined with CPU isolation
> 2 Host OS and Guest OS use RT preempt kernel
> 3 Host CPU avoid frequency change
> 4 Configure NO_HZ_FULL for Guest OS
> 
> There could be a little improvement after applying the above configuration, but the 
> RT performance is still very poor.
> 
> 5 Guest OS use idle poll instead of WFI to avoid trap and switch out
> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 2dc0f84..53aef78 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -83,7 +83,7 @@ void arch_cpu_idle(void)
>  * tricks
>  */
> trace_cpu_idle_rcuidle(1, smp_processor_id());
> -   cpu_do_idle();
> +   cpu_relax();
> local_irq_enable();
> trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>  }
> 
> root@genericarmv8:~# cyclictest -p 99 -t 4 -m -n  -l 10
> # /dev/cpu_dma_latency set to 0us
> WARN: Running on unknown kernel version...YMMV
> policy: fifo: loadavg: 0.07 0.03 0.00 1/99 328
> 
> T: 0 (  325) P:99 I:1000 C: 10 Min:  3 Act:6 Avg:   13 Max:
> 4999
> T: 1 (  326) P:99 I:1500 C:  66659 Min:  5 Act:7 Avg:   14 Max:
> 3449
> T: 2 (  327) P:99 I:2000 C:  49989 Min:  4 Act:7 Avg:9 Max:   
> 11471
> T: 3 (  328) P:99 I:2500 C:  39986 Min:  4 Act:   14 Avg:   14 Max:   
> 11253
> 
> The method 5 can improve Guest OS RT performance a lot, the average thread 
> wake up latency on Guest OS is almost the same as it is on Host OS, but the Max 
> wake up latency is still very poor.
> 
> Does anyone have an idea to improve RT performance on the KVM Guest OS? Although 
> method 5 can improve RT performance on Guest OS a lot, I don't think it is a good 
> idea.
> 
This is a known problem and there have been presentations about similar
problems on x86 in past KVM Forums.

The first thing to do is analyze the critical path that adds latency to
a wakeup.  One way to do that is to instrument that path with time
counter reads and figure out where the time goes.
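
Something as crude as the below is usually enough to narrow it down
(untested sketch, to be dropped into whatever part of the wakeup path
you suspect - the placement and the trace_printk are purely
illustrative):

	u64 t0, t1;

	isb();
	t0 = read_sysreg(cntvct_el0);

	/* ... section of the wakeup path under suspicion ... */

	isb();
	t1 = read_sysreg(cntvct_el0);
	trace_printk("wakeup section: %llu ticks\n", t1 - t0);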

One thing you can look at is having a configurable grace period in KVM's
block function (before the process actually goes to sleep and calls
kvm_vcpu_put) and in the host scheduler, and see if that helps anything.

At the end of the day, virtualization is going to add a lot of latency
when you have to switch the entire state of your CPU, and in terms of
virtual RT, you end up with a very high minimal latency.


Thanks,

Christoffer


Re: [PATCH] kvm/arm: return 0 when the number of objects is not lessthan min

2018-12-10 Thread Christoffer Dall
On Thu, Dec 06, 2018 at 09:56:30AM +0800, peng.h...@zte.com.cn wrote:
> >On Wed, Dec 05, 2018 at 09:15:51AM +0800, Peng Hao wrote:
> >> Return 0 when there is enough kvm_mmu_memory_cache object.
> >>
> >> Signed-off-by: Peng Hao 
> >> ---
> >>  virt/kvm/arm/mmu.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >> index ed162a6..fcda0ce 100644
> >> --- a/virt/kvm/arm/mmu.c
> >> +++ b/virt/kvm/arm/mmu.c
> >> @@ -127,7 +127,7 @@ static int mmu_topup_memory_cache(struct 
> >> kvm_mmu_memory_cache *cache,
> >>  while (cache->nobjs < max) {
> >>  page = (void *)__get_free_page(PGALLOC_GFP);
> >>  if (!page)
> >> -return -ENOMEM;
> >> +return cache->nobjs >= min ? 0 : -ENOMEM;
> >
> >This condition will never be true here, as the exact same condition is
> >already checked above, and if it had been true, then we wouldn't be here.
> >
> >What problem are you attempting to solve?
> >
> if (cache->nobjs >= min)  ---here cache->nobjs can continue downward
>         return 0;
> while (cache->nobjs < max) {
>         page = (void *)__get_free_page(PGALLOC_GFP);
>         if (!page)
>                 return -ENOMEM;  ---here it is possible that
>                                  (cache->nobjs >= min) and (cache->nobjs < max)
>         cache->objects[cache->nobjs++] = page;  ---here cache->nobjs increasing
> }
> 
> I just think the logic of this function is to return 0 as long as 
> (cache->nobjs >= min).
> thanks.

That's not the intention, nor is it on any of the other architectures
implementing the same thing (this one goes on the list of stuff we
should be sharing between architectures).

The idea is that you fill up the cache when it goes below min, and you
are always able to fill up to max.

If you're not able to fill up to max, then your system is seriously low
on memory and continuing to run this VM is not likely to be a good idea,
so you might as well tell user space to do something now instead of
waiting until the situation is even worse.
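
Annotated, this is the code as it stands (just spelling the intent out,
not a proposed change):

static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
				  int min, int max)
{
	void *page;

	/* Enough objects in reserve for the upcoming page table walk. */
	if (cache->nobjs >= min)
		return 0;

	/* Otherwise refill all the way up to max ... */
	while (cache->nobjs < max) {
		page = (void *)__get_free_page(PGALLOC_GFP);
		if (!page)
			return -ENOMEM;	/* ... and report memory pressure
					 * instead of papering over it. */
		cache->objects[cache->nobjs++] = page;
	}
	return 0;
}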


Thanks,

Christoffer


Re: [PATCH v3 7/8] arm64: KVM: Handle ARM erratum 1165522 in TLB invalidation

2018-12-10 Thread Christoffer Dall
On Mon, Dec 10, 2018 at 11:15:00AM +, James Morse wrote:
> Hi Marc, Christoffer,
> 
> On 10/12/2018 10:46, Marc Zyngier wrote:
> > On 10/12/2018 10:19, Christoffer Dall wrote:
> >> On Thu, Dec 06, 2018 at 05:31:25PM +, Marc Zyngier wrote:
> >>> In order to avoid TLB corruption whilst invalidating TLBs on CPUs
> >>> affected by erratum 1165522, we need to prevent S1 page tables
> >>> from being usable.
> >>>
> >>> For this, we set the EL1 S1 MMU on, and also disable the page table
> >>> walker (by setting the TCR_EL1.EPD* bits to 1).
> >>>
> >>> This ensures that once we switch to the EL1/EL0 translation regime,
> >>> speculated AT instructions won't be able to parse the page tables.
> 
> >>> @@ -64,11 +93,18 @@ static void __hyp_text 
> >>> __tlb_switch_to_host_vhe(struct kvm *kvm,
> >>>   write_sysreg(0, vttbr_el2);
> >>>   write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
> >>>   isb();
> >>> - local_irq_restore(flags);
> >>> +
> >>> + if (cpus_have_const_cap(ARM64_WORKAROUND_1165522)) {
> >>> + /* Restore the guest's registers to what they were */
> >>
> >> host's ?
> > 
> > Hum... Yes, silly thinko.
> 
> I thought these were the guest's registers because they are EL1 registers and
> this is a VHE-only path.
> 'interrupted guest' was how I read this. This stuff can get called if memory 
> is
> allocated for guest-A while a vcpu is loaded, and reclaims memory from guest-B
> causing an mmu-notifier call for stage2. This is why we have to put guest-A's
> registers back as we weren't pre-empted, and we expect EL1 to be untouched.
> 
> I agree they could belong to no-guest if a vcpu isn't loaded at all... is host
> the term used here?
> 

Ah, you're right.  Host is not the right term either.

I haven't done the call path analysis, so not sure about all the
possible contexts where all this can be called, but if it's really truly
only in guest context, then we don't need to save the values to a
temporary struct at all, but can save them on the vcpu.

We can also just side-step the whole thing and just say "Restore the
registers to what they were".


Thanks,

Christoffer


Re: [PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()

2018-12-10 Thread Christoffer Dall
On Mon, Dec 10, 2018 at 10:47:42AM +, Suzuki K Poulose wrote:
> 
> 
> On 10/12/2018 08:56, Christoffer Dall wrote:
> >On Mon, Dec 03, 2018 at 01:37:37PM +, Suzuki K Poulose wrote:
> >>Hi Anshuman,
> >>
> >>On 03/12/2018 12:11, Anshuman Khandual wrote:
> >>>
> >>>
> >>>On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> >>>>The code for operations such as marking the pfn as dirty, and
> >>>>dcache/icache maintenance during stage 2 fault handling is duplicated
> >>>>between normal pages and PMD hugepages.
> >>>>
> >>>>Instead of creating another copy of the operations when we introduce
> >>>>PUD hugepages, let's share them across the different pagesizes.
> >>>>
> >>>>Signed-off-by: Punit Agrawal 
> >>>>Reviewed-by: Suzuki K Poulose 
> >>>>Cc: Christoffer Dall 
> >>>>Cc: Marc Zyngier 
> >>>>---
> >>>>  virt/kvm/arm/mmu.c | 49 --
> >>>>  1 file changed, 30 insertions(+), 19 deletions(-)
> >>>>
> >>>>diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >>>>index 5eca48bdb1a6..59595207c5e1 100644
> >>>>--- a/virt/kvm/arm/mmu.c
> >>>>+++ b/virt/kvm/arm/mmu.c
> >>>>@@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >>>>phys_addr_t fault_ipa,
> >>>>unsigned long fault_status)
> >>>>  {
> >>>>  int ret;
> >>>>- bool write_fault, exec_fault, writable, hugetlb = false, force_pte = 
> >>>>false;
> >>>>+ bool write_fault, exec_fault, writable, force_pte = false;
> >>>>  unsigned long mmu_seq;
> >>>>  gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> >>>>  struct kvm *kvm = vcpu->kvm;
> >>>>@@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >>>>phys_addr_t fault_ipa,
> >>>>  kvm_pfn_t pfn;
> >>>>  pgprot_t mem_type = PAGE_S2;
> >>>>  bool logging_active = memslot_is_logging(memslot);
> >>>>- unsigned long flags = 0;
> >>>>+ unsigned long vma_pagesize, flags = 0;
> >>>
> >>>A small nit s/vma_pagesize/pagesize. Why call it VMA ? Its implicit.
> >>
> >>May be we could call it mapsize. pagesize is confusing.
> >>
> >
> >I'm ok with mapsize.  I see the vma_pagesize name coming from the fact
> >that this is initially set to the return value from vma_kernel_pagesize.
> >
> >I have no problems with either vma_pagesize or mapsize.
> >
> >>>
> >>>>  write_fault = kvm_is_write_fault(vcpu);
> >>>>  exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> >>>>@@ -1504,10 +1504,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >>>>phys_addr_t fault_ipa,
> >>>>  return -EFAULT;
> >>>>  }
> >>>>- if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
> >>>>- hugetlb = true;
> >>>>+ vma_pagesize = vma_kernel_pagesize(vma);
> >>>>+ if (vma_pagesize == PMD_SIZE && !logging_active) {
> >>>>  gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> >>>>  } else {
> >>>>+ /*
> >>>>+  * Fallback to PTE if it's not one of the Stage 2
> >>>>+  * supported hugepage sizes
> >>>>+  */
> >>>>+ vma_pagesize = PAGE_SIZE;
> >>>
> >>>This seems redundant and should be dropped. vma_kernel_pagesize() here 
> >>>either
> >>>calls hugetlb_vm_op_pagesize (via hugetlb_vm_ops->pagesize) or simply 
> >>>returns
> >>>PAGE_SIZE. The vm_ops path is taken if the QEMU VMA covering any given HVA 
> >>>is
> >>>backed either by HugeTLB pages or simply normal pages. vma_pagesize would
> >>>either has a value of PMD_SIZE (HugeTLB hstate based) or PAGE_SIZE. Hence 
> >>>if
> >>>its not PMD_SIZE it must be PAGE_SIZE which should not be assigned again.
> >>
> >>We may want to force using the PTE mappings when logging_active (e.g, 
> >>migration
> >>?) to prevent having to keep track of huge pages. So the check is still valid

Re: [PATCH v3 1/8] arm64: KVM: Make VHE Stage-2 TLB invalidation operations non-interruptible

2018-12-10 Thread Christoffer Dall
On Mon, Dec 10, 2018 at 10:24:31AM +, Marc Zyngier wrote:
> Hi Christoffer,
> 
> On 10/12/2018 10:03, Christoffer Dall wrote:
> > On Thu, Dec 06, 2018 at 05:31:19PM +, Marc Zyngier wrote:
> >> Contrary to the non-VHE version of the TLB invalidation helpers, the VHE
> >> code  has interrupts enabled, meaning that we can take an interrupt in
> >> the middle of such a sequence, and start running something else with
> >> HCR_EL2.TGE cleared.
> > 
> > Do we have to clear TGE to perform the TLB invalidation, or is that just
> > a side-effect of re-using code?
> 
> We really do need to clear TGE. From the description of TLBI VMALLE1IS:
> 
> 
> When EL2 is implemented and enabled in the current Security state:
> — If HCR_EL2.{E2H, TGE} is not {1, 1}, the entry would be used with the
> current VMID and would be required to translate the specified VA using
> the EL1&0 translation regime.
> — If HCR_EL2.{E2H, TGE} is {1, 1}, the entry would be required to
> translate the specified VA using the EL2&0 translation regime.
> 
> 
> > Also, do we generally perform TLB invalidations in the kernel with
> > interrupts disabled, or is this just a result of clearing TGE?
> 
> That's definitely a result of clearing TGE. We could be taking an
> interrupt here, and execute a user access on the back of it (perf will
> happily walk a user-space stack in that context, for example). Having
> TGE clear in that context. An alternative solution would be to
> save/restore TGE on interrupt entry/exit, but that's a bit overkill when
> you consider how rarely we issue such TLB invalidation.
> 
> > Somehow I feel like this should look more like just another TLB
> > invalidation in the kernel, but if there's a good reason why it can't
> > then this is fine.
> 
> The rest of the TLB invalidation in the kernel doesn't need to
> save/restore any context. They apply to a set of parameters that are
> already loaded on the CPU. What we have here is substantially different.
> 

Thanks for the explanation and Arm ARM quote.  I failed to find that on
my own this particular Monday morning.

Acked-by: Christoffer Dall 


Re: [PATCH v6 2/4] arm64: KVM: add accessors to track guest/host only counters

2018-12-10 Thread Christoffer Dall
On Mon, Dec 10, 2018 at 09:45:57AM +, Andrew Murray wrote:
> In order to efficiently enable/disable guest/host only perf counters
> at guest entry/exit we add bitfields to kvm_cpu_context for guest and
> host events as well as accessors for updating them.
> 
> Signed-off-by: Andrew Murray 
> ---
>  arch/arm64/include/asm/kvm_host.h | 24 
>  1 file changed, 24 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 1550192..800c87b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -203,6 +203,8 @@ struct kvm_cpu_context {
>   };
>  
>   struct kvm_vcpu *__hyp_running_vcpu;
> + u32 events_host;
> + u32 events_guest;

This is confusing to me.

These values appear only to be used for the host instance, which makes
me wonder why we add them to kvm_cpu_context, which is also used for the
guest state?  Should we not instead consider moving them to their own
data structure and add a per-cpu data structure or something more fancy
like having a new data structure, kvm_percpu_host_data, which contains
the kvm_cpu_context and the events flags?

I don't know much about perf, but doesn't this design also imply that
you can only set these modifiers at a per-cpu level, and not attach
the modifiers to a task/vcpu or vm ?  Is that by design?


Thanks,

Christoffer

>  };
>  
>  typedef struct kvm_cpu_context kvm_cpu_context_t;
> @@ -467,11 +469,33 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
>  void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
>  void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
>  
> +#define KVM_PMU_EVENTS_HOST  1
> +#define KVM_PMU_EVENTS_GUEST 2
> +
>  #ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */
>  static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
>  {
>   return kvm_arch_vcpu_run_map_fp(vcpu);
>  }
> +static inline void kvm_set_pmu_events(u32 set, int flags)
> +{
> + kvm_cpu_context_t *ctx = this_cpu_ptr(&kvm_host_cpu_state);
> +
> + if (flags & KVM_PMU_EVENTS_HOST)
> + ctx->events_host |= set;
> + if (flags & KVM_PMU_EVENTS_GUEST)
> + ctx->events_guest |= set;
> +}
> +static inline void kvm_clr_pmu_events(u32 clr)
> +{
> + kvm_cpu_context_t *ctx = this_cpu_ptr(&kvm_host_cpu_state);
> +
> + ctx->events_host &= ~clr;
> + ctx->events_guest &= ~clr;
> +}
> +#else
> +static inline void kvm_set_pmu_events(u32 set, int flags) {}
> +static inline void kvm_clr_pmu_events(u32 clr) {}
>  #endif
>  
>  static inline void kvm_arm_vhe_guest_enter(void)
> -- 
> 2.7.4
> 


Re: [PATCH v3 7/8] arm64: KVM: Handle ARM erratum 1165522 in TLB invalidation

2018-12-10 Thread Christoffer Dall
0,13 +116,13 @@ static hyp_alternate_select(__tlb_switch_to_host,
>  
>  void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  {
> - unsigned long flags;
> + struct tlb_inv_context cxt;
>  
>   dsb(ishst);
>  
>   /* Switch to requested VMID */
>   kvm = kern_hyp_va(kvm);
> - __tlb_switch_to_guest()(kvm, &flags);
> + __tlb_switch_to_guest()(kvm, &cxt);
>  
>   /*
>* We could do so much better if we had the VA as well.
> @@ -129,39 +165,39 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm 
> *kvm, phys_addr_t ipa)
>   if (!has_vhe() && icache_is_vpipt())
>   __flush_icache_all();
>  
> - __tlb_switch_to_host()(kvm, flags);
> + __tlb_switch_to_host()(kvm, &cxt);
>  }
>  
>  void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
>  {
> - unsigned long flags;
> + struct tlb_inv_context cxt;
>  
>   dsb(ishst);
>  
>   /* Switch to requested VMID */
>   kvm = kern_hyp_va(kvm);
> - __tlb_switch_to_guest()(kvm, &flags);
> + __tlb_switch_to_guest()(kvm, &cxt);
>  
>   __tlbi(vmalls12e1is);
>   dsb(ish);
>   isb();
>  
> - __tlb_switch_to_host()(kvm, flags);
> + __tlb_switch_to_host()(kvm, &cxt);
>  }
>  
>  void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
>  {
>   struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
> - unsigned long flags;
> + struct tlb_inv_context cxt;
>  
>   /* Switch to requested VMID */
> - __tlb_switch_to_guest()(kvm, &flags);
> + __tlb_switch_to_guest()(kvm, &cxt);
>  
>   __tlbi(vmalle1);
>   dsb(nsh);
>   isb();
>  
> - __tlb_switch_to_host()(kvm, flags);
> + __tlb_switch_to_host()(kvm, &cxt);
>  }
>  
>  void __hyp_text __kvm_flush_vm_context(void)
> -- 
> 2.19.2
> 

Otherwise:

Acked-by: Christoffer Dall 


Re: [PATCH v3 6/8] arm64: KVM: Add synchronization on translation regime change for erratum 1165522

2018-12-10 Thread Christoffer Dall
On Thu, Dec 06, 2018 at 05:31:24PM +, Marc Zyngier wrote:
> In order to ensure that flipping HCR_EL2.TGE is done at the right
> time when switching translation regime, let's insert the required ISBs
> that will be patched in when erratum 1165522 is detected.
> 
> Take this opportunity to add the missing include of asm/alternative.h
> which was getting there by pure luck.
> 
> Reviewed-by: James Morse 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/kvm_hyp.h |  8 
>  arch/arm64/kvm/hyp/switch.c  | 19 +++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_hyp.h 
> b/arch/arm64/include/asm/kvm_hyp.h
> index 23aca66767f9..a80a7ef57325 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -20,6 +20,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #define __hyp_text __section(.hyp.text) notrace
> @@ -163,6 +164,13 @@ static __always_inline void __hyp_text 
> __load_guest_stage2(struct kvm *kvm)
>  {
>   write_sysreg(kvm->arch.vtcr, vtcr_el2);
>   write_sysreg(kvm->arch.vttbr, vttbr_el2);
> +
> + /*
> +  * ARM erratum 1165522 requires the actual execution of the above
> +  * before we can switch to the EL1/EL0 translation regime used by
> +  * the guest.
> +  */
> + asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_1165522));
>  }
>  
>  #endif /* __ARM64_KVM_HYP_H__ */
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index a8fa61c68c32..31ee0bfc432f 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -143,6 +143,14 @@ static void deactivate_traps_vhe(void)
>  {
>   extern char vectors[];  /* kernel exception vectors */
>   write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
> +
> + /*
> +  * ARM erratum 1165522 requires the actual execution of the above
> +  * before we can switch to the EL2/EL0 translation regime used by
> +  * the host.
> +  */
> + asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_1165522));
> +
>   write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
>   write_sysreg(vectors, vbar_el1);
>  }
> @@ -499,6 +507,17 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  
>   sysreg_save_host_state_vhe(host_ctxt);
>  
> + /*
> +  * ARM erratum 1165522 requires us to configure both stage 1 and
> +  * stage 2 translation for the guest context before we clear
> +  * HCR_EL2.TGE.
> +  *
> +  * We have already configured the guest's stage 1 translation in
> +  * kvm_vcpu_load_sysregs above.  We must now call __activate_vm
> +  * before __activate_traps, because __activate_vm configures
> +  * stage 2 translation, and __activate_traps clear HCR_EL2.TGE
> +  * (among other things).
> +  */
>   __activate_vm(vcpu->kvm);
>   __activate_traps(vcpu);
>  
> -- 
> 2.19.2
> 

Acked-by: Christoffer Dall 


Re: [PATCH v3 3/8] arm64: KVM: Install stage-2 translation before enabling traps

2018-12-10 Thread Christoffer Dall
On Thu, Dec 06, 2018 at 05:31:21PM +, Marc Zyngier wrote:
> It is a bit odd that we only install stage-2 translation after having
> cleared HCR_EL2.TGE, which means that there is a window during which
> AT requests could fail as stage-2 is not configured yet.
> 
> Let's move stage-2 configuration before we clear TGE, making the
> guest entry sequence clearer: we first configure all the guest stuff,
> then only switch to the guest translation regime.
> 
> While we're at it, do the same thing for !VHE. It doesn't hurt,
> and keeps things symmetric.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/switch.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 7cc175c88a37..a8fa61c68c32 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -499,8 +499,8 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  
>   sysreg_save_host_state_vhe(host_ctxt);
>  
> - __activate_traps(vcpu);
>   __activate_vm(vcpu->kvm);
> + __activate_traps(vcpu);
>  
>   sysreg_restore_guest_state_vhe(guest_ctxt);
>   __debug_switch_to_guest(vcpu);
> @@ -545,8 +545,8 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>  
>   __sysreg_save_state_nvhe(host_ctxt);
>  
> - __activate_traps(vcpu);
>   __activate_vm(kern_hyp_va(vcpu->kvm));
> + __activate_traps(vcpu);
>  
>   __hyp_vgic_restore_state(vcpu);
>   __timer_enable_traps(vcpu);
> -- 
> 2.19.2
> 

Acked-by: Christoffer Dall 


Re: [PATCH v3 2/8] KVM: arm64: Rework detection of SVE, !VHE systems

2018-12-10 Thread Christoffer Dall
On Thu, Dec 06, 2018 at 05:31:20PM +, Marc Zyngier wrote:
> An SVE system is so far the only case where we mandate VHE. As we're
> starting to grow these requirements, let's slightly rework the way we
> deal with that situation, allowing for easy extension of this check.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm/include/asm/kvm_host.h   | 2 +-
>  arch/arm64/include/asm/kvm_host.h | 6 +++---
>  virt/kvm/arm/arm.c| 8 
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 5ca5d9af0c26..2184d9ddb418 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -285,7 +285,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>  
> -static inline bool kvm_arch_check_sve_has_vhe(void) { return true; }
> +static inline bool kvm_arch_requires_vhe(void) { return false; }
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 52fbc823ff8c..d6d9aa76a943 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -422,7 +422,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t 
> pgd_ptr,
>   }
>  }
>  
> -static inline bool kvm_arch_check_sve_has_vhe(void)
> +static inline bool kvm_arch_requires_vhe(void)
>  {
>   /*
>* The Arm architecture specifies that implementation of SVE
> @@ -430,9 +430,9 @@ static inline bool kvm_arch_check_sve_has_vhe(void)
>* relies on this when SVE is present:
>*/
>   if (system_supports_sve())
> - return has_vhe();
> - else
>   return true;
> +
> + return false;
>  }
>  
>  static inline void kvm_arch_hardware_unsetup(void) {}
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 23774970c9df..1db4c15edcdd 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -1640,8 +1640,10 @@ int kvm_arch_init(void *opaque)
>   return -ENODEV;
>   }
>  
> - if (!kvm_arch_check_sve_has_vhe()) {
> - kvm_pr_unimpl("SVE system without VHE unsupported.  Broken 
> cpu?");
> + in_hyp_mode = is_kernel_in_hyp_mode();
> +
> + if (!in_hyp_mode && kvm_arch_requires_vhe()) {
> + kvm_pr_unimpl("CPU requiring VHE was booted in non-VHE mode");

nit: The error message feels weird to me (are we reporting CPU bugs?),
I'm not sure about the unimpl, and I think there's a newline
missing.

How about:

kvm_err("Cannot support this CPU in non-VHE mode, not initializing\n");

If we wanted to be super helpful, we could expand
kvm_arch_requires_vhe() with a print statement saying:

kvm_err("SVE detected, requiring VHE mode\n");

But that may be overkill.
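
Either way, the hunk would then end up looking something like this
(sketch combining the patch with the message suggested above):

	in_hyp_mode = is_kernel_in_hyp_mode();

	if (!in_hyp_mode && kvm_arch_requires_vhe()) {
		kvm_err("Cannot support this CPU in non-VHE mode, not initializing\n");
		return -ENODEV;
	}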


>   return -ENODEV;
>   }
>  
> @@ -1657,8 +1659,6 @@ int kvm_arch_init(void *opaque)
>   if (err)
>   return err;
>  
> - in_hyp_mode = is_kernel_in_hyp_mode();
> -
>   if (!in_hyp_mode) {
>   err = init_hyp_mode();
>   if (err)
> -- 
> 2.19.2
> 

Otherwise:

Acked-by: Christoffer Dall 


Re: [PATCH v3 1/8] arm64: KVM: Make VHE Stage-2 TLB invalidation operations non-interruptible

2018-12-10 Thread Christoffer Dall
On Thu, Dec 06, 2018 at 05:31:19PM +, Marc Zyngier wrote:
> Contrary to the non-VHE version of the TLB invalidation helpers, the VHE
> code  has interrupts enabled, meaning that we can take an interrupt in
> the middle of such a sequence, and start running something else with
> HCR_EL2.TGE cleared.

Do we have to clear TGE to perform the TLB invalidation, or is that just
a side-effect of re-using code?

Also, do we generally perform TLB invalidations in the kernel with
interrupts disabled, or is this just a result of clearing TGE?

Somehow I feel like this should look more like just another TLB
invalidation in the kernel, but if there's a good reason why it can't
then this is fine.

Thanks,

Christoffer

> 
> That's really not a good idea.
> 
> Take the heavy-handed option and disable interrupts in
> __tlb_switch_to_guest_vhe, restoring them in __tlb_switch_to_host_vhe.
> The latter also gain an ISB in order to make sure that TGE really has
> taken effect.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/tlb.c | 35 +--
>  1 file changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index 4dbd9c69a96d..7fcc9c1a5f45 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -15,14 +15,19 @@
>   * along with this program.  If not, see .
>   */
>  
> +#include 
> +
>  #include 
>  #include 
>  #include 
>  
> -static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm)
> +static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
> +  unsigned long *flags)
>  {
>   u64 val;
>  
> + local_irq_save(*flags);
> +
>   /*
>* With VHE enabled, we have HCR_EL2.{E2H,TGE} = {1,1}, and
>* most TLB operations target EL2/EL0. In order to affect the
> @@ -37,7 +42,8 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm 
> *kvm)
>   isb();
>  }
>  
> -static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm)
> +static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm,
> +   unsigned long *flags)
>  {
>   __load_guest_stage2(kvm);
>   isb();
> @@ -48,7 +54,8 @@ static hyp_alternate_select(__tlb_switch_to_guest,
>   __tlb_switch_to_guest_vhe,
>   ARM64_HAS_VIRT_HOST_EXTN);
>  
> -static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm)
> +static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
> + unsigned long flags)
>  {
>   /*
>* We're done with the TLB operation, let's restore the host's
> @@ -56,9 +63,12 @@ static void __hyp_text __tlb_switch_to_host_vhe(struct kvm 
> *kvm)
>*/
>   write_sysreg(0, vttbr_el2);
>   write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
> + isb();
> + local_irq_restore(flags);
>  }
>  
> -static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm)
> +static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm,
> +  unsigned long flags)
>  {
>   write_sysreg(0, vttbr_el2);
>  }
> @@ -70,11 +80,13 @@ static hyp_alternate_select(__tlb_switch_to_host,
>  
>  void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  {
> + unsigned long flags;
> +
>   dsb(ishst);
>  
>   /* Switch to requested VMID */
>   kvm = kern_hyp_va(kvm);
> - __tlb_switch_to_guest()(kvm);
> + __tlb_switch_to_guest()(kvm, &flags);
>  
>   /*
>* We could do so much better if we had the VA as well.
> @@ -117,36 +129,39 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm 
> *kvm, phys_addr_t ipa)
>   if (!has_vhe() && icache_is_vpipt())
>   __flush_icache_all();
>  
> - __tlb_switch_to_host()(kvm);
> + __tlb_switch_to_host()(kvm, flags);
>  }
>  
>  void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
>  {
> + unsigned long flags;
> +
>   dsb(ishst);
>  
>   /* Switch to requested VMID */
>   kvm = kern_hyp_va(kvm);
> - __tlb_switch_to_guest()(kvm);
> + __tlb_switch_to_guest()(kvm, &flags);
>  
>   __tlbi(vmalls12e1is);
>   dsb(ish);
>   isb();
>  
> - __tlb_switch_to_host()(kvm);
> + __tlb_switch_to_host()(kvm, flags);
>  }
>  
>  void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
>  {
>   struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
> + unsigned long flags;
>  
>   /* Switch to requested VMID */
> - __tlb_switch_to_guest()(kvm);
> + __tlb_switch_to_guest()(kvm, &flags);
>  
>   __tlbi(vmalle1);
>   dsb(nsh);
>   isb();
>  
> - __tlb_switch_to_host()(kvm);
> + __tlb_switch_to_host()(kvm, flags);
>  }
>  
>  void __hyp_text __kvm_flush_vm_context(void)
> -- 
> 2.19.2
> 

Re: [PATCH v9 5/8] KVM: arm64: Support PUD hugepage in stage2_is_exec()

2018-12-10 Thread Christoffer Dall
On Wed, Dec 05, 2018 at 05:57:51PM +, Suzuki K Poulose wrote:
> 
> 
> On 01/11/2018 13:38, Christoffer Dall wrote:
> >On Wed, Oct 31, 2018 at 05:57:42PM +, Punit Agrawal wrote:
> >>In preparation for creating PUD hugepages at stage 2, add support for
> >>detecting execute permissions on PUD page table entries. Faults due to
> >>lack of execute permissions on page table entries is used to perform
> >>i-cache invalidation on first execute.
> >>
> >>Provide trivial implementations of arm32 helpers to allow sharing of
> >>code.
> >>
> >>Signed-off-by: Punit Agrawal 
> >>Reviewed-by: Suzuki K Poulose 
> >>Cc: Christoffer Dall 
> >>Cc: Marc Zyngier 
> >>Cc: Russell King 
> >>Cc: Catalin Marinas 
> >>Cc: Will Deacon 
> >>---
> >>  arch/arm/include/asm/kvm_mmu.h |  6 +++
> >>  arch/arm64/include/asm/kvm_mmu.h   |  5 +++
> >>  arch/arm64/include/asm/pgtable-hwdef.h |  2 +
> >>  virt/kvm/arm/mmu.c | 53 +++---
> >>  4 files changed, 61 insertions(+), 5 deletions(-)
> >>
> >>diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> >>index 37bf85d39607..839a619873d3 100644
> >>--- a/arch/arm/include/asm/kvm_mmu.h
> >>+++ b/arch/arm/include/asm/kvm_mmu.h
> >>@@ -102,6 +102,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
> >>return false;
> >>  }
> >>+static inline bool kvm_s2pud_exec(pud_t *pud)
> >>+{
> >>+   BUG();
> >
> >nit: I think this should be WARN() now :)
> >
> >>+   return false;
> >>+}
> >>+
> >>  static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> >>  {
> >>pte_val(pte) |= L_PTE_S2_RDWR;
> >>diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> >>b/arch/arm64/include/asm/kvm_mmu.h
> >>index 8da6d1b2a196..c755b37b3f92 100644
> >>--- a/arch/arm64/include/asm/kvm_mmu.h
> >>+++ b/arch/arm64/include/asm/kvm_mmu.h
> >>@@ -261,6 +261,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp)
> >>return kvm_s2pte_readonly((pte_t *)pudp);
> >>  }
> >>+static inline bool kvm_s2pud_exec(pud_t *pudp)
> >>+{
> >>+   return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
> >>+}
> >>+
> >>  #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
> >>  #ifdef __PAGETABLE_PMD_FOLDED
> >>diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
> >>b/arch/arm64/include/asm/pgtable-hwdef.h
> >>index 1d7d8da2ef9b..336e24cddc87 100644
> >>--- a/arch/arm64/include/asm/pgtable-hwdef.h
> >>+++ b/arch/arm64/include/asm/pgtable-hwdef.h
> >>@@ -193,6 +193,8 @@
> >>  #define PMD_S2_RDWR   (_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
> >>  #define PMD_S2_XN (_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
> >>+#define PUD_S2_XN  (_AT(pudval_t, 2) << 53)  /* XN[1:0] */
> >>+
> >>  /*
> >>   * Memory Attribute override for Stage-2 (MemAttr[3:0])
> >>   */
> >>diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >>index 1c669c3c1208..8e44dccd1b47 100644
> >>--- a/virt/kvm/arm/mmu.c
> >>+++ b/virt/kvm/arm/mmu.c
> >>@@ -1083,23 +1083,66 @@ static int stage2_set_pmd_huge(struct kvm *kvm, 
> >>struct kvm_mmu_memory_cache
> >>return 0;
> >>  }
> >>-static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> >>+/*
> >>+ * stage2_get_leaf_entry - walk the stage2 VM page tables and return
> >>+ * true if a valid and present leaf-entry is found. A pointer to the
> >>+ * leaf-entry is returned in the appropriate level variable - pudpp,
> >>+ * pmdpp, ptepp.
> >>+ */
> >>+static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
> >>+ pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
> >
> >Do we need this type madness or could this just return a u64 pointer
> >(NULL if nothing is found) and pass that to kvm_s2pte_exec (because we
> >know it's the same bit we need to check regardless of the pgtable level
> >on both arm and arm64)?
> >
> >Or do we consider that bad for some reason?
> 
> Practically, yes, the bit positions are the same and thus we should be able
> to do this, assuming that it is just a pte. When we get to an independent stage2
> pgtable implementation which treats all page table entries as a single type
> with level information, we should be able to get rid of these.
> But for now we have followed the Linux way of page-table manipulation, where we
> have "level"-specific accessors. The other option is to open-code the walking
> sequence from the pgd to the leaf entry everywhere.
> 
> I am fine with changing this code, if you like.
> 

Meh, it just looked a bit over-engineered to me when I originally looked
at the patches, but you're right, they align with the rest of the
implementation.
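
For the record, what I had in mind was something like the below
(untested sketch; it assumes a variant of stage2_get_leaf_entry() that
returns a single generic entry pointer, which works because the S2 XN
bit sits in the same position at every level):

static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
{
	pte_t *ptep = stage2_get_leaf_entry(kvm, addr);	/* NULL if no valid leaf */

	return ptep && kvm_s2pte_exec(ptep);
}

But as you say, the level-specific accessors match the rest of the code,
so no need to change it.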

Thanks,

Christoffer


Re: [PATCH v9 3/8] KVM: arm/arm64: Introduce helpers to manipulate page table entries

2018-12-10 Thread Christoffer Dall
On Mon, Dec 03, 2018 at 07:20:08PM +0530, Anshuman Khandual wrote:
> 
> 
> On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> > Introduce helpers to abstract architectural handling of the conversion
> > of pfn to page table entries and marking a PMD page table entry as a
> > block entry.
> 
> Why is this necessary ? we would still need to access PMD, PUD as is
> without any conversion. IOW KVM knows the breakdown of the page table
> at various levels. Is this something required from generic KVM code ?
>   
> > 
> > The helpers are introduced in preparation for supporting PUD hugepages
> > at stage 2 - which are supported on arm64 but do not exist on arm.
> 
> Some of these patches (including the earlier two) are good on their
> own. Do we still need to mention the incoming PUD enablement in the
> commit message as the reason for these cleanup patches?
> 

Does it hurt?  What is your concern here?


Thanks,

Christoffer


Re: [PATCH v9 2/8] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault

2018-12-10 Thread Christoffer Dall
On Wed, Dec 05, 2018 at 10:47:10AM +, Suzuki K Poulose wrote:
> 
> 
> On 03/12/2018 13:32, Anshuman Khandual wrote:
> >
> >
> >On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> >>Stage 2 fault handler marks a page as executable if it is handling an
> >>execution fault or if it was a permission fault in which case the
> >>executable bit needs to be preserved.
> >>
> >>The logic to decide if the page should be marked executable is
> >>duplicated for PMD and PTE entries. To avoid creating another copy
> >>when support for PUD hugepages is introduced refactor the code to
> >>share the checks needed to mark a page table entry as executable.
> >>
> >>Signed-off-by: Punit Agrawal 
> >>Reviewed-by: Suzuki K Poulose 
> >>Cc: Christoffer Dall 
> >>Cc: Marc Zyngier 
> >>---
> >>  virt/kvm/arm/mmu.c | 28 +++-
> >>  1 file changed, 15 insertions(+), 13 deletions(-)
> >>
> >>diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >>index 59595207c5e1..6912529946fb 100644
> >>--- a/virt/kvm/arm/mmu.c
> >>+++ b/virt/kvm/arm/mmu.c
> >>@@ -1475,7 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >>phys_addr_t fault_ipa,
> >>  unsigned long fault_status)
> >>  {
> >>int ret;
> >>-   bool write_fault, exec_fault, writable, force_pte = false;
> >>+   bool write_fault, writable, force_pte = false;
> >>+   bool exec_fault, needs_exec;
> >
> >New line not required, still within 80 characters.
> >
> >>unsigned long mmu_seq;
> >>gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> >>struct kvm *kvm = vcpu->kvm;
> >>@@ -1598,19 +1599,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >>phys_addr_t fault_ipa,
> >>if (exec_fault)
> >>invalidate_icache_guest_page(pfn, vma_pagesize);
> >>+   /*
> >>+* If we took an execution fault we have made the
> >>+* icache/dcache coherent above and should now let the s2
> >
> >Coherent or invalidated with invalidate_icache_guest_page ?
> 
> We also do clean_dcache above if needed. So that makes sure
> the data is coherent. Am I missing something here ?
> 

I think you've got it right.  We have made the icache coherent with the
data/instructions in the page by invalidating the icache.  I think the
comment is ok either way.
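
For context, the shared check this refactor is heading towards looks
roughly like this (a sketch assuming a stage2_is_exec() style helper; not
a verbatim quote of the patch):

/* Illustrative only: the mapping needs the exec bit either because this
 * is an execution fault (icache already invalidated above), or because
 * it is a permission fault on an entry that was already executable.
 */
needs_exec = exec_fault ||
	     (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));

if (needs_exec)
	new_pmd = kvm_s2pmd_mkexec(new_pmd);	/* kvm_s2pte_mkexec() for PTEs */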

Thanks,

Christoffer


Re: [PATCH v9 2/8] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault

2018-12-10 Thread Christoffer Dall
On Mon, Dec 03, 2018 at 07:02:23PM +0530, Anshuman Khandual wrote:
> 
> 
> On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> > Stage 2 fault handler marks a page as executable if it is handling an
> > execution fault or if it was a permission fault in which case the
> > executable bit needs to be preserved.
> > 
> > The logic to decide if the page should be marked executable is
> > duplicated for PMD and PTE entries. To avoid creating another copy
> > when support for PUD hugepages is introduced refactor the code to
> > share the checks needed to mark a page table entry as executable.
> > 
> > Signed-off-by: Punit Agrawal 
> > Reviewed-by: Suzuki K Poulose 
> > Cc: Christoffer Dall 
> > Cc: Marc Zyngier 
> > ---
> >  virt/kvm/arm/mmu.c | 28 +++-
> >  1 file changed, 15 insertions(+), 13 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > index 59595207c5e1..6912529946fb 100644
> > --- a/virt/kvm/arm/mmu.c
> > +++ b/virt/kvm/arm/mmu.c
> > @@ -1475,7 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> > phys_addr_t fault_ipa,
> >   unsigned long fault_status)
> >  {
> > int ret;
> > -   bool write_fault, exec_fault, writable, force_pte = false;
> > +   bool write_fault, writable, force_pte = false;
> > +   bool exec_fault, needs_exec;
> 
> New line not required, still within 80 characters.
> 

He's trying to logically group the two variables.  I don't see a problem
with that.


Thanks,

Christoffer


Re: [PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()

2018-12-10 Thread Christoffer Dall
On Mon, Dec 03, 2018 at 01:37:37PM +, Suzuki K Poulose wrote:
> Hi Anshuman,
> 
> On 03/12/2018 12:11, Anshuman Khandual wrote:
> >
> >
> >On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> >>The code for operations such as marking the pfn as dirty, and
> >>dcache/icache maintenance during stage 2 fault handling is duplicated
> >>between normal pages and PMD hugepages.
> >>
> >>Instead of creating another copy of the operations when we introduce
> >>PUD hugepages, let's share them across the different pagesizes.
> >>
> >>Signed-off-by: Punit Agrawal 
> >>Reviewed-by: Suzuki K Poulose 
> >>Cc: Christoffer Dall 
> >>Cc: Marc Zyngier 
> >>---
> >>  virt/kvm/arm/mmu.c | 49 --
> >>  1 file changed, 30 insertions(+), 19 deletions(-)
> >>
> >>diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >>index 5eca48bdb1a6..59595207c5e1 100644
> >>--- a/virt/kvm/arm/mmu.c
> >>+++ b/virt/kvm/arm/mmu.c
> >>@@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >>phys_addr_t fault_ipa,
> >>  unsigned long fault_status)
> >>  {
> >>int ret;
> >>-   bool write_fault, exec_fault, writable, hugetlb = false, force_pte = 
> >>false;
> >>+   bool write_fault, exec_fault, writable, force_pte = false;
> >>unsigned long mmu_seq;
> >>gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> >>struct kvm *kvm = vcpu->kvm;
> >>@@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >>phys_addr_t fault_ipa,
> >>kvm_pfn_t pfn;
> >>pgprot_t mem_type = PAGE_S2;
> >>bool logging_active = memslot_is_logging(memslot);
> >>-   unsigned long flags = 0;
> >>+   unsigned long vma_pagesize, flags = 0;
> >
> >A small nit: s/vma_pagesize/pagesize. Why call it VMA? It's implicit.
> 
> Maybe we could call it mapsize. pagesize is confusing.
> 

I'm ok with mapsize.  I see the vma_pagesize name coming from the fact
that this is initially set to the return value from vma_kernel_pagesize.

I have no problem with either vma_pagesize or mapsize.

> >
> >>write_fault = kvm_is_write_fault(vcpu);
> >>exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> >>@@ -1504,10 +1504,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >>phys_addr_t fault_ipa,
> >>return -EFAULT;
> >>}
> >>-   if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
> >>-   hugetlb = true;
> >>+   vma_pagesize = vma_kernel_pagesize(vma);
> >>+   if (vma_pagesize == PMD_SIZE && !logging_active) {
> >>gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> >>} else {
> >>+   /*
> >>+* Fallback to PTE if it's not one of the Stage 2
> >>+* supported hugepage sizes
> >>+*/
> >>+   vma_pagesize = PAGE_SIZE;
> >
> >This seems redundant and should be dropped. vma_kernel_pagesize() here either
> >calls hugetlb_vm_op_pagesize (via hugetlb_vm_ops->pagesize) or simply returns
> >PAGE_SIZE. The vm_ops path is taken if the QEMU VMA covering any given HVA is
> >backed either by HugeTLB pages or simply normal pages. vma_pagesize would
> >either have a value of PMD_SIZE (HugeTLB hstate based) or PAGE_SIZE. Hence if
> >it's not PMD_SIZE it must be PAGE_SIZE and should not be assigned again.
> 
> We may want to force the use of PTE mappings when logging_active (e.g.
> migration?) to avoid having to track huge pages. So the check is still
> valid.
> 
> 

Agreed, and let's not additionally try to change the logic and flow with
this patch.

> >
> >>+
> >>/*
> >> * Pages belonging to memslots that don't have the same
> >> * alignment for userspace and IPA cannot be mapped using
> >>@@ -1573,23 +1579,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >>phys_addr_t fault_ipa,
> >>if (mmu_notifier_retry(kvm, mmu_seq))
> >>goto out_unlock;
> >>-   if (!hugetlb && !force_pte)
> >>-   hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> >>+   if (vma_pagesize == PAGE_SIZE && !force_pte) {
> >>+   /*
> >>+* Only PMD_SIZE transparent hugepages(THP) are
> >>+* currently supported. Th

Re: [PATCH] kvm: arm/arm64 : fix vm's hanging at startup time

2018-11-23 Thread Christoffer Dall
On Fri, Nov 23, 2018 at 02:01:56PM +0800, peng.h...@zte.com.cn wrote:
> >Hi,
> >
> >On Wed, Nov 21, 2018 at 04:56:54PM +0800, peng.h...@zte.com.cn wrote:
> >> >On 19/11/2018 09:10, Mark Rutland wrote:
> >> >> On Sat, Nov 17, 2018 at 10:58:37AM +0800, peng.h...@zte.com.cn wrote:
> >>  On 16/11/18 00:23, peng.h...@zte.com.cn wrote:
> >> >> Hi,
> >> >>> When virtual machine starts, hang up.
> >> >>
> >> >> I take it you mean the *guest* hangs? Because it doesn't get a timer
> >> >> interrupt?
> >> >>
> >> >>> The kernel version of guest
> >> >>> is 4.16. Host support vgic_v3.
> >> >>
> >> >> Your host kernel is something recent, I guess?
> >> >>
> >> >>> It was mainly due to the incorrect vgic_irq's(intid=27) group value
> >> >>> during injection interruption. when kvm_vgic_vcpu_init is called,
> >> >>> dist is not initialized at this time. Unable to get vgic V3 or V2
> >> >>> correctly, so group is not set.
> >> >>
> >> >> Mmh, that shouldn't happen with (v)GICv3. Do you use QEMU (which
> >> >> version?) or some other userland tool?
> >> >>
> >> >
> >> > QEMU emulator version 3.0.50 .
> >> >
> >> >>> group is setted to 1 when vgic_mmio_write_group is invoked at some
> >> >>> time.
> >> >>> when irq->group=0 (intid=27), No ICH_LR_GROUP flag was set and
> >> >>> interrupt injection failed.
> >> >>>
> >> >>> Signed-off-by: Peng Hao 
> >> >>> ---
> >> >>>   virt/kvm/arm/vgic/vgic-v3.c | 2 +-
> >> >>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >> >>>
> >> >>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c 
> >> >>> b/virt/kvm/arm/vgic/vgic-v3.c
> >> >>> index 9c0dd23..d101000 100644
> >> >>> --- a/virt/kvm/arm/vgic/vgic-v3.c
> >> >>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> >> >>> @@ -198,7 +198,7 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu,
> >> >>> struct vgic_irq *irq, int lr) if (vgic_irq_is_mapped_level(irq) &&
> >> >>> (val & ICH_LR_PENDING_BIT)) irq->line_level = false;
> >> >>>
> >> >>> -if (irq->group)
> >> >>> +if (model == KVM_DEV_TYPE_ARM_VGIC_V3)
> >> >>
> >> >> This is not the right fix, not only because it basically reverts the
> >> >> GICv3 part of 87322099052 (KVM: arm/arm64: vgic: Signal IRQs using
> >> >> their configured group).
> >> >>
> >> >> Can you try to work out why kvm_vgic_vcpu_init() is apparently 
> >> >> called
> >> >> before dist->vgic_model is set, also what value it has?
> >> >> If I understand the code correctly, that shouldn't happen for a 
> >> >> GICv3.
> >> >>
> >> > Even if the value of  group is correctly assigned in 
> >> > kvm_vgic_vcpu_init, the group is then written 0 through 
> >> > >vgic_mmio_write_group.
> >> >   If the interrupt comes at this time, the interrupt injection fails.
> >> 
> >>  Does that mean that the guest is configuring its interrupts as Group0?
> >>  That sounds wrong, Linux should configure all it's interrupts as
> >>  non-secure group1.
> >> >>>
> >> >>> no, I think that uefi dose this, not linux.
> >> >>> 1. kvm_vgic_vcpu_init
> >> >>> 2. vgic_create
> >> >>> 3. kvm_vgic_dist_init
> >> >>> 4.vgic_mmio_write_group: uefi as guest, write group=0
> >> >>> 5.vgic_mmio_write_group: linux as guest, write group=1
> >> >>
> >> >> Is this the same issue fixed by EDK2 commit:
> >> >>
> >> >> 66127011a544b90e ("ArmPkg/ArmGicDxe ARM: fix encoding for GICv3 
> >> >> interrupt acknowledge")
> >> >>
> >> >> ... where EDK2 would try to use IAR0 rather than IAR1?
> >> >>
> >> >> The commit messages notes this lead to a boot-time hang.
> >> >
> >> >I managed to trigger an issue with a really old EFI implementation that
> >> >doesn't configure its interrupts as Group1, and yet tries to ACK its
> >> >interrupts using the Group1 accessor. Guess what? It is not going to work.
> >> >
> >> >Commit c7fefb690661f2e38afcb8200bd318ecf38ab961 in the edk2 tree seems
> >> >to be the fix (I only assume it does, I haven't actually checked). A
> >> >recent build, as found in Debian Buster, works perfectly (tested with
> >> >both QEMU v2.12 and tip of tree).
> >> >
> >> >Now, I really don't get what you're saying about Linux not getting
> >> >interrupts. How do you get to booting Linux if EFI is not making any
> >> >forward progress? Are you trying them independently?
> >> >
> >> I start linux with bypassing uefi, the print info is the same.
> >> [507107.748908]  vgic_mmio_write_group:## intid/27 group=0
> >> [507107.752185]  vgic_mmio_write_group:## intid/27 group=0
> >> [507107.899566]  vgic_mmio_write_group:## intid/27 group=1
> >> [507107.907370]  vgic_mmio_write_group:## intid/27 group=1
> >> the command line is like this:
> >> /home/qemu-patch/linshi/qemu/aarch64-softmmu/qemu-system-aarch64  -machine 
> >> virt-3.1,accel=kvm,usb=off,dump-guest-core=off,gic-version=3  -kernel 
> >> /home/kernelboot/vmlinuz-4.16.0+ -init

Re: [PATCH 4/4] arm64: KVM: Implement workaround for Cortex-A76 erratum 1165522

2018-11-23 Thread Christoffer Dall
On Thu, Nov 22, 2018 at 04:13:16PM +, Marc Zyngier wrote:
> On 08/11/2018 20:10, Christoffer Dall wrote:
> > On Thu, Nov 08, 2018 at 06:05:55PM +, Marc Zyngier wrote:
> >> On 06/11/18 08:15, Christoffer Dall wrote:
> >>> On Mon, Nov 05, 2018 at 02:36:16PM +, Marc Zyngier wrote:
> >>>> Early versions of Cortex-A76 can end up with corrupt TLBs if they
> >>>> speculate an AT instruction during a guest switch while the
> >>>> S1/S2 system registers are in an inconsistent state.
> >>>>
> >>>> Work around it by:
> >>>> - Mandating VHE
> >>>> - Make sure that S1 and S2 system registers are consistent before
> >>>>   clearing HCR_EL2.TGE, which allows AT to target the EL1 translation
> >>>>   regime
> >>>>
> >>>> These two things together ensure that we cannot hit this erratum.
> >>>>
> >>>> Signed-off-by: Marc Zyngier 
> >>>> ---
> >>>>  Documentation/arm64/silicon-errata.txt |  1 +
> >>>>  arch/arm64/Kconfig | 12 
> >>>>  arch/arm64/include/asm/cpucaps.h   |  3 ++-
> >>>>  arch/arm64/include/asm/kvm_host.h  |  3 +++
> >>>>  arch/arm64/include/asm/kvm_hyp.h   |  6 ++
> >>>>  arch/arm64/kernel/cpu_errata.c |  8 
> >>>>  arch/arm64/kvm/hyp/switch.c| 14 ++
> >>>>  7 files changed, 46 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/Documentation/arm64/silicon-errata.txt 
> >>>> b/Documentation/arm64/silicon-errata.txt
> >>>> index 76ccded8b74c..04f0bc4690c6 100644
> >>>> --- a/Documentation/arm64/silicon-errata.txt
> >>>> +++ b/Documentation/arm64/silicon-errata.txt
> >>>> @@ -57,6 +57,7 @@ stable kernels.
> >>>>  | ARM| Cortex-A73  | #858921 | 
> >>>> ARM64_ERRATUM_858921|
> >>>>  | ARM| Cortex-A55  | #1024718| 
> >>>> ARM64_ERRATUM_1024718   |
> >>>>  | ARM| Cortex-A76  | #1188873| 
> >>>> ARM64_ERRATUM_1188873   |
> >>>> +| ARM| Cortex-A76  | #1165522| 
> >>>> ARM64_ERRATUM_1165522   |
> >>>>  | ARM| MMU-500 | #841119,#826419 | N/A  
> >>>>|
> >>>>  || | |  
> >>>>|
> >>>>  | Cavium | ThunderX ITS| #22375, #24313  | 
> >>>> CAVIUM_ERRATUM_22375|
> >>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >>>> index 787d7850e064..a68bc6cc2167 100644
> >>>> --- a/arch/arm64/Kconfig
> >>>> +++ b/arch/arm64/Kconfig
> >>>> @@ -497,6 +497,18 @@ config ARM64_ERRATUM_1188873
> >>>>  
> >>>>If unsure, say Y.
> >>>>  
> >>>> +config ARM64_ERRATUM_1165522
> >>>> +bool "Cortex-A76: Speculative AT instruction using 
> >>>> out-of-context translation regime could cause subsequent request to 
> >>>> generate an incorrect translation"
> >>>> +default y
> >>>> +help
> >>>> +  This option adds work arounds for ARM Cortex-A76 erratum 
> >>>> 1165522
> >>>> +
> >>>> +  Affected Cortex-A76 cores (r0p0, r1p0, r2p0) could end-up with
> >>>> +  corrupted TLBs by speculating an AT instruction during a guest
> >>>> +  context switch.
> >>>> +
> >>>> +  If unsure, say Y.
> >>>> +
> >>>>  config CAVIUM_ERRATUM_22375
> >>>>  bool "Cavium erratum 22375, 24313"
> >>>>  default y
> >>>> diff --git a/arch/arm64/include/asm/cpucaps.h 
> >>>> b/arch/arm64/include/asm/cpucaps.h
> >>>> index 6e2d254c09eb..62d8cd15fdf2 100644
> >>>> --- a/arch/arm64/include/asm/cpucaps.h
> >>>> +++ b/arch/arm64/include/asm/cpucaps.h
> >>>> @@ -54,7 +54,8 @@
> >>>>  #define ARM64_HAS_CRC32 33
> >>>>  #define ARM64_SSBS  34
> >>>>  #define ARM64_WORKAROUND_1188873

Re: [RFC PATCH v2 11/23] KVM: arm64: Support runtime sysreg filtering for KVM_GET_REG_LIST

2018-11-22 Thread Christoffer Dall
On Thu, Nov 22, 2018 at 01:32:37PM +0100, Dave P Martin wrote:
> On Thu, Nov 22, 2018 at 11:27:53AM +, Alex Bennée wrote:
> > 
> > Christoffer Dall  writes:
> > 
> > > [Adding Peter and Alex for their view on the QEMU side]
> > >
> > > On Thu, Nov 15, 2018 at 05:27:11PM +, Dave Martin wrote:
> > >> On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote:
> > >> > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote:
> > >> > > KVM_GET_REG_LIST should only enumerate registers that are actually
> > >> > > accessible, so it is necessary to filter out any register that is
> > >> > > not exposed to the guest.  For features that are configured at
> > >> > > runtime, this will require a dynamic check.
> > >> > >
> > >> > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden
> > >> > > if SVE is not enabled for the guest.
> > >> >
> > >> > This implies that userspace can never access this interface for a vcpu
> > >> > before having decided whether such features are enabled for the guest 
> > >> > or
> > >> > not, since otherwise userspace will see different states for a VCPU
> > >> > depending on sequencing of the API, which sounds fragile to me.
> > >> >
> > >> > That should probably be documented somewhere, and I hope the
> > >> > enable/disable API for SVE in guests already takes that into account.
> > >> >
> > >> > Not sure if there's an action to take here, but it was the best place I
> > >> > could raise this concern.
> > >>
> > >> Fair point.  I struggled to come up with something better that solves
> > >> all problems.
> > >>
> > >> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of
> > >> creating the vcpu, so that if issued at all for a vcpu, it is issued
> > >> very soon after KVM_VCPU_INIT.
> > >>
> > >> I think this worked OK with the current structure of kvmtool and I
> > >> seem to remember discussing this with Peter Maydell re qemu -- but
> > >> it sounds like I should double-check.
> > >
> > > QEMU does something around enumerating all the system registers exposed
> > > by KVM and saving/restoring them as part of its startup, but I don't
> > > remember the exact sequence.
> > 
> > QEMU does this for each vCPU as part of its start-up sequence:
> > 
> >   kvm_init_vcpu
> > kvm_get_cpu (-> KVM_CREATE_VCPU)
> > KVM_GET_VCPU_MMAP_SIZE
> > kvm_arch_init_vcpu
> >   kvm_arm_vcpu_init (-> KVM_ARM_VCPU_INIT)
> >   kvm_get_one_reg(ARM_CPU_ID_MPIDR)
> >   kvm_arm_init_debug (chk for KVM_CAP 
> > SET_GUEST_DEBUG/GUEST_DEBUG_HW_WPS/BPS)
> >   kvm_arm_init_serror_injection (chk KVM_CAP_ARM_INJECT_SERROR_ESR)
> >   kvm_arm_init_cpreg_list (KVM_GET_REG_LIST)
> > 
> > At this point we have the register list we need for
> > kvm_arch_get_registers which is what we call every time we want to
> > synchronise state. We only really do this for debug events, crashes and
> > at some point when migrating.
> 
> So we would need to insert KVM_ARM_SVE_CONFIG_SET into this sequence,
> meaning that the new capability is not strictly necessary.
> 
> I sympathise with Christoffer's view though that without the capability
> mechanism it may be too easy for software to make mistakes: code
> refactoring might swap the KVM_GET_REG_LIST and KVM_ARM_SVE_CONFIG ioctls
> over and then things would go wrong with no immediate error indication.
> 
> In effect, the SVE regs would be missing from the list yielded by
> KVM_GET_REG_LIST, possibly leading to silent migration failures.
> 
> I'm a bit uneasy about that.  Am I being too paranoid now?
> 

No, we've made decisions in the past where we didn't enforce ordering
which ended up being a huge pain (vgic lazy init, as a clear example of
something really bad).  Of course, it's a tradeoff.  If it's a huge pain
to implement, maybe things will be ok, but if it's just a read/write
capability handshake, I think it's worth doing.
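
To make that concrete, the flow I have in mind is roughly the following
(userspace sketch; the capability name is an assumption, not something
that exists yet):

/* Illustrative only: userspace opts in before any vcpu ioctl that
 * depends on the SVE configuration, so ordering mistakes fail loudly
 * instead of silently changing what KVM_GET_REG_LIST returns.
 */
struct kvm_enable_cap cap = {
	.cap = KVM_CAP_ARM_SVE,		/* assumed capability name */
};

if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_SVE) <= 0)
	return -1;			/* old kernel: no SVE for guests */

if (ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap) < 0)
	return -1;

/* ...then KVM_ARM_SVE_CONFIG_SET, then KVM_GET_REG_LIST, then KVM_RUN */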


Thanks,

Christoffer


Re: [RFC PATCH v2 11/23] KVM: arm64: Support runtime sysreg filtering for KVM_GET_REG_LIST

2018-11-22 Thread Christoffer Dall
On Thu, Nov 22, 2018 at 11:13:51AM +, Peter Maydell wrote:
> On 22 November 2018 at 10:53, Christoffer Dall  
> wrote:
> > [Adding Peter and Alex for their view on the QEMU side]
> >
> > On Thu, Nov 15, 2018 at 05:27:11PM +, Dave Martin wrote:
> >> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of
> >> creating the vcpu, so that if issued at all for a vcpu, it is issued
> >> very soon after KVM_VCPU_INIT.
> >>
> >> I think this worked OK with the current structure of kvmtool and I
> >> seem to remember discussing this with Peter Maydell re qemu -- but
> >> it sounds like I should double-check.
> >
> > QEMU does something around enumerating all the system registers exposed
> > by KVM and saving/restoring them as part of its startup, but I don't
> > remember the exact sequence.
> 
> This all happens in kvm_arch_init_vcpu(), which does:
>  * KVM_ARM_VCPU_INIT ioctl (with the appropriate kvm_init_features set)
>  * read the guest MPIDR with GET_ONE_REG so we know what KVM
>is doing with MPIDR assignment across CPUs
>  * check for interesting extensions like KVM_CAP_SET_GUEST_DEBUG
>  * get and cache a list of what system registers the vcpu has,
>using KVM_GET_REG_LIST. This is where we do the "size must
>be U32 or U64" sanity check.
> 
> So if there's something we can't do by setting kvm_init_features
> for KVM_ARM_VCPU_INIT but have to do immediately afterwards,
> that is straightforward.
> 
> The major requirement for QEMU is that if we don't specifically
> enable SVE in the VCPU then we must not see any registers
> in the KVM_GET_REG_LIST that are not u32 or u64 -- otherwise
> QEMU will refuse to start.
> 

So on migration, will you have the required information for
KVM_ARM_VCPU_INIT before setting the registers from the migration
stream?

(I assume so, because presumably this comes from a command-line switch
or from the machine definition, which must match the source.)

Therefore, I don't think there's an issue with this patch, but from
bitter experience I think we should enforce ordering if possible.


Thanks,

Christoffer


Re: [RFC PATCH v2 11/23] KVM: arm64: Support runtime sysreg filtering for KVM_GET_REG_LIST

2018-11-22 Thread Christoffer Dall
[Adding Peter and Alex for their view on the QEMU side]

On Thu, Nov 15, 2018 at 05:27:11PM +, Dave Martin wrote:
> On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote:
> > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote:
> > > KVM_GET_REG_LIST should only enumerate registers that are actually
> > > accessible, so it is necessary to filter out any register that is
> > > not exposed to the guest.  For features that are configured at
> > > runtime, this will require a dynamic check.
> > > 
> > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden
> > > if SVE is not enabled for the guest.
> > 
> > This implies that userspace can never access this interface for a vcpu
> > before having decided whether such features are enabled for the guest or
> > not, since otherwise userspace will see different states for a VCPU
> > depending on sequencing of the API, which sounds fragile to me.
> > 
> > That should probably be documented somewhere, and I hope the
> > enable/disable API for SVE in guests already takes that into account.
> > 
> > Not sure if there's an action to take here, but it was the best place I
> > could raise this concern.
> 
> Fair point.  I struggled to come up with something better that solves
> all problems.
> 
> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of
> creating the vcpu, so that if issued at all for a vcpu, it is issued
> very soon after KVM_VCPU_INIT.
> 
> I think this worked OK with the current structure of kvmtool and I
> seem to remember discussing this with Peter Maydell re qemu -- but
> it sounds like I should double-check.

QEMU does something around enumerating all the system registers exposed
by KVM and saving/restoring them as part of its startup, but I don't
remember the exact sequence.

> 
> Either way, you're right, this needs to be clearly documented.
> 
> 
> If we want to be more robust, maybe we should add a capability too,
> so that userspace that enables this capability promises to call
> KVM_ARM_SVE_CONFIG_SET for each vcpu, and affected ioctls (KVM_RUN,
> KVM_GET_REG_LIST etc.) are forbidden until that is done?
> 
> That should help avoid accidents.
> 
> I could add a special meaning for an empty kvm_sve_vls, such that
> it doesn't enable SVE on the affected vcpu.  That retains the ability
> to create heterogeneous guests while still following the above flow.
> 
I think making sure that userspace only ever sees the same list of
available system registers is going to cause us less pain going forward.

If the separate ioctl and capability check is the easiest way of doing
that, then I think that sounds good.  (I would have liked to just add
some data to KVM_CREATE_VCPU, but that doesn't seem to be possible.)


Thanks,

Christoffer


Re: [PATCH] kvm: arm/arm64 : fix vm's hanging at startup time

2018-11-22 Thread Christoffer Dall
On Wed, Nov 21, 2018 at 03:53:03PM +, Julien Thierry wrote:
> 
> 
> On 21/11/18 15:24, Christoffer Dall wrote:
> >On Wed, Nov 21, 2018 at 12:17:45PM +, Julien Thierry wrote:
> >>
> >>
> >>On 21/11/18 11:06, Christoffer Dall wrote:
> >>>Hi,
> >>>
> >>>On Wed, Nov 21, 2018 at 04:56:54PM +0800, peng.h...@zte.com.cn wrote:
> >>>>>On 19/11/2018 09:10, Mark Rutland wrote:
> >>>>>>On Sat, Nov 17, 2018 at 10:58:37AM +0800, peng.h...@zte.com.cn wrote:
> >>>>>>>>On 16/11/18 00:23, peng.h...@zte.com.cn wrote:
> >>>>>>>>>>Hi,
> >>>>>>>>>>>When virtual machine starts, hang up.
> >>>>>>>>>>
> >>>>>>>>>>I take it you mean the *guest* hangs? Because it doesn't get a timer
> >>>>>>>>>>interrupt?
> >>>>>>>>>>
> >>>>>>>>>>>The kernel version of guest
> >>>>>>>>>>>is 4.16. Host support vgic_v3.
> >>>>>>>>>>
> >>>>>>>>>>Your host kernel is something recent, I guess?
> >>>>>>>>>>
> >>>>>>>>>>>It was mainly due to the incorrect vgic_irq's(intid=27) group value
> >>>>>>>>>>>during injection interruption. when kvm_vgic_vcpu_init is called,
> >>>>>>>>>>>dist is not initialized at this time. Unable to get vgic V3 or V2
> >>>>>>>>>>>correctly, so group is not set.
> >>>>>>>>>>
> >>>>>>>>>>Mmh, that shouldn't happen with (v)GICv3. Do you use QEMU (which
> >>>>>>>>>>version?) or some other userland tool?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>QEMU emulator version 3.0.50 .
> >>>>>>>>>
> >>>>>>>>>>>group is setted to 1 when vgic_mmio_write_group is invoked at some
> >>>>>>>>>>>time.
> >>>>>>>>>>>when irq->group=0 (intid=27), No ICH_LR_GROUP flag was set and
> >>>>>>>>>>>interrupt injection failed.
> >>>>>>>>>>>
> >>>>>>>>>>>Signed-off-by: Peng Hao 
> >>>>>>>>>>>---
> >>>>>>>>>>>   virt/kvm/arm/vgic/vgic-v3.c | 2 +-
> >>>>>>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>>>>>
> >>>>>>>>>>>diff --git a/virt/kvm/arm/vgic/vgic-v3.c 
> >>>>>>>>>>>b/virt/kvm/arm/vgic/vgic-v3.c
> >>>>>>>>>>>index 9c0dd23..d101000 100644
> >>>>>>>>>>>--- a/virt/kvm/arm/vgic/vgic-v3.c
> >>>>>>>>>>>+++ b/virt/kvm/arm/vgic/vgic-v3.c
> >>>>>>>>>>>@@ -198,7 +198,7 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu,
> >>>>>>>>>>>struct vgic_irq *irq, int lr) if (vgic_irq_is_mapped_level(irq) &&
> >>>>>>>>>>>(val & ICH_LR_PENDING_BIT)) irq->line_level = false;
> >>>>>>>>>>>
> >>>>>>>>>>>-if (irq->group)
> >>>>>>>>>>>+if (model == KVM_DEV_TYPE_ARM_VGIC_V3)
> >>>>>>>>>>
> >>>>>>>>>>This is not the right fix, not only because it basically reverts the
> >>>>>>>>>>GICv3 part of 87322099052 (KVM: arm/arm64: vgic: Signal IRQs using
> >>>>>>>>>>their configured group).
> >>>>>>>>>>
> >>>>>>>>>>Can you try to work out why kvm_vgic_vcpu_init() is apparently 
> >>>>>>>>>>called
> >>>>>>>>>>before dist->vgic_model is set, also what value it has?
> >>>>>>>>>>If I understand the code correctly, that shouldn't happen for a 
> >>>>>>>>>>GICv3.
> >>>>>>>>>>
> >>>>>>>>>Even if the value of  group is correctly assigned in 
&

Re: [PATCH] kvm: arm/arm64 : fix vm's hanging at startup time

2018-11-21 Thread Christoffer Dall
On Wed, Nov 21, 2018 at 12:17:45PM +, Julien Thierry wrote:
> 
> 
> On 21/11/18 11:06, Christoffer Dall wrote:
> >Hi,
> >
> >On Wed, Nov 21, 2018 at 04:56:54PM +0800, peng.h...@zte.com.cn wrote:
> >>>On 19/11/2018 09:10, Mark Rutland wrote:
> >>>>On Sat, Nov 17, 2018 at 10:58:37AM +0800, peng.h...@zte.com.cn wrote:
> >>>>>>On 16/11/18 00:23, peng.h...@zte.com.cn wrote:
> >>>>>>>>Hi,
> >>>>>>>>>When virtual machine starts, hang up.
> >>>>>>>>
> >>>>>>>>I take it you mean the *guest* hangs? Because it doesn't get a timer
> >>>>>>>>interrupt?
> >>>>>>>>
> >>>>>>>>>The kernel version of guest
> >>>>>>>>>is 4.16. Host support vgic_v3.
> >>>>>>>>
> >>>>>>>>Your host kernel is something recent, I guess?
> >>>>>>>>
> >>>>>>>>>It was mainly due to the incorrect vgic_irq's(intid=27) group value
> >>>>>>>>>during injection interruption. when kvm_vgic_vcpu_init is called,
> >>>>>>>>>dist is not initialized at this time. Unable to get vgic V3 or V2
> >>>>>>>>>correctly, so group is not set.
> >>>>>>>>
> >>>>>>>>Mmh, that shouldn't happen with (v)GICv3. Do you use QEMU (which
> >>>>>>>>version?) or some other userland tool?
> >>>>>>>>
> >>>>>>>
> >>>>>>>QEMU emulator version 3.0.50 .
> >>>>>>>
> >>>>>>>>>group is setted to 1 when vgic_mmio_write_group is invoked at some
> >>>>>>>>>time.
> >>>>>>>>>when irq->group=0 (intid=27), No ICH_LR_GROUP flag was set and
> >>>>>>>>>interrupt injection failed.
> >>>>>>>>>
> >>>>>>>>>Signed-off-by: Peng Hao 
> >>>>>>>>>---
> >>>>>>>>>   virt/kvm/arm/vgic/vgic-v3.c | 2 +-
> >>>>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>>>
> >>>>>>>>>diff --git a/virt/kvm/arm/vgic/vgic-v3.c 
> >>>>>>>>>b/virt/kvm/arm/vgic/vgic-v3.c
> >>>>>>>>>index 9c0dd23..d101000 100644
> >>>>>>>>>--- a/virt/kvm/arm/vgic/vgic-v3.c
> >>>>>>>>>+++ b/virt/kvm/arm/vgic/vgic-v3.c
> >>>>>>>>>@@ -198,7 +198,7 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu,
> >>>>>>>>>struct vgic_irq *irq, int lr) if (vgic_irq_is_mapped_level(irq) &&
> >>>>>>>>>(val & ICH_LR_PENDING_BIT)) irq->line_level = false;
> >>>>>>>>>
> >>>>>>>>>-if (irq->group)
> >>>>>>>>>+if (model == KVM_DEV_TYPE_ARM_VGIC_V3)
> >>>>>>>>
> >>>>>>>>This is not the right fix, not only because it basically reverts the
> >>>>>>>>GICv3 part of 87322099052 (KVM: arm/arm64: vgic: Signal IRQs using
> >>>>>>>>their configured group).
> >>>>>>>>
> >>>>>>>>Can you try to work out why kvm_vgic_vcpu_init() is apparently called
> >>>>>>>>before dist->vgic_model is set, also what value it has?
> >>>>>>>>If I understand the code correctly, that shouldn't happen for a GICv3.
> >>>>>>>>
> >>>>>>>Even if the value of  group is correctly assigned in 
> >>>>>>>kvm_vgic_vcpu_init, the group is then written 0 through 
> >>>>>>>vgic_mmio_write_group.
> >>>>>>>   If the interrupt comes at this time, the interrupt injection fails.
> >>>>>>
> >>>>>>Does that mean that the guest is configuring its interrupts as Group0?
> >>>>>>That sounds wrong, Linux should configure all it's interrupts as
> >>>>>>non-secure group1.
> >>>>>
> >>>>>no, I think that uefi dose this, not linux.
> >>>>>1. kvm_vgic_vcpu_init
> >>>>>2. vgic_create
> >>

Re: [PATCH] kvm: arm/arm64 : fix vm's hanging at startup time

2018-11-21 Thread Christoffer Dall
Hi,

On Wed, Nov 21, 2018 at 04:56:54PM +0800, peng.h...@zte.com.cn wrote:
> >On 19/11/2018 09:10, Mark Rutland wrote:
> >> On Sat, Nov 17, 2018 at 10:58:37AM +0800, peng.h...@zte.com.cn wrote:
>  On 16/11/18 00:23, peng.h...@zte.com.cn wrote:
> >> Hi,
> >>> When virtual machine starts, hang up.
> >>
> >> I take it you mean the *guest* hangs? Because it doesn't get a timer
> >> interrupt?
> >>
> >>> The kernel version of guest
> >>> is 4.16. Host support vgic_v3.
> >>
> >> Your host kernel is something recent, I guess?
> >>
> >>> It was mainly due to the incorrect vgic_irq's(intid=27) group value
> >>> during injection interruption. when kvm_vgic_vcpu_init is called,
> >>> dist is not initialized at this time. Unable to get vgic V3 or V2
> >>> correctly, so group is not set.
> >>
> >> Mmh, that shouldn't happen with (v)GICv3. Do you use QEMU (which
> >> version?) or some other userland tool?
> >>
> >
> > QEMU emulator version 3.0.50 .
> >
> >>> group is setted to 1 when vgic_mmio_write_group is invoked at some
> >>> time.
> >>> when irq->group=0 (intid=27), No ICH_LR_GROUP flag was set and
> >>> interrupt injection failed.
> >>>
> >>> Signed-off-by: Peng Hao 
> >>> ---
> >>>   virt/kvm/arm/vgic/vgic-v3.c | 2 +-
> >>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> >>> index 9c0dd23..d101000 100644
> >>> --- a/virt/kvm/arm/vgic/vgic-v3.c
> >>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> >>> @@ -198,7 +198,7 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu,
> >>> struct vgic_irq *irq, int lr) if (vgic_irq_is_mapped_level(irq) &&
> >>> (val & ICH_LR_PENDING_BIT)) irq->line_level = false;
> >>>
> >>> -if (irq->group)
> >>> +if (model == KVM_DEV_TYPE_ARM_VGIC_V3)
> >>
> >> This is not the right fix, not only because it basically reverts the
> >> GICv3 part of 87322099052 (KVM: arm/arm64: vgic: Signal IRQs using
> >> their configured group).
> >>
> >> Can you try to work out why kvm_vgic_vcpu_init() is apparently called
> >> before dist->vgic_model is set, also what value it has?
> >> If I understand the code correctly, that shouldn't happen for a GICv3.
> >>
> > Even if the value of  group is correctly assigned in 
> > kvm_vgic_vcpu_init, the group is then written 0 through 
> > vgic_mmio_write_group.
> >   If the interrupt comes at this time, the interrupt injection fails.
> 
>  Does that mean that the guest is configuring its interrupts as Group0?
>  That sounds wrong, Linux should configure all it's interrupts as
>  non-secure group1.
> >>>
> >>> no, I think that uefi dose this, not linux.
> >>> 1. kvm_vgic_vcpu_init
> >>> 2. vgic_create
> >>> 3. kvm_vgic_dist_init
> >>> 4.vgic_mmio_write_group: uefi as guest, write group=0
> >>> 5.vgic_mmio_write_group: linux as guest, write group=1
> >>
> >> Is this the same issue fixed by EDK2 commit:
> >>
> >> 66127011a544b90e ("ArmPkg/ArmGicDxe ARM: fix encoding for GICv3 interrupt 
> >> acknowledge")
> >>
> >> ... where EDK2 would try to use IAR0 rather than IAR1?
> >>
> >> The commit messages notes this lead to a boot-time hang.
> >
> >I managed to trigger an issue with a really old EFI implementation that
> >doesn't configure its interrupts as Group1, and yet tries to ACK its
> >interrupts using the Group1 accessor. Guess what? It is not going to work.
> >
> >Commit c7fefb690661f2e38afcb8200bd318ecf38ab961 in the edk2 tree seems
> >to be the fix (I only assume it does, I haven't actually checked). A
> >recent build, as found in Debian Buster, works perfectly (tested with
> >both QEMU v2.12 and tip of tree).
> >
> >Now, I really don't get what you're saying about Linux not getting
> >interrupts. How do you get to booting Linux if EFI is not making any
> >forward progress? Are you trying them independently?
> >
> I start linux with bypassing uefi, the print info is the same.
> [507107.748908]  vgic_mmio_write_group:## intid/27 group=0
> [507107.752185]  vgic_mmio_write_group:## intid/27 group=0
> [507107.899566]  vgic_mmio_write_group:## intid/27 group=1
> [507107.907370]  vgic_mmio_write_group:## intid/27 group=1
> the command line is like this:
> /home/qemu-patch/linshi/qemu/aarch64-softmmu/qemu-system-aarch64  -machine 
> virt-3.1,accel=kvm,usb=off,dump-guest-core=off,gic-version=3  -kernel 
> /home/kernelboot/vmlinuz-4.16.0+ -initrd 
> /home/kernelboot/initramfs-4.16.0+.img -append root=/dev/mapper/cla-root ro 
> crashkernel=auto rd.lvm.lv=cla/root rd.lvm.lv=cla/swap.UTF-8  -drive 
> file=/home/centos74-ph/boot.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 
> -device 
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
>   -vnc 0.0.0.0:0 -k en-us -device 
> virtio-gpu-pci,id=video0,max_outputs=1,b

Re: [PATCH 0/4] KVM: arm/arm64: vgic: Use raw_spinlock for locks taken in IRQ context

2018-11-20 Thread Christoffer Dall
On Mon, Nov 19, 2018 at 05:07:55PM +, Julien Thierry wrote:
> While testing KVM running on PREEMPT_RT, starting a guest could simply
> freeze the machine. This is because we are using spinlocks for the VGIC
> locks, which is invalid in the VGIC case since the locks must be taken
> with interrupts disabled.
> 
> The solution is to use raw_spinlock instead of spinlocks.
> 
> Replacing those locks also highlighted an issue where we attempt to
> cond_resched with interrupts disabled.
> 
> Patch 1 fixes the cond_resched issue.

I don't agree with this fix without seeing a more thorough analysis.

> Patch 2-4 replace the VGIC spinlocks with raw_spinlocks
> 

For these:

Acked-by: Christoffer Dall 



Thanks,

Christoffer


Re: [PATCH 1/4] KVM: arm/arm64: vgic: Do not cond_resched_lock() with IRQs disabled

2018-11-20 Thread Christoffer Dall
On Mon, Nov 19, 2018 at 05:07:56PM +, Julien Thierry wrote:
> To change the active state of an MMIO, halt is requested for all vcpus of
> the affected guest before modifying the IRQ state. This is done by calling
> cond_resched_lock() in vgic_mmio_change_active(). However interrupts are
> disabled at this point and running a vcpu cannot get rescheduled.

"running a vcpu cannot get rescheduled" ?

> 
> Solve this by waiting for all vcpus to be halted after emitting the halt
> request.
> 
> Fixes commit 6c1b7521f4a07cc63bbe2dfe290efed47cdb780a ("KVM: arm/arm64:
> Factor out functionality to get vgic mmio requester_vcpu")
> 
> Signed-off-by: Julien Thierry 
> Suggested-by: Marc Zyngier 
> Cc: Christoffer Dall 
> Cc: Marc Zyngier 
> Cc: sta...@vger.kernel.org
> ---
>  virt/kvm/arm/vgic/vgic-mmio.c | 33 +++--
>  1 file changed, 11 insertions(+), 22 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index f56ff1c..eefd877 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -313,27 +313,6 @@ static void vgic_mmio_change_active(struct kvm_vcpu 
> *vcpu, struct vgic_irq *irq,
> 
>   spin_lock_irqsave(&irq->irq_lock, flags);
> 
> - /*
> -  * If this virtual IRQ was written into a list register, we
> -  * have to make sure the CPU that runs the VCPU thread has
> -  * synced back the LR state to the struct vgic_irq.
> -  *
> -  * As long as the conditions below are true, we know the VCPU thread
> -  * may be on its way back from the guest (we kicked the VCPU thread in
> -  * vgic_change_active_prepare)  and still has to sync back this IRQ,
> -  * so we release and re-acquire the spin_lock to let the other thread
> -  * sync back the IRQ.
> -  *
> -  * When accessing VGIC state from user space, requester_vcpu is
> -  * NULL, which is fine, because we guarantee that no VCPUs are running
> -  * when accessing VGIC state from user space so irq->vcpu->cpu is
> -  * always -1.
> -  */
> - while (irq->vcpu && /* IRQ may have state in an LR somewhere */
> -irq->vcpu != requester_vcpu && /* Current thread is not the VCPU 
> thread */
> -irq->vcpu->cpu != -1) /* VCPU thread is running */
> - cond_resched_lock(&irq->irq_lock);
> -
>   if (irq->hw) {
>   vgic_hw_irq_change_active(vcpu, irq, active, !requester_vcpu);
>   } else {
> @@ -368,8 +347,18 @@ static void vgic_mmio_change_active(struct kvm_vcpu 
> *vcpu, struct vgic_irq *irq,
>   */
>  static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>  {
> - if (intid > VGIC_NR_PRIVATE_IRQS)
> + if (intid > VGIC_NR_PRIVATE_IRQS) {
> + struct kvm_vcpu *tmp;
> + int i;
> +
>   kvm_arm_halt_guest(vcpu->kvm);
> +
> + /* Wait for each vcpu to be halted */
> + kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> + while (tmp->cpu != -1)
> + cond_resched();

We used to have something like this, but Andre then found out that it
could deadlock the system, because the VCPU making this request wouldn't
have called kvm_arch_vcpu_put, so its cpu field would still be set.

That's why we have the vcpu && vcpu != requester check.
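
In other words, any wait loop of this kind would at the very least have
to skip the requesting VCPU, along these lines (purely illustrative and
untested; it assumes the requester is known at this point):

/* Illustrative only: the requesting VCPU never goes through
 * kvm_arch_vcpu_put() here, so its ->cpu field stays set and waiting
 * on it would spin forever.
 */
kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
	if (tmp == requester_vcpu)
		continue;
	while (tmp->cpu != -1)
		cond_resched();
}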


Thanks,

Christoffer


Re: [RFC PATCH v2 20/23] KVM: arm64: Add arch vm ioctl hook

2018-11-20 Thread Christoffer Dall
On Thu, Nov 15, 2018 at 06:04:22PM +, Dave Martin wrote:
> On Fri, Nov 02, 2018 at 09:32:27AM +0100, Christoffer Dall wrote:
> > On Fri, Sep 28, 2018 at 02:39:24PM +0100, Dave Martin wrote:
> > > To enable arm64-specific vm ioctls to be added cleanly, this patch
> > > adds a kvm_arm_arch_vm_ioctl() hook so that these don't pollute the
> > > common code.
> > 
> > Hmmm, I don't really see the strength of that argument, and have the
> > same concern as before.  I'd like to avoid the additional indirection
> > and instead just follow the existing pattern with a dummy implementation
> > on the 32-bit side that returns an error.
> 
> So for this and the similar comment on patch 18, this was premature (or
> at least, overzealous) factoring on my part.
> 
> I'm happy to merge this back together for arm and arm64 as you prefer.
> 
> Do we have a nice way of writing the arch check, e.g.
> 
>   case KVM_ARM_SVE_CONFIG:
>   if (!IS_ENABLED(ARM64))
>   return -EINVAL;
>   else
>   return kvm_vcpu_sve_config(NULL, userp);
> 
> should work, but looks a bit strange.  Maybe I'm just being fussy.

I prefer just doing:

case KVM_ARM_SVE_CONFIG:
return kvm_vcpu_sve_config(NULL, userp);


And having this in arch/arm/include/asm/kvm_foo.h:

static inline int kvm_vcpu_sve_config(...)
{
return -EINVAL;
}


Thanks,

Christoffer


Re: [RFC PATCH v2 05/23] KVM: arm: Add arch vcpu uninit hook

2018-11-20 Thread Christoffer Dall
On Thu, Nov 15, 2018 at 04:40:31PM +, Dave Martin wrote:
> On Fri, Nov 02, 2018 at 09:05:36AM +0100, Christoffer Dall wrote:
> > On Fri, Sep 28, 2018 at 02:39:09PM +0100, Dave Martin wrote:
> > > In preparation for adding support for SVE in guests on arm64, a
> > > hook is needed for freeing additional per-vcpu memory when a vcpu
> > > is freed.
> > 
> > Can this commit motivate why we can't do the work in kvm_arch_vcpu_free,
> > which we use for freeing other data structures?
> > 
> > (Presumably, uninit is needed when you need to do something at the very
> > last step after releasing the struct pid.)
> 
> It wasn't to do with that.
> 
> Rather, the division of responsibility between the vcpu_uninit and
> vcpu_free paths is not very clear.
> 
> In the earlier version of the series, I think SVE state may have been
> allocated rather early and we may have needed to free it in the failure
> path of kvm_arch_vcpu_create() (which just calls kvm_vcpu_uninit()).
> (Alternatively, I may just have been wrong.)
> 
> Now, the vcpu must be fully created before the KVM_ARM_SVE_CONFIG ioctl
> on it (which is what allocates sve_state) can succeed anyway.
> 
> So the distinction between these two teardown phases is probably no
> longer important.
> 
> I'll see whether I can get rid of this hook and free the SVE state in
> kvm_arch_vcpu_free() instead.
> 
> Does that make sense?
> 

Yes, thanks.

Christoffer


Re: [PATCH v3 2/7] arm64/kvm: context-switch ptrauth registers

2018-11-13 Thread Christoffer Dall
On Mon, Nov 12, 2018 at 10:32:12PM +, Catalin Marinas wrote:
> On Fri, Nov 02, 2018 at 09:37:25AM +0100, Christoffer Dall wrote:
> > On Wed, Oct 17, 2018 at 04:17:55PM +0530, Amit Daniel Kachhap wrote:
> > > From: Mark Rutland 
> > > 
> > > When pointer authentication is supported, a guest may wish to use it.
> > > This patch adds the necessary KVM infrastructure for this to work.
> > > 
> > > When we schedule a vcpu, we enable guest usage of pointer
> > > authentication instructions and accesses to the keys. After these are
> > > enabled, we allow context-switching the keys.
> > > 
> > > Pointer authentication consists of address authentication and generic
> > > authentication, and CPUs in a system might have varied support for
> > > either. Where support for either feature is not uniform, it is hidden
> > > from guests via ID register emulation, as a result of the cpufeature
> > > framework in the host.
> > > 
> > > Unfortunately, address authentication and generic authentication cannot
> > > be trapped separately, as the architecture provides a single EL2 trap
> > > covering both. If we wish to expose one without the other, we cannot
> > > prevent a (badly-written) guest from intermittently using a feature
> > > which is not uniformly supported (when scheduled on a physical CPU which
> > > supports the relevant feature). When the guest is scheduled on a
> > > physical CPU lacking the feature, these attempts will result in an UNDEF
> > > being taken by the guest.
> > > 
> > > Signed-off-by: Mark Rutland 
> > > Signed-off-by: Amit Daniel Kachhap 
> > > Cc: Marc Zyngier 
> > > Cc: Christoffer Dall 
> > > Cc: kvmarm@lists.cs.columbia.edu
> [...] 
> > Two questions:
> > 
> >  - Can we limit all ptrauth functionality to VHE systems so that we
> >don't need to touch the non-VHE path and so that we don't need any of
> >the __hyp_text stuff?
> 
> I would say yes. ARMv8.3 implies v8.1, so we can enable ptrauth only when
> VHE is built into the kernel and present in the CPU implementation.
> 

Sounds good.

> >  - Can we move all the save/restore logic to vcpu load/put as long as
> >the host kernel itself isn't using ptrauth, and if the host kernel at
> >some point begins to use ptrauth, can we have a hook to save/restore
> >at that time (similar to what we do for FPSIMD) to avoid this
> >overhead on every switch?
> 
> We will probably enable ptrauth for the kernel as well fairly soon, so I
> don't think we should base the KVM assumption on the no-ptrauth-in-kernel
> use case.
> 

I assume in this case ptrauth will be used for all of the kernel,
including most of the KVM code?

In that case, I wonder if we always need to context-switch the ptrauth
configuration state or if we can be lazy until the guest actually uses
the feature?
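
Something like the following is the kind of laziness I have in mind:
trap ptrauth by default and only pay for the key switch once the guest
uses it (a rough sketch; the helper names are made up, not taken from
the posted series):

/* Illustrative only: keep HCR_EL2.{API,APK} clear so guest ptrauth use
 * traps to EL2; on the first trap, stop trapping and start switching
 * the key registers for this vcpu.
 */
static bool __hyp_text handle_ptrauth_trap(struct kvm_vcpu *vcpu)
{
	if (!vcpu_ptrauth_allowed(vcpu))	/* assumed helper */
		return false;			/* let the guest take an UNDEF */

	vcpu->arch.hcr_el2 |= HCR_API | HCR_APK;
	__ptrauth_restore_guest_keys(vcpu);	/* assumed helper */
	return true;				/* retry the faulting instruction */
}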


Thanks,

Christoffer

