Re: [PATCH v2 2/5] KVM: arm/arm64: Add ARM user space interrupt signaling ABI

2017-04-04 Thread Alexander Graf



On 21.02.17 12:41, Christoffer Dall wrote:

Hi Alex,

On Fri, Feb 03, 2017 at 05:51:18PM +, Peter Maydell wrote:

On 3 February 2017 at 14:56, Christoffer Dall  wrote:

From: Christoffer Dall 

We have 2 modes for dealing with interrupts in the ARM world. We can
either handle them all using hardware acceleration through the vgic or
we can emulate a gic in user space and only drive CPU IRQ pins from
there.

Unfortunately, when driving IRQs from user space, we never tell user
space about events from devices emulated inside the kernel, which may
result in interrupt line state changes, so we lose out on, for example,
timer and PMU events if we run with user space gic emulation.

Define an ABI to publish such device output levels to userspace.

Signed-off-by: Alexander Graf 
Signed-off-by: Christoffer Dall 



Acked-by: Peter Maydell 

for the userspace ABI/docs part.



Given Peter's ack on this ABI, any chance you could take a look at
fixing the userspace SMP issue and respinning the userspace patches for
this series?


The problem with user space SMP was simply a missing lock:

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index a66845f..1b33895 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -548,10 +548,14 @@ MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)

     /* Synchronize our internal vtimer irq line with the kvm one */
     if (run->s.regs.device_irq_level != cpu->device_irq_level) {
-        qemu_set_irq(ARM_CPU(cs)->gt_timer_outputs[GTIMER_VIRT],
+        qemu_mutex_lock_iothread();
+        qemu_set_irq(cpu->gt_timer_outputs[GTIMER_VIRT],
                      run->s.regs.device_irq_level & KVM_ARM_DEV_EL1_VTIMER);
+        qemu_set_irq(cpu->gt_timer_outputs[GTIMER_PHYS],
+                     run->s.regs.device_irq_level & KVM_ARM_DEV_EL1_PTIMER);
         /* TODO: Handle changes in PTIMER and PMU as well */
         cpu->device_irq_level = run->s.regs.device_irq_level;
+        qemu_mutex_unlock_iothread();
     }

     return MEMTXATTRS_UNSPECIFIED;




I also wasn't able to use your series out of the box. There are several 
places in the code that check whether a timer is enabled (for register 
save/restore, for example), and without a vgic we never ended up setting 
that to true.


I don't know how safe it is to set it to true regardless. It feels to me 
like someone has to put more thought into how to switch from an 
initialized user space timer state to a vgic one. For reference, my test 
patch is below:


diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 891a725..56a5d51 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -629,8 +629,11 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
return 0;

/* Without a VGIC we do not map virtual IRQs to physical IRQs */
-   if (!irqchip_in_kernel(vcpu->kvm))
+   if (!irqchip_in_kernel(vcpu->kvm)) {
+   /* Enable timer for now, may be racy? */
+   timer->enabled = 1;
return 0;
+   }

if (!vgic_initialized(vcpu->kvm))
return -ENODEV;




Once we've solved that piece, I'll happily respin my user space patch.


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory

2017-04-04 Thread gengdongjiu
Hi James,
   Thanks for the patch. Have you considered telling QEMU or kvmtool
the reason for this bus error (SEA vs. SEI)?

When QEMU or kvmtool receives this SIGBUS signal, it cannot tell
whether the SIGBUS was caused by an SEA or an SEI. Or does KVM only
send this SIGBUS for the SEA case? If so, for the SEI case, how should
QEMU generate the CPER records for a guest OS SEI?


2017-03-16 0:07 GMT+08:00 James Morse :
> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications for
> broken memory can call memory_failure() in mm/memory-failure.c to deliver
> SIGBUS to any user space process using the page, and notify all the
> in-kernel users.
>
> If the page corresponded with guest memory, KVM will unmap this page
> from its stage2 page tables. The user space process that allocated
> this memory may have never touched this page in which case it may not
> be mapped meaning SIGBUS won't be delivered.
>
> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it
> comes to process the stage2 fault.
>
> Do as x86 does, and deliver the SIGBUS when we discover
> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb
> as this matches the user space mapping size.
>
> Signed-off-by: James Morse 
> CC: gengdongjiu 
> ---
>  Without this patch both kvmtool and Qemu exit as the KVM_RUN ioctl() returns
>  EFAULT.
>  QEMU: error: kvm run failed Bad address
>  LKVM: KVM_RUN failed: Bad address
>
>  With this patch both kvmtool and Qemu receive SIGBUS ... and then exit.
>  In the future Qemu can use this signal to notify the guest, for more details
>  see hwpoison[1].
>
>  [0] https://www.spinics.net/lists/arm-kernel/msg560009.html
>  [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hwpoison.txt
>
>
>  arch/arm/kvm/mmu.c | 23 +++
>  1 file changed, 23 insertions(+)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 962616fd4ddd..9d1aa294e88f 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -20,8 +20,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
> __coherent_cache_guest_page(vcpu, pfn, size);
>  }
>
> +static void kvm_send_hwpoison_signal(unsigned long address, bool hugetlb)
> +{
> +   siginfo_t info;
> +
> +   info.si_signo   = SIGBUS;
> +   info.si_errno   = 0;
> +   info.si_code= BUS_MCEERR_AR;
> +   info.si_addr= (void __user *)address;
> +
> +   if (hugetlb)
> +   info.si_addr_lsb = PMD_SHIFT;
> +   else
> +   info.si_addr_lsb = PAGE_SHIFT;
> +
> +   send_sig_info(SIGBUS, &info, current);
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   struct kvm_memory_slot *memslot, unsigned long hva,
>   unsigned long fault_status)
> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> smp_rmb();
>
> pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
> +   if (pfn == KVM_PFN_ERR_HWPOISON) {
> +   kvm_send_hwpoison_signal(hva, hugetlb);
> +   return 0;
> +   }
> if (is_error_noslot_pfn(pfn))
> return -EFAULT;
>
> --
> 2.10.1
>


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 21:04, Christoffer Dall wrote:
>  - (On a related note, I suddenly found it weird that
>kvm_make_all_cpus_request() doesn't wake up sleeping VCPUs, but only
>sends an IPI; does this mean that calling this function should be
>followed by a kick() for each VCPU?  Maybe Radim was looking at this
>in his series already.)

Yes, kvm_make_all_cpus_request in x86 is only used for "non urgent"
requests, i.e. things to do before the next guest entry.

>  - In the explanation you wrote, you use the term 'we' a lot, but when
>talking about SMP barriers, I think it only makes sense to talk about
>actions and observations between multiple CPUs and we have to be
>specific about which CPU observes or does what with respect to the
>other.  Maybe I'm being a stickler here, but there is something here
>which is making me uneasy.

The write1-mb-if(read2) / write2-mb-if(read1) pattern is pretty common,
so I think it is justified to cut the reasoning on the ordering short and
just focus on what the two memory locations and conditions mean.  But I'd
wait for v3, since I'm sure that Drew also understands the
synchronization better.

>  - Finally, it feels very hard to prove the correctness of this, and
>equally hard to test it (given how long we've been running with
>apparently racy code).  I would hope that we could abstract some of
>this into architecture-generic things that someone who eats memory
>barriers for breakfast could help us verify, but again, maybe this is
>Radim's series I'm asking for here.

What I can do here is to suggest copying the paradigms from x86, which
is quite battle tested (Windows hammers it really hard).

For QEMU I did use model checking in the past for some similarly hairy
synchronization code, but that is really just "executable documentation"
because the model is not written in C.

Paolo


Re: [PATCH v2 9/9] KVM: arm/arm64: avoid race by caching MPIDR

2017-04-04 Thread Christoffer Dall
On Fri, Mar 31, 2017 at 06:06:58PM +0200, Andrew Jones wrote:
> Cache the MPIDR in the vcpu structure to fix potential races that
> can arise between vcpu reset and the extraction of the MPIDR from
> the sys-reg array.

I don't understand the race, sorry.

Can you be more specific in where this goes wrong and exactly what this
fixes?

Thanks,
-Christoffer

> 
> Signed-off-by: Andrew Jones 
> ---
>  arch/arm/include/asm/kvm_emulate.h   |  2 +-
>  arch/arm/include/asm/kvm_host.h  |  3 +++
>  arch/arm/kvm/coproc.c| 20 
>  arch/arm64/include/asm/kvm_emulate.h |  2 +-
>  arch/arm64/include/asm/kvm_host.h|  3 +++
>  arch/arm64/kvm/sys_regs.c| 27 ++-
>  6 files changed, 34 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 9a8a45aaf19a..1b922de46785 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -213,7 +213,7 @@ static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
>  
>  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  {
> - return vcpu_cp15(vcpu, c0_MPIDR) & MPIDR_HWID_BITMASK;
> + return vcpu->arch.vmpidr & MPIDR_HWID_BITMASK;
>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 0b8a6d6b3cb3..e0f461f0af67 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -151,6 +151,9 @@ struct kvm_vcpu_arch {
>   /* The CPU type we expose to the VM */
>   u32 midr;
>  
> + /* vcpu MPIDR */
> + u32 vmpidr;
> +
>   /* HYP trapping configuration */
>   u32 hcr;
>  
> diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> index 3e5e4194ef86..c4df7c9c8ddb 100644
> --- a/arch/arm/kvm/coproc.c
> +++ b/arch/arm/kvm/coproc.c
> @@ -101,14 +101,18 @@ int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
>  {
> - /*
> -  * Compute guest MPIDR. We build a virtual cluster out of the
> -  * vcpu_id, but we read the 'U' bit from the underlying
> -  * hardware directly.
> -  */
> -	vcpu_cp15(vcpu, c0_MPIDR) = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
> -				     ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
> -				     (vcpu->vcpu_id & 3));
> + if (!vcpu->arch.vmpidr) {
> + /*
> +  * Compute guest MPIDR. We build a virtual cluster out of the
> +  * vcpu_id, but we read the 'U' bit from the underlying
> +  * hardware directly.
> +  */
> + u32 mpidr = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
> +  ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
> +  (vcpu->vcpu_id & 3));
> + vcpu->arch.vmpidr = mpidr;
> + }
> + vcpu_cp15(vcpu, c0_MPIDR) = vcpu->arch.vmpidr;
>  }
>  
>  /* TRM entries A7:4.3.31 A15:4.3.28 - RO WI */
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index f5ea0ba70f07..c138bb15b507 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -242,7 +242,7 @@ static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
>  
>  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  {
> - return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
> + return vcpu->arch.vmpidr_el2 & MPIDR_HWID_BITMASK;
>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7057512b3474..268c10d95a79 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -198,6 +198,9 @@ typedef struct kvm_cpu_context kvm_cpu_context_t;
>  struct kvm_vcpu_arch {
>   struct kvm_cpu_context ctxt;
>  
> + /* vcpu MPIDR */
> + u64 vmpidr_el2;
> +
>   /* HYP configuration */
>   u64 hcr_el2;
>   u32 mdcr_el2;
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 0e26f8c2b56f..517aed6d8016 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -431,19 +431,20 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  
>  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  {
> - u64 mpidr;
> -
> - /*
> -  * Map the vcpu_id into the first three affinity level fields of
> -  * the MPIDR. We limit the number of VCPUs in level 0 due to a
> -  * limitation to 16 CPUs in that level in the ICC_SGIxR registers
> -  * of the GICv3 to be able to address each CPU directly when
> -  * sending IPIs.
> - 

Re: [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on

2017-04-04 Thread Christoffer Dall
On Fri, Mar 31, 2017 at 06:06:57PM +0200, Andrew Jones wrote:
> From: Levente Kurusa 
> 
> When two vcpus issue PSCI_CPU_ON on the same core at the same time,
> then it's possible for them to both enter the target vcpu's setup
> at the same time. This results in unexpected behaviors at best,
> and the potential for some nasty bugs at worst.
> 
> Signed-off-by: Levente Kurusa 
> Signed-off-by: Andrew Jones 
> ---
>  arch/arm/kvm/psci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index f732484abc7a..0204daa899b1 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>*/
>   if (!vcpu)
>   return PSCI_RET_INVALID_PARAMS;
> - if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> +
> + if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
>   if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
>   return PSCI_RET_ALREADY_ON;
>   else
> @@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>* the general purpose registers are undefined upon CPU_ON.
>*/
>   vcpu_set_reg(vcpu, 0, context_id);
> - clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
>  
>   wq = kvm_arch_vcpu_wq(vcpu);
>   swake_up(wq);
> -- 
> 2.9.3
> 

Depending on what you end up doing with the requests, if you keep the
bool flag you could just use the kvm->lock mutex instead.

Have you considered if there are any potential races between
kvm_psci_system_off() being called on one VCPU while two other VCPUs are
turning on the same CPU that is being turned off as part of a system-wide
power down as well?

I'm wondering if this means we should take the kvm->lock at a higher
level when handling PSCI events...

Thanks,
-Christoffer


Re: [PATCH v2 7/9] KVM: arm/arm64: PMU: remove request-less vcpu kick

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 08:29:18PM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2017 at 07:46:12PM +0200, Christoffer Dall wrote:
> > On Fri, Mar 31, 2017 at 06:06:56PM +0200, Andrew Jones wrote:
> > > Refactor PMU overflow handling in order to remove the request-less
> > > vcpu kick.  Now, since kvm_vgic_inject_irq() uses vcpu requests,
> > > there should be no chance that a kick sent at just the wrong time
> > > (between the VCPU's call to kvm_pmu_flush_hwstate() and before it
> > > enters guest mode) results in a failure for the guest to see updated
> > > GIC state until its next exit some time later for some other reason.
> > > 
> > > Signed-off-by: Andrew Jones 
> > > ---
> > >  virt/kvm/arm/pmu.c | 29 +++--
> > >  1 file changed, 15 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> > > index 69ccce308458..9d725f3afb11 100644
> > > --- a/virt/kvm/arm/pmu.c
> > > +++ b/virt/kvm/arm/pmu.c
> > > @@ -203,6 +203,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
> > >   return reg;
> > >  }
> > >  
> > > +static void kvm_pmu_check_overflow(struct kvm_vcpu *vcpu)
> > > +{
> > > + struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > > + bool overflow;
> > > +
> > > + overflow = !!kvm_pmu_overflow_status(vcpu);
> > > + if (pmu->irq_level != overflow) {
> > > + pmu->irq_level = overflow;
> > > + kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > > + pmu->irq_num, overflow);
> > > + }
> > > +}
> > > +
> > 
> > If we are changing the way the PMU works to adjust the interrupt
signaling whenever the PMU changes its internal state, do we still have
> > to call kvm_pmu_update_state() from each flush/sync path now?
> 
> The thought crossed my mind to rework that completely, in order to remove
> that flush/sync, but then went for the smaller patch for this series.  I
> can take a look at it though.
> 

Actually, now that I've read what this code will be doing, I see the
extra call in flush/sync won't do any work, because it will find that
pmu->irq_level == overflow, so never mind - we can improve that later as
an optimization patch.

Let's focus on getting it right.

Thanks,
-Christoffer


> > 
> > >  /**
> > >   * kvm_pmu_overflow_set - set PMU overflow interrupt
> > >   * @vcpu: The vcpu pointer
> > > @@ -210,31 +223,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
> > >   */
> > >  void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
> > >  {
> > > - u64 reg;
> > > -
> > >   if (val == 0)
> > >   return;
> > >  
> > >   vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val;
> > > - reg = kvm_pmu_overflow_status(vcpu);
> > > - if (reg != 0)
> > > - kvm_vcpu_kick(vcpu);
> > > + kvm_pmu_check_overflow(vcpu);
> > >  }
> > >  
> > >  static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
> > >  {
> > > - struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > > - bool overflow;
> > > -
> > >   if (!kvm_arm_pmu_v3_ready(vcpu))
> > >   return;
> > >  
> > > - overflow = !!kvm_pmu_overflow_status(vcpu);
> > > - if (pmu->irq_level != overflow) {
> > > - pmu->irq_level = overflow;
> > > - kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > > - pmu->irq_num, overflow);
> > > - }
> > > + kvm_pmu_check_overflow(vcpu);
> > >  }
> > >  
> > >  /**
> > > -- 
> > > 2.9.3
> > > 


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 07:57:18PM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2017 at 06:04:17PM +0200, Christoffer Dall wrote:
> > On Fri, Mar 31, 2017 at 06:06:53PM +0200, Andrew Jones wrote:
> > > This not only ensures visibility of changes to pause by using
> > > atomic ops, but also plugs a small race where a vcpu could get its
> > > pause state enabled just after its last check before entering the
> > > guest. With this patch, while the vcpu will still initially enter
> > > the guest, it will exit immediately due to the IPI sent by the vcpu
> > > kick issued after making the vcpu request.
> > > 
> > > We use bitops, rather than kvm_make/check_request(), because we
> > > don't need the barriers they provide,
> > 
> > why not?
> 
> I'll add that it's because the only state of interest is the request bit
> itself.  When the request is observable then we're good to go, no need to
> ensure that at the time the request is observable, something else is too.
> 
> > 
> > > nor do we want the side-effect
> > > of kvm_check_request() clearing the request. For pause, only the
> > > requester should do the clearing.
> > > 
> > > Signed-off-by: Andrew Jones 
> > > ---
> > >  arch/arm/include/asm/kvm_host.h   |  5 +
> > >  arch/arm/kvm/arm.c| 45 +++
> > >  arch/arm64/include/asm/kvm_host.h |  5 +
> > >  3 files changed, 33 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > > index 31ee468ce667..52c25536d254 100644
> > > --- a/arch/arm/include/asm/kvm_host.h
> > > +++ b/arch/arm/include/asm/kvm_host.h
> > > @@ -45,7 +45,7 @@
> > >  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
> > >  #endif
> > >  
> > > -#define KVM_REQ_VCPU_EXIT8
> > > +#define KVM_REQ_PAUSE8
> > >  
> > >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> > >  int __attribute_const__ kvm_target_cpu(void);
> > > @@ -173,9 +173,6 @@ struct kvm_vcpu_arch {
> > >   /* vcpu power-off state */
> > >   bool power_off;
> > >  
> > > -  /* Don't run the guest (internal implementation need) */
> > > - bool pause;
> > > -
> > >   /* IO related fields */
> > >   struct kvm_decode mmio_decode;
> > >  
> > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > index 314eb6abe1ff..f3bfbb5f3d96 100644
> > > --- a/arch/arm/kvm/arm.c
> > > +++ b/arch/arm/kvm/arm.c
> > > @@ -94,6 +94,18 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
> > >  
> > >  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> > >  {
> > > + /*
> > > +  * If we return true from this function, then it means the vcpu is
> > > +  * either in guest mode, or has already indicated that it's in guest
> > > +  * mode. The indication is done by setting ->mode to IN_GUEST_MODE,
> > > +  * and must be done before the final kvm_request_pending() read. It's
> > > +  * important that the observability of that order be enforced and that
> > > +  * the request receiving CPU can observe any new request before the
> > > +  * requester issues a kick. Thus, the general barrier below pairs with
> > > +  * the general barrier in kvm_arch_vcpu_ioctl_run() which divides the
> > > +  * write to ->mode and the final request pending read.
> > > +  */
> > 
> > I am having a hard time understanding this comment.  For example, I
> > don't understand the difference between 'is either in guest mode or has
> > already indicated it's in guest mode'.  Which case is which again, and
> > how are we checking for two cases below?
> > 
> > Also, the stuff about observability of an order is hard to follow, and
> > the comment assumes the reader is thinking about the specific race when
> > entering the guest.
> > 
> > I think we should focus on getting the documentation in place, refer to
> > the documentation from here, and be much more brief and say something
> > like:
> > 
> > /*
> >  * The memory barrier below pairs with the barrier in
> >  * kvm_arch_vcpu_ioctl_run() between writes to vcpu->mode
> >  * and reading vcpu->requests before entering the guest.
> >  *
> >  * Ensures that the VCPU thread's CPU can observe changes to
> >  * vcpu->requests written prior to calling this function before
> >  * it writes vcpu->mode = IN_GUEST_MODE, and correspondingly
> >  * ensures that this CPU observes vcpu->mode == IN_GUEST_MODE
> >  * only if the VCPU thread's CPU could observe writes to
> >  * vcpu->requests from this CPU.
> >  */
> > 
> > Is this correct?  I'm not really sure anymore?
> 
> It's confusing because we have cross dependencies on the negatives of
> two conditions.
> 
> Here's the cross dependencies:
> 
>   vcpu->mode = IN_GUEST_MODE;   ---   ---  kvm_make_request(REQ, vcpu);
>   smp_mb();\ / smp_mb();
> X
>/ \
>   if (kvm_request_pending(vcpu))<--   -->  if (vcpu->mode == IN_GUEST

Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 20:18, Andrew Jones wrote:
>> My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
>> reuse the smp_call_function_many code in kvm_make_all_cpus_request.
>> Once you add EXITING_GUEST_MODE, ARM can just add a new function
>> kvm_kick_all_cpus and use it for both pause and power_off.
>
> I was wondering about the justification of
> 'if (vcpu->mode == EXITING_GUEST_MODE)' in the x86 code, as it seemed
> redundant to me with the requests.  I'll have another think on it to see
> if request-less kicks can be satisfied in all cases by this, as long as we
> have the mode setting, barrier, mode checking order ensured in vcpu run.

Yes, this is the justification.  You should add that to
kvm_arch_vcpu_ioctl_run to close the race window (as well as the
kvm_request_pending, just for good measure).  These two are not really
optional, they are part of how kvm_vcpu_exiting_guest_mode and requests
are supposed to work.  kvm_vcpu_exiting_guest_mode is optional, but ARM
is using it and it's a pity to undo it.

Once you have done this, you can choose whether to use requests or not
for pause and poweroff, but I think it will not be necessary.

Paolo


Re: [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 19:42, Christoffer Dall wrote:
> 
> this is not going to be called when we don't have the vgic, which means
> that if vcpu_interrupt_line() is used as you modify it above, the
> request will never get cleared.

Heh, I'll stop pretending I can give positive reviews of ARM patches and
keep bash^Wproviding constructive criticism. :)

In fact, I forgot to add that x86 recently moved to request-less
kvm_vcpu_kick when hardware interrupt injection is available.  This was
after I noticed that requests were just used as a workaround for
incorrect ordering of local_irq_disable, vcpu->mode=IN_GUEST_MODE, and
so on.

Paolo


Re: [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection

2017-04-04 Thread Paolo Bonzini


On 31/03/2017 18:06, Andrew Jones wrote:
> Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
> kick meant to trigger the interrupt injection could be sent while
> the VCPU is outside guest mode, which means no IPI is sent, and
> after it has called kvm_vgic_flush_hwstate(), meaning it won't see
> the updated GIC state until its next exit some time later for some
> other reason.
> 
> Signed-off-by: Andrew Jones 
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm/kvm/arm.c|  1 +
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  virt/kvm/arm/arch_timer.c |  1 +
>  virt/kvm/arm/vgic/vgic.c  | 12 ++--
>  5 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index afed5d44634d..0b8a6d6b3cb3 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -47,6 +47,7 @@
>  
>  #define KVM_REQ_PAUSE8
>  #define KVM_REQ_POWER_OFF9
> +#define KVM_REQ_IRQ_PENDING  10
>  
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 7ed39060b1cf..a106feccf314 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -768,6 +768,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
>* trigger a world-switch round on the running physical CPU to set the
>* virtual IRQ/FIQ fields in the HCR appropriately.
>*/
> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);
>  
>   return 0;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e78895f675d0..7057512b3474 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -44,6 +44,7 @@
>  
>  #define KVM_REQ_PAUSE8
>  #define KVM_REQ_POWER_OFF9
> +#define KVM_REQ_IRQ_PENDING  10
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 35d7100e0815..3c48abbf951b 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
>* If the vcpu is blocked we want to wake it up so that it will see
>* the timer has expired when entering the guest.
>*/
> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 654dfd40e449..31fb89057f0c 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>* won't see this one until it exits for some other
>* reason.
>*/
> - if (vcpu)
> + if (vcpu) {
> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);
> + }
>   return false;
>   }
>  
> @@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>   spin_unlock(&irq->irq_lock);
>   spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>  
> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);
>  
>   return true;
> @@ -654,6 +657,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>   spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
>   vgic_flush_lr_state(vcpu);
>   spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> +
> + /* The GIC is now ready to deliver the IRQ. */
> + clear_bit(KVM_REQ_IRQ_PENDING, &vcpu->requests);
>  }
>  
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
> @@ -691,8 +697,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
>* a good kick...
>*/
>   kvm_for_each_vcpu(c, vcpu, kvm) {
> - if (kvm_vgic_vcpu_pending_irq(vcpu))
> + if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);
> + }
>   }
>  }
>  
> 

Reviewed-by: Paolo Bonzini 


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 08:15:09PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 19:57, Christoffer Dall wrote:
> >> Right.  That code does
> >>
> >> tmp->arch.power_off = true;
> >> kvm_vcpu_kick(tmp);
> >>
> >> and I think what's really missing in arm.c is the "if (vcpu->mode ==
> >> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
> >> also simply use kvm_vcpu_kick.
> > I see, that's why the cmpxchg() works the way it does.  We just still
> > need to move the vcpu->mode = IN_GUEST_MODE before our
> > with-interrupts-disabled check.
> > 
> > What I'm not sure is why you can get away without using a memory barrier
> > or WRITE_ONCE on x86, but is this simply because x86 is a strongly
> > ordered architecture?
> 
> x86 does have a memory barrier:
> 
> vcpu->mode = IN_GUEST_MODE;
> 
> srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> smp_mb__after_srcu_read_unlock();

duh, the long complicated barrier version made me totally miss it.
Sorry.

> 
> /*
>  * This handles the case where a posted interrupt was
>  * notified with kvm_vcpu_kick.
>  */
> if (kvm_lapic_enabled(vcpu)) {
> if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active)
> kvm_x86_ops->sync_pir_to_irr(vcpu);
> }
> 
> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> 
> and WRITE_ONCE is not needed if you have a memory barrier (though I find it
> more self-documenting to use it anyway).
> 

ok, thanks.

-Christoffer


Re: [PATCH v2 7/9] KVM: arm/arm64: PMU: remove request-less vcpu kick

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 07:46:12PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:56PM +0200, Andrew Jones wrote:
> > Refactor PMU overflow handling in order to remove the request-less
> > vcpu kick.  Now, since kvm_vgic_inject_irq() uses vcpu requests,
> > there should be no chance that a kick sent at just the wrong time
> > (between the VCPU's call to kvm_pmu_flush_hwstate() and before it
> > enters guest mode) results in a failure for the guest to see updated
> > GIC state until its next exit some time later for some other reason.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  virt/kvm/arm/pmu.c | 29 +++--
> >  1 file changed, 15 insertions(+), 14 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> > index 69ccce308458..9d725f3afb11 100644
> > --- a/virt/kvm/arm/pmu.c
> > +++ b/virt/kvm/arm/pmu.c
> > @@ -203,6 +203,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu 
> > *vcpu)
> > return reg;
> >  }
> >  
> > +static void kvm_pmu_check_overflow(struct kvm_vcpu *vcpu)
> > +{
> > +   struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > +   bool overflow;
> > +
> > +   overflow = !!kvm_pmu_overflow_status(vcpu);
> > +   if (pmu->irq_level != overflow) {
> > +   pmu->irq_level = overflow;
> > +   kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > +   pmu->irq_num, overflow);
> > +   }
> > +}
> > +
> 
> If we are changing the way the PMU works to adjust the interrupt
> signaling whenever the PMU changes its internal state, do we still have
> to call kvm_pmu_update_state() from each flush/sync path now?

The thought crossed my mind to rework that completely, in order to remove
that flush/sync, but then went for the smaller patch for this series.  I
can take a look at it though.

Thanks,
drew

> 
> >  /**
> >   * kvm_pmu_overflow_set - set PMU overflow interrupt
> >   * @vcpu: The vcpu pointer
> > @@ -210,31 +223,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu 
> > *vcpu)
> >   */
> >  void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
> >  {
> > -   u64 reg;
> > -
> > if (val == 0)
> > return;
> >  
> > vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val;
> > -   reg = kvm_pmu_overflow_status(vcpu);
> > -   if (reg != 0)
> > -   kvm_vcpu_kick(vcpu);
> > +   kvm_pmu_check_overflow(vcpu);
> >  }
> >  
> >  static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
> >  {
> > -   struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > -   bool overflow;
> > -
> > if (!kvm_arm_pmu_v3_ready(vcpu))
> > return;
> >  
> > -   overflow = !!kvm_pmu_overflow_status(vcpu);
> > -   if (pmu->irq_level != overflow) {
> > -   pmu->irq_level = overflow;
> > -   kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > -   pmu->irq_num, overflow);
> > -   }
> > +   kvm_pmu_check_overflow(vcpu);
> >  }
> >  
> >  /**
> > -- 
> > 2.9.3
> > 
> 
> Thanks,
> -Christoffer


Re: [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 07:42:08PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:55PM +0200, Andrew Jones wrote:
> > Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
> > kick meant to trigger the interrupt injection could be sent while
> > the VCPU is outside guest mode, which means no IPI is sent, and
> > after it has called kvm_vgic_flush_hwstate(), meaning it won't see
> > the updated GIC state until its next exit some time later for some
> > other reason.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  1 +
> >  arch/arm/kvm/arm.c|  1 +
> >  arch/arm64/include/asm/kvm_host.h |  1 +
> >  virt/kvm/arm/arch_timer.c |  1 +
> >  virt/kvm/arm/vgic/vgic.c  | 12 ++--
> >  5 files changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h 
> > b/arch/arm/include/asm/kvm_host.h
> > index afed5d44634d..0b8a6d6b3cb3 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -47,6 +47,7 @@
> >  
> >  #define KVM_REQ_PAUSE  8
> >  #define KVM_REQ_POWER_OFF  9
> > +#define KVM_REQ_IRQ_PENDING10
> >  
> >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> >  int __attribute_const__ kvm_target_cpu(void);
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index 7ed39060b1cf..a106feccf314 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -768,6 +768,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, 
> > int number, bool level)
> >  * trigger a world-switch round on the running physical CPU to set the
> >  * virtual IRQ/FIQ fields in the HCR appropriately.
> >  */
> > +   kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > kvm_vcpu_kick(vcpu);
> >  
> > return 0;
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index e78895f675d0..7057512b3474 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -44,6 +44,7 @@
> >  
> >  #define KVM_REQ_PAUSE  8
> >  #define KVM_REQ_POWER_OFF  9
> > +#define KVM_REQ_IRQ_PENDING10
> >  
> >  int __attribute_const__ kvm_target_cpu(void);
> >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 35d7100e0815..3c48abbf951b 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct 
> > *work)
> >  * If the vcpu is blocked we want to wake it up so that it will see
> >  * the timer has expired when entering the guest.
> >  */
> > +   kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > kvm_vcpu_kick(vcpu);
> >  }
> >  
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index 654dfd40e449..31fb89057f0c 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct 
> > vgic_irq *irq)
> >  * won't see this one until it exits for some other
> >  * reason.
> >  */
> > -   if (vcpu)
> > +   if (vcpu) {
> > +   kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > kvm_vcpu_kick(vcpu);
> > +   }
> > return false;
> > }
> >  
> > @@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct 
> > vgic_irq *irq)
> > spin_unlock(&irq->irq_lock);
> > spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> >  
> > +   kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > kvm_vcpu_kick(vcpu);
> >  
> > return true;
> > @@ -654,6 +657,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
> > spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
> > vgic_flush_lr_state(vcpu);
> > spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> > +
> > +   /* The GIC is now ready to deliver the IRQ. */
> > +   clear_bit(KVM_REQ_IRQ_PENDING, &vcpu->requests);
> 
> this is not going to be called when we don't have the vgic, which means
> that if vcpu_interrupt_line() is used as you modify it above, the
> request will never get cleared.

Ah, thanks. I'll try to sort that out in a better way.  Of course with
Paolo's comments about request-less kicks working due to the mode
change in kvm_vcpu_kick() (when an additional condition is added
before the vcpu enters guest mode), then maybe we don't need this, or
the next patch, at all.

Thanks,
drew

> 
> >  }
> >  
> >  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
> > @@ -691,8 +697,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
> >  * a good kick...
> >  */
> > kvm_for_each_vcpu(c, vcpu, kvm) {
> > -   if (kvm_vgic_vcpu_pending_irq(vcpu))
> > +   if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> > +   kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);

Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 07:35:11PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 19:19, Christoffer Dall wrote:
> > On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 04/04/2017 18:04, Christoffer Dall wrote:
>  For pause, only the requester should do the clearing.
> >>
> >> This suggests that maybe this should not be a request.  The request
> >> would be just the need to act on a GIC command, exactly as before this 
> >> patch.
> > 
> > Maybe the semantics should be:
> > 
> > requester:vcpu:
> > ---
> > make_request(vcpu, KVM_REQ_PAUSE);
> >   handles the request by
> >   clearing it and setting
> >   vcpu->pause = true;
> > wait until vcpu->pause == true
> > make_request(vcpu, KVM_REQ_UNPAUSE);
> >   vcpus 'wake up' clear the
> >   UNPAUSE request and set
> >   vcpu->pause = false;

I thought of this originally, but then decided to [ab]use the concept
of pause being a boolean and requests being bits in a bitmap.  Simpler,
but arguably not as clean.

> > 
> > The benefit would be that we get to re-use the complicated "figure out
> > the VCPU mode and whether or not we should send an IPI and get the
> > barriers right" stuff.
> 
> I don't think that's necessary.  As long as the complicated stuff avoids
> that you enter the VCPU, the next run through the loop will
> find that 'vcpu->arch.power_off || vcpu->arch.pause' is true and
> go to sleep.
> 
> >> What I don't understand is:
> >>
>  With this patch, while the vcpu will still initially enter
>  the guest, it will exit immediately due to the IPI sent by the vcpu
>  kick issued after making the vcpu request.
> >>
> >> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?

As you state below, KVM_REQ_VCPU_EXIT was getting used as a
kick-all-vcpus, but without the request/mode stuff it wasn't
sufficient for the small race window.

> >>
> >> So this:
> >>
> >> +  vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> >> +  WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> >>
> >> is the crux of the fix, you can keep using vcpu->arch.pause.
> > 
> > Probably; I feel like there's a fix here which should be a separate
> > patch from using a different requests instead of the KVM_REQ_VCPU_EXIT +
> > the pause flag.
> 
> Yeah, and then the pause flag can stay.
> 
> >> By the way, vcpu->arch.power_off can go away from this "if" too because
> >> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
> > 
> > But we also allow setting the power_off flag from the in-kernel PSCI
> > emulation in the context of another VCPU thread.
> 
> Right.  That code does
> 
> tmp->arch.power_off = true;
> kvm_vcpu_kick(tmp);
> 
> and I think what's really missing in arm.c is the "if (vcpu->mode ==
> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
> also simply use kvm_vcpu_kick.
> 
> My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
> reuse the smp_call_function_many code in kvm_make_all_cpus_request.
> Once you add EXITING_GUEST_MODE, ARM can just add a new function
> kvm_kick_all_cpus and use it for both pause and power_off.
> 

I was wondering about the justification of
'if (vcpu->mode == EXITING_GUEST_MODE)' in the x86 code, as it seemed
redundant to me with the requests.  I'll have another think on it to see
if request-less kicks can be satisfied in all cases by this, as long as we
have the mode setting, barrier, mode checking order ensured in vcpu run.

Thanks,
drew


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 19:57, Christoffer Dall wrote:
>> Right.  That code does
>>
>> tmp->arch.power_off = true;
>> kvm_vcpu_kick(tmp);
>>
>> and I think what's really missing in arm.c is the "if (vcpu->mode ==
>> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
>> also simply use kvm_vcpu_kick.
> I see, that's why the cmpxchg() works the way it does.  We just still
> need to move the vcpu->mode = IN_GUEST_MODE before our
> with-interrupts-disabled check.
> 
> What I'm not sure is why you can get away without using a memory barrier
> or WRITE_ONCE on x86, but is this simply because x86 is a strongly
> ordered architecture?

x86 does have a memory barrier:

vcpu->mode = IN_GUEST_MODE;

srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
smp_mb__after_srcu_read_unlock();

/*
 * This handles the case where a posted interrupt was
 * notified with kvm_vcpu_kick.
 */
if (kvm_lapic_enabled(vcpu)) {
if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active)
kvm_x86_ops->sync_pir_to_irr(vcpu);
}

if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests

and WRITE_ONCE is not needed if you have a memory barrier (though I find it
more self-documenting to use it anyway).

Paolo


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 06:04:17PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:53PM +0200, Andrew Jones wrote:
> > This not only ensures visibility of changes to pause by using
> > atomic ops, but also plugs a small race where a vcpu could get its
> > pause state enabled just after its last check before entering the
> > guest. With this patch, while the vcpu will still initially enter
> > the guest, it will exit immediately due to the IPI sent by the vcpu
> > kick issued after making the vcpu request.
> > 
> > We use bitops, rather than kvm_make/check_request(), because we
> > don't need the barriers they provide,
> 
> why not?

I'll add that it's because the only state of interest is the request bit
itself.  When the request is observable then we're good to go, no need to
ensure that at the time the request is observable, something else is too.

> 
> > nor do we want the side-effect
> > of kvm_check_request() clearing the request. For pause, only the
> > requester should do the clearing.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  5 +
> >  arch/arm/kvm/arm.c| 45 
> > +++
> >  arch/arm64/include/asm/kvm_host.h |  5 +
> >  3 files changed, 33 insertions(+), 22 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h 
> > b/arch/arm/include/asm/kvm_host.h
> > index 31ee468ce667..52c25536d254 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -45,7 +45,7 @@
> >  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
> >  #endif
> >  
> > -#define KVM_REQ_VCPU_EXIT  8
> > +#define KVM_REQ_PAUSE  8
> >  
> >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> >  int __attribute_const__ kvm_target_cpu(void);
> > @@ -173,9 +173,6 @@ struct kvm_vcpu_arch {
> > /* vcpu power-off state */
> > bool power_off;
> >  
> > -/* Don't run the guest (internal implementation need) */
> > -   bool pause;
> > -
> > /* IO related fields */
> > struct kvm_decode mmio_decode;
> >  
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index 314eb6abe1ff..f3bfbb5f3d96 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -94,6 +94,18 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
> >  
> >  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> >  {
> > +   /*
> > +* If we return true from this function, then it means the vcpu is
> > +* either in guest mode, or has already indicated that it's in guest
> > +* mode. The indication is done by setting ->mode to IN_GUEST_MODE,
> > +* and must be done before the final kvm_request_pending() read. It's
> > +* important that the observability of that order be enforced and that
> > +* the request receiving CPU can observe any new request before the
> > +* requester issues a kick. Thus, the general barrier below pairs with
> > +* the general barrier in kvm_arch_vcpu_ioctl_run() which divides the
> > +* write to ->mode and the final request pending read.
> > +*/
> 
> I am having a hard time understanding this comment.  For example, I
> don't understand the difference between 'is either in guest mode or has
> already indicated it's in guest mode'.  Which case is which again, and
> how are we checking for two cases below?
> 
> Also, the stuff about observability of an order is hard to follow, and
> the comment assumes the reader is thinking about the specific race when
> entering the guest.
> 
> I think we should focus on getting the documentation in place, refer to
> the documentation from here, and be much more brief and say something
> like:
> 
>   /*
>* The memory barrier below pairs with the barrier in
>* kvm_arch_vcpu_ioctl_run() between writes to vcpu->mode
>* and reading vcpu->requests before entering the guest.
>*
>* Ensures that the VCPU thread's CPU can observe changes to
>* vcpu->requests written prior to calling this function before
>* it writes vcpu->mode = IN_GUEST_MODE, and correspondingly
>* ensures that this CPU observes vcpu->mode == IN_GUEST_MODE
>* only if the VCPU thread's CPU could observe writes to
>* vcpu->requests from this CPU.
>    */
> 
> Is this correct?  I'm not really sure anymore?

It's confusing because we have cross dependencies on the negatives of
two conditions.

Here's the cross dependencies:

  vcpu->mode = IN_GUEST_MODE;   ---   ---  kvm_make_request(REQ, vcpu);
  smp_mb();\ / smp_mb();
X
   / \
  if (kvm_request_pending(vcpu))<--   -->  if (vcpu->mode == IN_GUEST_MODE)

On each side the smp_mb() ensures no reordering of the pair of operations
that each side has.  I.e. on the LHS the requests LOAD cannot be ordered
before the mode STORE and on the RHS side the mode LOAD cannot be ordered
before the request STORE.

Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 07:35:11PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 19:19, Christoffer Dall wrote:
> > On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 04/04/2017 18:04, Christoffer Dall wrote:
>  For pause, only the requester should do the clearing.
> >>
> >> This suggests that maybe this should not be a request.  The request
> >> would be just the need to act on a GIC command, exactly as before this 
> >> patch.
> > 
> > Maybe the semantics should be:
> > 
> > requester:vcpu:
> > ---
> > make_request(vcpu, KVM_REQ_PAUSE);
> >   handles the request by
> >   clearing it and setting
> >   vcpu->pause = true;
> > wait until vcpu->pause == true
> > make_request(vcpu, KVM_REQ_UNPAUSE);
> >   vcpus 'wake up' clear the
> >   UNPAUSE request and set
> >   vcpu->pause = false;
> > 
> > The benefit would be that we get to re-use the complicated "figure out
> > the VCPU mode and whether or not we should send an IPI and get the
> > barriers right" stuff.
> 
> I don't think that's necessary.  As long as the complicated stuff avoids
> that you enter the VCPU, the next run through the loop will
> find that 'vcpu->arch.power_off || vcpu->arch.pause' is true and
> go to sleep.
> 
> >> What I don't understand is:
> >>
>  With this patch, while the vcpu will still initially enter
>  the guest, it will exit immediately due to the IPI sent by the vcpu
>  kick issued after making the vcpu request.
> >>
> >> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?
> >>
> >> So this:
> >>
> >> +  vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> >> +  WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> >>
> >> is the crux of the fix, you can keep using vcpu->arch.pause.
> > 
> > Probably; I feel like there's a fix here which should be a separate
> > patch from using a different requests instead of the KVM_REQ_VCPU_EXIT +
> > the pause flag.
> 
> Yeah, and then the pause flag can stay.
> 
> >> By the way, vcpu->arch.power_off can go away from this "if" too because
> >> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
> > 
> > But we also allow setting the power_off flag from the in-kernel PSCI
> > emulation in the context of another VCPU thread.
> 
> Right.  That code does
> 
> tmp->arch.power_off = true;
> kvm_vcpu_kick(tmp);
> 
> and I think what's really missing in arm.c is the "if (vcpu->mode ==
> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
> also simply use kvm_vcpu_kick.

I see, that's why the cmpxchg() works the way it does.  We just still
need to move the vcpu->mode = IN_GUEST_MODE before our
with-interrupts-disabled check.

What I'm not sure is why you can get away without using a memory barrier
or WRITE_ONCE on x86, but is this simply because x86 is a strongly
ordered architecture?

> 
> My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
> reuse the smp_call_function_many code in kvm_make_all_cpus_request.

Your understanding is correct.

> Once you add EXITING_GUEST_MODE, ARM can just add a new function
> kvm_kick_all_cpus and use it for both pause and power_off.
> 

Yes, that should work.

I think Drew's approach should also work, but at this point, I'm not
really sure which approach is better than the other.


Thanks,
-Christoffer


Re: [PATCH v2 7/9] KVM: arm/arm64: PMU: remove request-less vcpu kick

2017-04-04 Thread Christoffer Dall
On Fri, Mar 31, 2017 at 06:06:56PM +0200, Andrew Jones wrote:
> Refactor PMU overflow handling in order to remove the request-less
> vcpu kick.  Now, since kvm_vgic_inject_irq() uses vcpu requests,
> there should be no chance that a kick sent at just the wrong time
> (between the VCPU's call to kvm_pmu_flush_hwstate() and before it
> enters guest mode) results in a failure for the guest to see updated
> GIC state until its next exit some time later for some other reason.
> 
> Signed-off-by: Andrew Jones 
> ---
>  virt/kvm/arm/pmu.c | 29 +++--
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 69ccce308458..9d725f3afb11 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -203,6 +203,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
>   return reg;
>  }
>  
> +static void kvm_pmu_check_overflow(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_pmu *pmu = &vcpu->arch.pmu;
> + bool overflow;
> +
> + overflow = !!kvm_pmu_overflow_status(vcpu);
> + if (pmu->irq_level != overflow) {
> + pmu->irq_level = overflow;
> + kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> + pmu->irq_num, overflow);
> + }
> +}
> +

If we are changing the way the PMU works to adjust the interrupt
signaling whenever the PMU changes its internal state, do we still have
to call kvm_pmu_update_state() from each flush/sync path now?

>  /**
>   * kvm_pmu_overflow_set - set PMU overflow interrupt
>   * @vcpu: The vcpu pointer
> @@ -210,31 +223,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu 
> *vcpu)
>   */
>  void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
>  {
> - u64 reg;
> -
>   if (val == 0)
>   return;
>  
>   vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val;
> - reg = kvm_pmu_overflow_status(vcpu);
> - if (reg != 0)
> - kvm_vcpu_kick(vcpu);
> + kvm_pmu_check_overflow(vcpu);
>  }
>  
>  static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
>  {
> - struct kvm_pmu *pmu = &vcpu->arch.pmu;
> - bool overflow;
> -
>   if (!kvm_arm_pmu_v3_ready(vcpu))
>   return;
>  
> - overflow = !!kvm_pmu_overflow_status(vcpu);
> - if (pmu->irq_level != overflow) {
> - pmu->irq_level = overflow;
> - kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> - pmu->irq_num, overflow);
> - }
> + kvm_pmu_check_overflow(vcpu);
>  }
>  
>  /**
> -- 
> 2.9.3
> 

Thanks,
-Christoffer


Re: [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection

2017-04-04 Thread Christoffer Dall
On Fri, Mar 31, 2017 at 06:06:55PM +0200, Andrew Jones wrote:
> Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
> kick meant to trigger the interrupt injection could be sent while
> the VCPU is outside guest mode, which means no IPI is sent, and
> after it has called kvm_vgic_flush_hwstate(), meaning it won't see
> the updated GIC state until its next exit some time later for some
> other reason.
> 
> Signed-off-by: Andrew Jones 
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm/kvm/arm.c|  1 +
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  virt/kvm/arm/arch_timer.c |  1 +
>  virt/kvm/arm/vgic/vgic.c  | 12 ++--
>  5 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index afed5d44634d..0b8a6d6b3cb3 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -47,6 +47,7 @@
>  
>  #define KVM_REQ_PAUSE  8
>  #define KVM_REQ_POWER_OFF  9
> +#define KVM_REQ_IRQ_PENDING  10
>  
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 7ed39060b1cf..a106feccf314 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -768,6 +768,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int 
> number, bool level)
>* trigger a world-switch round on the running physical CPU to set the
>* virtual IRQ/FIQ fields in the HCR appropriately.
>*/
> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);
>  
>   return 0;
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index e78895f675d0..7057512b3474 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -44,6 +44,7 @@
>  
>  #define KVM_REQ_PAUSE  8
>  #define KVM_REQ_POWER_OFF  9
> +#define KVM_REQ_IRQ_PENDING  10
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 35d7100e0815..3c48abbf951b 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct 
> *work)
>* If the vcpu is blocked we want to wake it up so that it will see
>* the timer has expired when entering the guest.
>*/
> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 654dfd40e449..31fb89057f0c 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct 
> vgic_irq *irq)
>* won't see this one until it exits for some other
>* reason.
>*/
> - if (vcpu)
> + if (vcpu) {
> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);
> + }
>   return false;
>   }
>  
> @@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct 
> vgic_irq *irq)
>   spin_unlock(&irq->irq_lock);
>   spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>  
> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);
>  
>   return true;
> @@ -654,6 +657,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>   spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
>   vgic_flush_lr_state(vcpu);
>   spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> +
> + /* The GIC is now ready to deliver the IRQ. */
> + clear_bit(KVM_REQ_IRQ_PENDING, &vcpu->requests);

this is not going to be called when we don't have the vgic, which means
that if vcpu_interrupt_line() is used as you modify it above, the
request will never get cleared.

>  }
>  
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
> @@ -691,8 +697,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
>* a good kick...
>*/
>   kvm_for_each_vcpu(c, vcpu, kvm) {
> - if (kvm_vgic_vcpu_pending_irq(vcpu))
> + if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);
> + }
>   }
>  }
>  
> -- 
> 2.9.3
> 

Thanks,
-Christoffer


Re: [PATCH v2 5/9] KVM: arm/arm64: replace vcpu->arch.power_off with a vcpu request

2017-04-04 Thread Christoffer Dall
On Fri, Mar 31, 2017 at 06:06:54PM +0200, Andrew Jones wrote:
> Like pause, replacing power_off with a vcpu request ensures
> visibility of changes and avoids the final race before entering
> the guest.

I think it's worth explaining the race in the commit message first, just
briefly.

> 
> Signed-off-by: Andrew Jones 
> ---
>  arch/arm/include/asm/kvm_host.h   |  4 +---
>  arch/arm/kvm/arm.c| 32 ++--
>  arch/arm/kvm/psci.c   | 17 +
>  arch/arm64/include/asm/kvm_host.h |  4 +---
>  4 files changed, 25 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 52c25536d254..afed5d44634d 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -46,6 +46,7 @@
>  #endif
>  
>  #define KVM_REQ_PAUSE  8
> +#define KVM_REQ_POWER_OFF  9
>  
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
> @@ -170,9 +171,6 @@ struct kvm_vcpu_arch {
>* here.
>*/
>  
> - /* vcpu power-off state */
> - bool power_off;
> -
>   /* IO related fields */
>   struct kvm_decode mmio_decode;
>  
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index f3bfbb5f3d96..7ed39060b1cf 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -381,7 +381,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>   struct kvm_mp_state *mp_state)
>  {
> - if (vcpu->arch.power_off)
> + if (test_bit(KVM_REQ_POWER_OFF, &vcpu->requests))
>   mp_state->mp_state = KVM_MP_STATE_STOPPED;
>   else
>   mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
> @@ -394,10 +394,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu 
> *vcpu,
>  {
>   switch (mp_state->mp_state) {
>   case KVM_MP_STATE_RUNNABLE:
> - vcpu->arch.power_off = false;
> + clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
>   break;
>   case KVM_MP_STATE_STOPPED:
> - vcpu->arch.power_off = true;
> + set_bit(KVM_REQ_POWER_OFF, &vcpu->requests);

this looks a bit dodgy; I am getting an even stronger feeling that we
should keep power_off = true, and here we can safely set it directly
because we have mutual exclusion from KVM_RUN, and that leaves us using
requests only to "ask the VCPU to do something for us, like setting its
power_off state", except...

>   break;
>   default:
>   return -EINVAL;
> @@ -415,9 +415,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>   */
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>  {
> - return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
> - && !v->arch.power_off
> - && !test_bit(KVM_REQ_PAUSE, &v->requests));
> + return (!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v)) &&
> + !test_bit(KVM_REQ_POWER_OFF, &v->requests) &&
> + !test_bit(KVM_REQ_PAUSE, &v->requests);
>  }
>  
>  /* Just ensure a guest exit from a particular CPU */
> @@ -578,8 +578,9 @@ static void vcpu_sleep(struct kvm_vcpu *vcpu)
>  {
>   struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
>  
> - swait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
> - (!test_bit(KVM_REQ_PAUSE, &vcpu->requests))));
> + swait_event_interruptible(*wq,
> + !test_bit(KVM_REQ_POWER_OFF, &vcpu->requests) &&
> + !test_bit(KVM_REQ_PAUSE, &vcpu->requests));
>  }
>  
>  static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
> @@ -632,8 +633,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> struct kvm_run *run)
>  
>   update_vttbr(vcpu->kvm);
>  
> - if (vcpu->arch.power_off || test_bit(KVM_REQ_PAUSE, 
> &vcpu->requests))
> - vcpu_sleep(vcpu);
> + if (kvm_request_pending(vcpu)) {
> + if (test_bit(KVM_REQ_POWER_OFF, &vcpu->requests) ||
> + test_bit(KVM_REQ_PAUSE, &vcpu->requests))
> + vcpu_sleep(vcpu);
> + }

...hmm, I do like that we only need to check the requests variable once,
and not multiple separate flags; at least we'd only have to do it once
(and not again after disabling interrupts).

>  
>   /*
>* Preparing the interrupts to be injected also
> @@ -664,8 +668,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>   WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
>   smp_mb();
>  
> - if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> - vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> + if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)
> + || kvm_request_pending(vcpu)) {
>   

Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 19:23, Christoffer Dall wrote:
>> I think all !IN_GUEST_MODE should behave the same, so I was avoiding
>> the use of EXITING_GUEST_MODE and OUTSIDE_GUEST_MODE, which wouldn't be
>> hard to address, but then I'd also have to address
>> READING_SHADOW_PAGE_TABLES, which may complicate the document more than
>> necessary.  I'm not sure we need to address a VCPU exiting guest mode,
>> other than making sure it's clear that a VCPU that exits must check
>> requests before it enters again.
> 
> But the problem is that kvm_make_all_cpus_request() only sends IPIs to
> CPUs where the mode was different from OUTSIDE_GUEST_MODE, so there it's
> about !OUTSIDE_GUEST_MODE rather than !IN_GUEST_MODE, so there's some
> subtlety here which I feel like it's dangerous to paper over.

Don't bother documenting READING_SHADOW_PAGE_TABLES---but
EXITING_GUEST_MODE should be used in ARM and documented, because it's
the key in making kvm_vcpu_kick not racy.

Paolo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 19:19, Christoffer Dall wrote:
> On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 04/04/2017 18:04, Christoffer Dall wrote:
 For pause, only the requester should do the clearing.
>>
>> This suggests that maybe this should not be a request.  The request
>> would be just the need to act on a GIC command, exactly as before this patch.
> 
> Maybe the semantics should be:
> 
> requester:vcpu:
> ---
> make_request(vcpu, KVM_REQ_PAUSE);
>   handles the request by
> clearing it and setting
> vcpu->pause = true;
> wait until vcpu->pause == true
> make_request(vcpu, KVM_REQ_UNPAUSE);
>   vcpus 'wake up' clear the
> UNPAUSE request and set
> vcpu->pause = false;
> 
> The benefit would be that we get to re-use the complicated "figure out
> the VCPU mode and whether or not we should send an IPI and get the
> barriers right" stuff.

I don't think that's necessary.  As long as the complicated stuff ensures
you don't enter the VCPU, the next run through the loop will
find that 'vcpu->arch.power_off || vcpu->arch.pause' is true and
go to sleep.

>> What I don't understand is:
>>
 With this patch, while the vcpu will still initially enter
 the guest, it will exit immediately due to the IPI sent by the vcpu
 kick issued after making the vcpu request.
>>
>> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?
>>
>> So this:
>>
>> +vcpu->arch.power_off || kvm_request_pending(vcpu)) {
>> +WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
>>
>> is the crux of the fix, you can keep using vcpu->arch.pause.
> 
> Probably; I feel like there's a fix here which should be a separate
> patch from using a different requests instead of the KVM_REQ_VCPU_EXIT +
> the pause flag.

Yeah, and then the pause flag can stay.

>> By the way, vcpu->arch.power_off can go away from this "if" too because
>> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
> 
> But we also allow setting the power_off flag from the in-kernel PSCI
> emulation in the context of another VCPU thread.

Right.  That code does

tmp->arch.power_off = true;
kvm_vcpu_kick(tmp);

and I think what's really missing in arm.c is the "if (vcpu->mode ==
EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
also simply use kvm_vcpu_kick.

My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
reuse the smp_call_function_many code in kvm_make_all_cpus_request.
Once you add EXITING_GUEST_MODE, ARM can just add a new function
kvm_kick_all_cpus and use it for both pause and power_off.

Paolo


Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 07:06:00PM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2017 at 05:24:03PM +0200, Christoffer Dall wrote:
> > Hi Drew,
> > 
> > On Fri, Mar 31, 2017 at 06:06:51PM +0200, Andrew Jones wrote:
> > > Signed-off-by: Andrew Jones 
> > > ---
> > >  Documentation/virtual/kvm/vcpu-requests.rst | 114 
> > > 
> > >  1 file changed, 114 insertions(+)
> > >  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> > > 
> > > diff --git a/Documentation/virtual/kvm/vcpu-requests.rst 
> > > b/Documentation/virtual/kvm/vcpu-requests.rst
> > > new file mode 100644
> > > index ..ea4a966d5c8a
> > > --- /dev/null
> > > +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> > > @@ -0,0 +1,114 @@
> > > +=================
> > > +KVM VCPU Requests
> > > +=================
> > > +
> > > +Overview
> > > +========
> > > +
> > > +KVM supports an internal API enabling threads to request a VCPU thread to
> > > +perform some activity.  For example, a thread may request a VCPU to flush
> > > +its TLB with a VCPU request.  The API consists of only four calls::
> > > +
> > > +  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
> > > +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> > > +
> > > +  /* Check if any requests are pending for VCPU @vcpu. */
> > > +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> > > +
> > > +  /* Make request @req of VCPU @vcpu. */
> > > +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> > > +
> > > +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> > > +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> > > +
> > > +Typically a requester wants the VCPU to perform the activity as soon
> > > +as possible after making the request.  This means most requests,
> > > +kvm_make_request() calls, are followed by a call to kvm_vcpu_kick(),
> > > +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> > > +into it.
> > > +
> > > +VCPU Kicks
> > > +----------
> > > +
> > > +A VCPU kick does one of three things:
> > > +
> > > + 1) wakes a sleeping VCPU (which sleeps outside guest mode).
> > 
> > You could clarify this to say that a sleeping VCPU is a VCPU thread
> > which is not runnable and placed on a waitqueue, and that waking it
> > makes the thread runnable again.
> > 
> > > + 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
> > > +out.
> > > + 3) nothing, when the VCPU is already outside guest mode and not 
> > > sleeping.
> > > +
> > > +VCPU Request Internals
> > > +======================
> > > +
> > > +VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
> > > +means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
> > > +may also be used.  The first 8 bits are reserved for architecture
> > > +independent requests, all additional bits are available for architecture
> > > +dependent requests.
> > 
> > Should we explain the ones that are generically defined and how they're
> > supposed to be used?  For example, we don't use them on ARM, and I don't
> > think I understand why another thread would ever make a PENDING_TIMER
> > request on a vcpu?
> 
> Yes, I agree the general requests should be described.  I'll have to
> figure out how :-)  Describing KVM_REQ_UNHALT will likely lead to a
> subsection on kvm_vcpu_block(), as you bring up below.
> 
> > 
> > > +
> > > +VCPU Requests with Associated State
> > > +===================================
> > > +
> > > +Requesters that want the requested VCPU to handle new state need to 
> > > ensure
> > > +the state is observable to the requested VCPU thread's CPU at the time 
> > > the
> > 
> > nit: need to ensure that the newly written state is observable ... by
> > the time it observed the request.
> > 
> > > +CPU observes the request.  This means a write memory barrier should be
> >  ^^^
> >  must
> > 
> > > +insert between the preparation of the state and the write of the VCPU
> > ^^^
> >inserted
> > 
> > I would rephrase this as: '... after writing the new state to memory and
> > before setting the VCPU request bit.'
> > 
> > 
> > > +request bitmap.  Additionally, on the requested VCPU thread's side, a
> > > +corresponding read barrier should be issued after reading the request bit
> > ^^^   ^^^
> >must  inserted (for consistency)
> > 
> > 
> > 
> > > +and before proceeding to use the state associated with it.  See the 
> > > kernel
> > ^^^^
> >readnew
> > 
> > 
> > > +memory barrier documentation [2].
> > 
> > I think it would be great if this document explains if this is currently
> > taken care of by the API you explain above or if there are cases where
> > people have to explicitly insert the

Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 18:04, Christoffer Dall wrote:
> >> For pause, only the requester should do the clearing.
> 
> This suggests that maybe this should not be a request.  The request
> would be just the need to act on a GIC command, exactly as before this patch.

Maybe the semantics should be:

requester:vcpu:
---
make_request(vcpu, KVM_REQ_PAUSE);
  handles the request by
  clearing it and setting
  vcpu->pause = true;
wait until vcpu->pause == true
make_request(vcpu, KVM_REQ_UNPAUSE);
  vcpus 'wake up' clear the
  UNPAUSE request and set
  vcpu->pause = false;

The benefit would be that we get to re-use the complicated "figure out
the VCPU mode and whether or not we should send an IPI and get the
barriers right" stuff.

> 
> What I don't understand is:
> 
> >> With this patch, while the vcpu will still initially enter
> >> the guest, it will exit immediately due to the IPI sent by the vcpu
> >> kick issued after making the vcpu request.
> 
> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?
> 
> So this:
> 
> + vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> + WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> 
> is the crux of the fix, you can keep using vcpu->arch.pause.

Probably; I feel like there's a fix here which should be a separate
patch from using a different requests instead of the KVM_REQ_VCPU_EXIT +
the pause flag.

> 
> By the way, vcpu->arch.power_off can go away from this "if" too because
> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.

But we also allow setting the power_off flag from the in-kernel PSCI
emulation in the context of another VCPU thread.

> The earlier check is enough:
> 
>  if (vcpu->arch.power_off || vcpu->arch.pause)
>  vcpu_sleep(vcpu);
> 
> 
> >> +  /*
> >> +   * Indicate we're in guest mode now, before doing a final
> >> +   * check for pending vcpu requests. The general barrier
> >> +   * pairs with the one in kvm_arch_vcpu_should_kick().
> >> +   * Please see the comment there for more details.
> >> +   */
> >> +  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
> >> +  smp_mb();
> > 
> > There are two changes here:
> > 
> > there's a change from a normal write to a WRITE_ONCE and there's also a
> > change to that adds a memory barrier.  I feel like I'd like to know if
> > these are tied together or two separate cleanups.  I also wonder if we
> > could split out more general changes from the pause thing to have a
> > better log of why we changed the run loop?
> 
> You probably should just use smp_store_mb here.
> 

That looks cleaner at least.

Thanks,
-Christoffer


Re: [PATCH 00/15] arm64/kvm: use common sysreg definitions

2017-04-04 Thread Catalin Marinas
On Wed, Mar 29, 2017 at 09:41:47AM +0100, Will Deacon wrote:
> On Tue, Mar 28, 2017 at 10:29:31PM +0200, Christoffer Dall wrote:
> > On Tue, Mar 28, 2017 at 07:48:28PM +0100, Mark Rutland wrote:
> > > On Wed, Mar 22, 2017 at 06:35:13PM +, Mark Rutland wrote:
> > > > On Fri, Mar 10, 2017 at 06:35:55PM +, Will Deacon wrote:
> > > > > On Fri, Mar 10, 2017 at 08:17:22AM +, Marc Zyngier wrote:
> > > > 
> > > > > > The next question is how do we merge this. Obviously, we can't 
> > > > > > split it
> > > > > > between trees, and this is very likely to clash with anything that 
> > > > > > we
> > > > > > will merge on the KVM side (the sysreg table is a popular place).
> > > > > > 
> > > > > > Will, Catalin: Would it make sense to create a stable branch with 
> > > > > > these
> > > > > > patches, and merge it into both the arm64 and KVM trees? That'd make
> > > > > > things easier...
> > > > > 
> > > > > I think the scope for conflict on our side is pretty high too, so a 
> > > > > shared
> > > > > branch might be the best way to go. I don't want to branch just yet 
> > > > > though,
> > > > > so I'll probably wait a week or so before setting something in stone.
> > > > 
> > > > Any further thoughts on this?
> > > > 
> > > > Christoffer has Acked the KVM bits, so if you're happy to do so for the
> > > > arm64 bits I can make a stable branch.
> > > 
> > > Looking around, it doesn't look like there's anything outside of arm64
> > > that'll conflict on the  changes, and git's happy to merge
> > > my changes with Suzuki's changes currently queued in arm64's
> > > for-next/core branch.
> > > 
> > > I think it would make sense for those to be in a common branch taken by
> > > both the arm64 and KVM trees, with the KVM-specific parts being taken by
> > > KVM alone atop of that.
> > > 
> > > Would everyone be happy with that?
> > 
> > I'm happy with that.
> > 
> > > 
> > > For reference, I've updated my branches so that arm64/common-sysreg only
> > > contains the common parts, with the KVM parts atop of that in
> > > kvm/common-sysreg.
> > > 
> > 
> > Will, Catalin:  Let me know if you're going to pull from common-sysreg
> > and I'll do the same and add the kvm patches above.
> 
> I think that's what we'll do, but Catalin's out this week (we're taking it
> in turns to go to work). I'd say go ahead and pull it into kvm if there
> aren't any conflicts. No need to wait for us.

I pulled it into the arm64 for-next/core as well (to be pushed out later
today).

-- 
Catalin


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 04:51:40PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 16:47, Andrew Jones wrote:
> >>> -#define KVM_REQ_VCPU_EXIT 8
> >>> +#define KVM_REQ_PAUSE 8
> >> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
> >> something along those lines?
> > Sounds good to me.  Should I even do something like
> > 
> >  #define KVM_REQ_ARCH_BASE 8
> > 
> >  #define KVM_ARCH_REQ(bit) ({ \
> >  BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \
> 
> Please make this 32 so that we don't fail on 32-bit machines.
> 
> or even
> 
> BUILD_BUG_ON((unsigned)(bit) >= BITS_PER_LONG - KVM_REQ_ARCH_BASE);
> 
> in case someone is crazy enough to pass a negative value!

Will do.

Thanks,
drew

> 
> Paolo
> 
> >  ((bit) + KVM_REQ_ARCH_BASE); \
> >  })
> > 
> >  #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)
> > 
> > or would that be overkill?  Also, whether we switch to just the base
> > define, or the macro, I guess it would be good to do for all
> > architectures.
> 


Re: [PATCH v2 3/9] KVM: arm/arm64: prepare to use vcpu requests

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 05:34:01PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:52PM +0200, Andrew Jones wrote:
> > Make sure we don't leave vcpu requests we don't intend to
> > handle later set in the request bitmap. If we don't clear
> > them, then kvm_request_pending() may return true when we
> > don't want it to.
> > 
> > Signed-off-by: Andrew Jones 
> > Acked-by: Christoffer Dall 
> > ---
> >  arch/arm/kvm/handle_exit.c   | 1 +
> >  arch/arm/kvm/psci.c  | 1 +
> >  arch/arm64/kvm/handle_exit.c | 1 +
> >  3 files changed, 3 insertions(+)
> > 
> > diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
> > index 96af65a30d78..ffb2406e5905 100644
> > --- a/arch/arm/kvm/handle_exit.c
> > +++ b/arch/arm/kvm/handle_exit.c
> > @@ -72,6 +72,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct 
> > kvm_run *run)
> > trace_kvm_wfx(*vcpu_pc(vcpu), false);
> > vcpu->stat.wfi_exit_stat++;
> > kvm_vcpu_block(vcpu);
> > +   clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
> 
> I actually don't understand the idea behind KVM_REQ_UNHALT?
> 
> It seems there's a semantic difference that architectures should adhere
> by when returning from kvm_vcpu_block() with or without KVM_REQ_UNHALT
> set (i.e. if the vcpu was runnable when kvm_vcpu_check_block() was
> called?) - can you explain what the deal is?  Perhaps that belongs in
> the documentation patch.

Yup, will address this in the doc patch.

> 
> > }
> >  
> > kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > index c2b131527a64..82fe7eb5b6a7 100644
> > --- a/arch/arm/kvm/psci.c
> > +++ b/arch/arm/kvm/psci.c
> > @@ -57,6 +57,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct 
> > kvm_vcpu *vcpu)
> >  * for KVM will preserve the register state.
> >  */
> > kvm_vcpu_block(vcpu);
> > +   clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
> >  
> > return PSCI_RET_SUCCESS;
> >  }
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index fa1b18e364fc..e4937fb2fb89 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> > @@ -89,6 +89,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct 
> > kvm_run *run)
> > trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
> > vcpu->stat.wfi_exit_stat++;
> > kvm_vcpu_block(vcpu);
> > +   clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
> > }
> >  
> > kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> > -- 
> > 2.9.3
> > 
> 
> Ignoring my comment above, for the content of this patch:
> 
> Acked-by: Christoffer Dall 

Thanks,
drew


Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 05:24:03PM +0200, Christoffer Dall wrote:
> Hi Drew,
> 
> On Fri, Mar 31, 2017 at 06:06:51PM +0200, Andrew Jones wrote:
> > Signed-off-by: Andrew Jones 
> > ---
> >  Documentation/virtual/kvm/vcpu-requests.rst | 114 
> > 
> >  1 file changed, 114 insertions(+)
> >  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> > 
> > diff --git a/Documentation/virtual/kvm/vcpu-requests.rst 
> > b/Documentation/virtual/kvm/vcpu-requests.rst
> > new file mode 100644
> > index ..ea4a966d5c8a
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> > @@ -0,0 +1,114 @@
> > +=================
> > +KVM VCPU Requests
> > +=================
> > +
> > +Overview
> > +========
> > +
> > +KVM supports an internal API enabling threads to request a VCPU thread to
> > +perform some activity.  For example, a thread may request a VCPU to flush
> > +its TLB with a VCPU request.  The API consists of only four calls::
> > +
> > +  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
> > +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> > +
> > +  /* Check if any requests are pending for VCPU @vcpu. */
> > +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> > +
> > +  /* Make request @req of VCPU @vcpu. */
> > +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> > +
> > +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> > +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> > +
> > +Typically a requester wants the VCPU to perform the activity as soon
> > +as possible after making the request.  This means most requests,
> > +kvm_make_request() calls, are followed by a call to kvm_vcpu_kick(),
> > +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> > +into it.
> > +
> > +VCPU Kicks
> > +----------
> > +
> > +A VCPU kick does one of three things:
> > +
> > + 1) wakes a sleeping VCPU (which sleeps outside guest mode).
> 
> You could clarify this to say that a sleeping VCPU is a VCPU thread
> which is not runnable and placed on a waitqueue, and that waking it
> makes the thread runnable again.
> 
> > + 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
> > +out.
> > + 3) nothing, when the VCPU is already outside guest mode and not sleeping.
> > +
> > +VCPU Request Internals
> > +======================
> > +
> > +VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
> > +means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
> > +may also be used.  The first 8 bits are reserved for architecture
> > +independent requests, all additional bits are available for architecture
> > +dependent requests.
> 
> Should we explain the ones that are generically defined and how they're
> supposed to be used?  For example, we don't use them on ARM, and I don't
> think I understand why another thread would ever make a PENDING_TIMER
> request on a vcpu?

Yes, I agree the general requests should be described.  I'll have to
figure out how :-)  Describing KVM_REQ_UNHALT will likely lead to a
subsection on kvm_vcpu_block(), as you bring up below.

> 
> > +
> > +VCPU Requests with Associated State
> > +===================================
> > +
> > +Requesters that want the requested VCPU to handle new state need to ensure
> > +the state is observable to the requested VCPU thread's CPU at the time the
> 
> nit: need to ensure that the newly written state is observable ... by
> the time it observed the request.
> 
> > +CPU observes the request.  This means a write memory barrier should be
>  ^^^
>must
> 
> > +insert between the preparation of the state and the write of the VCPU
> ^^^
>inserted
> 
> I would rephrase this as: '... after writing the new state to memory and
> before setting the VCPU request bit.'
> 
> 
> > +request bitmap.  Additionally, on the requested VCPU thread's side, a
> > +corresponding read barrier should be issued after reading the request bit
> ^^^   ^^^
>  must  inserted (for consistency)
> 
> 
> 
> > +and before proceeding to use the state associated with it.  See the kernel
> ^^^^
>  readnew
> 
> 
> > +memory barrier documentation [2].
> 
> I think it would be great if this document explains if this is currently
> taken care of by the API you explain above or if there are cases where
> people have to explicitly insert these barriers, and in that case, which
> barriers they should use (if we know at this point already).

Will do.  The current API does take care of it.  I'll state that.  I'd
have to grep around to see if there are any non-API users that also need
barriers, but as they could change, I probably wouldn't want to call them
out in the document.

Re: [PATCH v2 1/9] KVM: add kvm_request_pending

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 05:30:14PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:50PM +0200, Andrew Jones wrote:
> > From: Radim Krčmář 
> > 
> > A first step in vcpu->requests encapsulation.
> 
> Could we have a note here on why we need to access vcpu->requests using
> READ_ONCE now?

Sure, maybe we should put the note as a comment above the read in
kvm_request_pending().  Something like

 /*
  * vcpu->requests reads may appear in sequences that have strict
  * data or control dependencies.  Use READ_ONCE() to ensure the
  * compiler does not do anything that breaks the required ordering.
  */

Radim?

Thanks,
drew


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 18:04, Christoffer Dall wrote:
>> For pause, only the requester should do the clearing.

This suggests that maybe this should not be a request.  The request
would be just the need to act on a GIC command, exactly as before this patch.

What I don't understand is:

>> With this patch, while the vcpu will still initially enter
>> the guest, it will exit immediately due to the IPI sent by the vcpu
>> kick issued after making the vcpu request.

Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?

So this:

+   vcpu->arch.power_off || kvm_request_pending(vcpu)) {
+   WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);

is the crux of the fix, you can keep using vcpu->arch.pause.

By the way, vcpu->arch.power_off can go away from this "if" too because
KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
The earlier check is enough:

 if (vcpu->arch.power_off || vcpu->arch.pause)
 vcpu_sleep(vcpu);


>> +/*
>> + * Indicate we're in guest mode now, before doing a final
>> + * check for pending vcpu requests. The general barrier
>> + * pairs with the one in kvm_arch_vcpu_should_kick().
>> + * Please see the comment there for more details.
>> + */
>> +WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
>> +smp_mb();
> 
> There are two changes here:
> 
> there's a change from a normal write to a WRITE_ONCE and there's also a
> change to that adds a memory barrier.  I feel like I'd like to know if
> these are tied together or two separate cleanups.  I also wonder if we
> could split out more general changes from the pause thing to have a
> better log of why we changed the run loop?

You probably should just use smp_store_mb here.

Paolo


Re: [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 09:27:46AM +0200, Andrew Jones wrote:
> On Mon, Apr 03, 2017 at 05:28:45PM +0200, Christoffer Dall wrote:
> > Hi Drew,
> > 
> > On Fri, Mar 31, 2017 at 06:06:49PM +0200, Andrew Jones wrote:
> > > This series fixes some hard to produce races by introducing the use of
> > > vcpu requests.  It also fixes a couple easier to produce races, ones
> > > that have been produced with the PSCI kvm-unit-test test.  The easy two
> > > are addressed in two different ways: the first takes advantage of
> > > power_off having been changed to a vcpu request, the second caches vcpu
> > > MPIDRs in order to avoid extracting them from sys_regs.  I've tested the
> > > series on a Mustang and a ThunderX and compile-tested the ARM bits.
> > > 
> > > Patch 2/9 adds documentation, as, at least for me, understanding vcpu
> > > request interplay with vcpu kicks and vcpu mode and the memory barriers
> > > that interplay implies, is exhausting.  Hopefully the document is useful
> > > to others.  I'm not married to it though, so it can be deferred/dropped
> > > as people like...
> > 
> > Sounds helpful, I'll have a look.
> > 
> > > 
> > > v2:
> > >   - No longer based on Radim's vcpu request API rework[1], except for
> > > including "add kvm_request_pending" as patch 1/9 [drew]
> > 
> > I lost track here; did those patches get merged or dropped and why are
> > we not basing this work on them anymore, and should patch 1/9 be applied
> > here or is it expected to land in the KVM tree via some other path?
> 
> I think Radim still wants to rework the API, but, as his work doesn't
> provide fixes or functional changes, his timeline may not be the same
> as for this series.  He also wants to expand his rework to add API
> that includes kicking with requesting.  I'm not sure how all that will
> look yet, so, in the end, I decided I might as well just use the current
> API for now.  kvm_request_pending() was too nice an addition to drop
> though.

Makes sense, thanks for the explanation.

> 
> > 
> > >   - Added vcpu request documentation [drew]
> > >   - Dropped the introduction of user settable MPIDRs [Christoffer]
> > >   - Added vcpu requests to all request-less vcpu kicks [Christoffer]
> > > 
> > 
> > Didn't we also have an issue with a missing barrier if the cmpxchg
> > operation doesn't succeed?  Did that fall though the cracks or is it
> > just missing in the changelog?
> 
> Just missing from the changelog. Sorry about that.

No worries.

> 
>   - Ensure we have a read barrier (or equivalent) prior to issuing the
> cmpxchg in kvm_vcpu_exiting_guest_mode(), as a failed cmpxchg does
> not guarantee any barrier [Christoffer]

Thanks for adding this, although I'm not able to convince myself that we
got all the detailed aspects of this correct, just yet, but hopefully
some of the questions I've asked on the individual patches can improve
this.

Thanks,
-Christoffer


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Christoffer Dall
On Fri, Mar 31, 2017 at 06:06:53PM +0200, Andrew Jones wrote:
> This not only ensures visibility of changes to pause by using
> atomic ops, but also plugs a small race where a vcpu could get its
> pause state enabled just after its last check before entering the
> guest. With this patch, while the vcpu will still initially enter
> the guest, it will exit immediately due to the IPI sent by the vcpu
> kick issued after making the vcpu request.
> 
> We use bitops, rather than kvm_make/check_request(), because we
> don't need the barriers they provide,

why not?

> nor do we want the side-effect
> of kvm_check_request() clearing the request. For pause, only the
> requester should do the clearing.
> 
> Signed-off-by: Andrew Jones 
> ---
>  arch/arm/include/asm/kvm_host.h   |  5 +
>  arch/arm/kvm/arm.c| 45 
> +++
>  arch/arm64/include/asm/kvm_host.h |  5 +
>  3 files changed, 33 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 31ee468ce667..52c25536d254 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -45,7 +45,7 @@
>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>  #endif
>  
> -#define KVM_REQ_VCPU_EXIT 8
> +#define KVM_REQ_PAUSE 8
>  
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
> @@ -173,9 +173,6 @@ struct kvm_vcpu_arch {
>   /* vcpu power-off state */
>   bool power_off;
>  
> -  /* Don't run the guest (internal implementation need) */
> - bool pause;
> -
>   /* IO related fields */
>   struct kvm_decode mmio_decode;
>  
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 314eb6abe1ff..f3bfbb5f3d96 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -94,6 +94,18 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
>  
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
>  {
> + /*
> +  * If we return true from this function, then it means the vcpu is
> +  * either in guest mode, or has already indicated that it's in guest
> +  * mode. The indication is done by setting ->mode to IN_GUEST_MODE,
> +  * and must be done before the final kvm_request_pending() read. It's
> +  * important that the observability of that order be enforced and that
> +  * the request receiving CPU can observe any new request before the
> +  * requester issues a kick. Thus, the general barrier below pairs with
> +  * the general barrier in kvm_arch_vcpu_ioctl_run() which divides the
> +  * write to ->mode and the final request pending read.
> +  */

I am having a hard time understanding this comment.  For example, I
don't understand the difference between 'is either in guest mode or has
already indicated it's in guest mode'.  Which case is which again, and
how are we checking for two cases below?

Also, the stuff about observability of an order is hard to follow, and
the comment assumes the reader is thinking about the specific race when
entering the guest.

I think we should focus on getting the documentation in place, refer to
the documentation from here, and be much more brief and say something
like:

/*
 * The memory barrier below pairs with the barrier in
 * kvm_arch_vcpu_ioctl_run() between writes to vcpu->mode
 * and reading vcpu->requests before entering the guest.
 *
 * Ensures that the VCPU thread's CPU can observe changes to
 * vcpu->requests written prior to calling this function before
 * it writes vcpu->mode = IN_GUEST_MODE, and correspondingly
 * ensures that this CPU observes vcpu->mode == IN_GUEST_MODE
 * only if the VCPU thread's CPU could observe writes to
 * vcpu->requests from this CPU.
 */

Is this correct?  I'm not really sure anymore?

There's also the obvious fact that we're adding this memory barrier
inside a function that checks whether we should kick a vcpu, and there's
no documentation saying that this is always called in association with
setting a request, is there?

I finally don't understand why this would be a requirement only on ARM?

> + smp_mb();
>   return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
>  }
>  
> @@ -404,7 +416,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>  {
>   return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
> - && !v->arch.power_off && !v->arch.pause);
> + && !v->arch.power_off
> + && !test_bit(KVM_REQ_PAUSE, &v->requests));
>  }
>  
>  /* Just ensure a guest exit from a particular CPU */
> @@ -535,17 +548,12 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
>  
>  void kvm_arm_halt_guest(struct kvm *kvm)
>  {
> - int i;
> - struct kvm_vcpu *vcpu;
> -
> - kvm_for_each_vcpu(i, vcpu, kvm

Re: [PATCH v2 3/9] KVM: arm/arm64: prepare to use vcpu requests

2017-04-04 Thread Christoffer Dall
On Fri, Mar 31, 2017 at 06:06:52PM +0200, Andrew Jones wrote:
> Make sure we don't leave vcpu requests we don't intend to
> handle later set in the request bitmap. If we don't clear
> them, then kvm_request_pending() may return true when we
> don't want it to.
> 
> Signed-off-by: Andrew Jones 
> Acked-by: Christoffer Dall 
> ---
>  arch/arm/kvm/handle_exit.c   | 1 +
>  arch/arm/kvm/psci.c  | 1 +
>  arch/arm64/kvm/handle_exit.c | 1 +
>  3 files changed, 3 insertions(+)
> 
> diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
> index 96af65a30d78..ffb2406e5905 100644
> --- a/arch/arm/kvm/handle_exit.c
> +++ b/arch/arm/kvm/handle_exit.c
> @@ -72,6 +72,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>   trace_kvm_wfx(*vcpu_pc(vcpu), false);
>   vcpu->stat.wfi_exit_stat++;
>   kvm_vcpu_block(vcpu);
> + clear_bit(KVM_REQ_UNHALT, &vcpu->requests);

I actually don't understand the idea behind KVM_REQ_UNHALT?

It seems there's a semantic difference that architectures should adhere
to when returning from kvm_vcpu_block() with or without KVM_REQ_UNHALT
set (i.e. whether the vcpu was runnable when kvm_vcpu_check_block() was
called?) - can you explain what the deal is?  Perhaps that belongs in
the documentation patch.

>   }
>  
>   kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index c2b131527a64..82fe7eb5b6a7 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -57,6 +57,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu 
> *vcpu)
>* for KVM will preserve the register state.
>*/
>   kvm_vcpu_block(vcpu);
> + clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
>  
>   return PSCI_RET_SUCCESS;
>  }
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index fa1b18e364fc..e4937fb2fb89 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -89,6 +89,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>   trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
>   vcpu->stat.wfi_exit_stat++;
>   kvm_vcpu_block(vcpu);
> + clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
>   }
>  
>   kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> -- 
> 2.9.3
> 

Ignoring my comment above, for the content of this patch:

Acked-by: Christoffer Dall 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 1/9] KVM: add kvm_request_pending

2017-04-04 Thread Christoffer Dall
On Fri, Mar 31, 2017 at 06:06:50PM +0200, Andrew Jones wrote:
> From: Radim Krčmář 
> 
> A first step in vcpu->requests encapsulation.

Could we have a note here on why we need to access vcpu->requests using
READ_ONCE now?

Thanks,
-Christoffer

> 
> Signed-off-by: Radim Krčmář 
> Signed-off-by: Andrew Jones 
> ---
>  arch/mips/kvm/trap_emul.c  | 2 +-
>  arch/powerpc/kvm/booke.c   | 2 +-
>  arch/powerpc/kvm/powerpc.c | 5 ++---
>  arch/s390/kvm/kvm-s390.c   | 2 +-
>  arch/x86/kvm/x86.c | 4 ++--
>  include/linux/kvm_host.h   | 5 +
>  6 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/mips/kvm/trap_emul.c b/arch/mips/kvm/trap_emul.c
> index b1fa53b252ea..9ac8b1d62643 100644
> --- a/arch/mips/kvm/trap_emul.c
> +++ b/arch/mips/kvm/trap_emul.c
> @@ -1029,7 +1029,7 @@ static void kvm_trap_emul_check_requests(struct 
> kvm_vcpu *vcpu, int cpu,
>   struct mm_struct *mm;
>   int i;
>  
> - if (likely(!vcpu->requests))
> + if (likely(!kvm_request_pending(vcpu)))
>   return;
>  
>   if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 0514cbd4e533..65ed6595c9c2 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -682,7 +682,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
>  
>   kvmppc_core_check_exceptions(vcpu);
>  
> - if (vcpu->requests) {
> + if (kvm_request_pending(vcpu)) {
>   /* Exception delivery raised request; start over */
>   return 1;
>   }
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 95c91a9de351..714674ea5be6 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -52,8 +52,7 @@ EXPORT_SYMBOL_GPL(kvmppc_pr_ops);
>  
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>  {
> - return !!(v->arch.pending_exceptions) ||
> -v->requests;
> + return !!(v->arch.pending_exceptions) || kvm_request_pending(v);
>  }
>  
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> @@ -105,7 +104,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>*/
>   smp_mb();
>  
> - if (vcpu->requests) {
> + if (kvm_request_pending(vcpu)) {
>   /* Make sure we process requests preemptable */
>   local_irq_enable();
>   trace_kvm_check_requests(vcpu);
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fd6cd05bb6a7..40ad6c8d082f 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2396,7 +2396,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu 
> *vcpu)
>  {
>  retry:
>   kvm_s390_vcpu_request_handled(vcpu);
> - if (!vcpu->requests)
> + if (!kvm_request_pending(vcpu))
>   return 0;
>   /*
>* We use MMU_RELOAD just to re-arm the ipte notifier for the
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1faf620a6fdc..9714bb230524 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6726,7 +6726,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>   bool req_immediate_exit = false;
>  
> - if (vcpu->requests) {
> + if (kvm_request_pending(vcpu)) {
>   if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
>   kvm_mmu_unload(vcpu);
>   if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
> @@ -6890,7 +6890,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   kvm_x86_ops->sync_pir_to_irr(vcpu);
>   }
>  
> - if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> + if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
>   || need_resched() || signal_pending(current)) {
>   vcpu->mode = OUTSIDE_GUEST_MODE;
>   smp_wmb();
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2c14ad9809da..946bf0b3c43c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1085,6 +1085,11 @@ static inline int kvm_ioeventfd(struct kvm *kvm, 
> struct kvm_ioeventfd *args)
>  
>  #endif /* CONFIG_HAVE_KVM_EVENTFD */
>  
> +static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
> +{
> + return READ_ONCE(vcpu->requests);
> +}
> +
>  static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
>  {
>   /*
> -- 
> 2.9.3
> 


Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests

2017-04-04 Thread Christoffer Dall
Hi Drew,

On Fri, Mar 31, 2017 at 06:06:51PM +0200, Andrew Jones wrote:
> Signed-off-by: Andrew Jones 
> ---
>  Documentation/virtual/kvm/vcpu-requests.rst | 114 
>  1 file changed, 114 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> 
> diff --git a/Documentation/virtual/kvm/vcpu-requests.rst 
> b/Documentation/virtual/kvm/vcpu-requests.rst
> new file mode 100644
> index ..ea4a966d5c8a
> --- /dev/null
> +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> @@ -0,0 +1,114 @@
> +=
> +KVM VCPU Requests
> +=
> +
> +Overview
> +
> +
> +KVM supports an internal API enabling threads to request a VCPU thread to
> +perform some activity.  For example, a thread may request a VCPU to flush
> +its TLB with a VCPU request.  The API consists of only four calls::
> +
> +  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
> +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /* Check if any requests are pending for VCPU @vcpu. */
> +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> +
> +  /* Make request @req of VCPU @vcpu. */
> +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> +
> +Typically a requester wants the VCPU to perform the activity as soon
> +as possible after making the request.  This means most requests,
> +kvm_make_request() calls, are followed by a call to kvm_vcpu_kick(),
> +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> +into it.
> +
> +VCPU Kicks
> +--
> +
> +A VCPU kick does one of three things:
> +
> + 1) wakes a sleeping VCPU (which sleeps outside guest mode).

You could clarify this to say that a sleeping VCPU is a VCPU thread
which is not runnable and placed on a waitqueue, and that waking it
makes the thread runnable again.

> + 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
> +out.
> + 3) nothing, when the VCPU is already outside guest mode and not sleeping.
> +
> +VCPU Request Internals
> +==
> +
> +VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
> +means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
> +may also be used.  The first 8 bits are reserved for architecture
> +independent requests, all additional bits are available for architecture
> +dependent requests.

Should we explain the ones that are generically defined and how they're
supposed to be used?  For example, we don't use them on ARM, and I don't
think I understand why another thread would ever make a PENDING_TIMER
request on a vcpu?

> +
> +VCPU Requests with Associated State
> +===
> +
> +Requesters that want the requested VCPU to handle new state need to ensure
> +the state is observable to the requested VCPU thread's CPU at the time the

nit: need to ensure that the newly written state is observable ... by
the time it observes the request.

> +CPU observes the request.  This means a write memory barrier should be
 ^^^
 must

> +insert between the preparation of the state and the write of the VCPU
^^^
   inserted

I would rephrase this as: '... after writing the new state to memory and
before setting the VCPU request bit.'


> +request bitmap.  Additionally, on the requested VCPU thread's side, a
> +corresponding read barrier should be issued after reading the request bit
^^^   ^^^
   must  inserted (for consistency)



> +and before proceeding to use the state associated with it.  See the kernel
^^^^
   readnew


> +memory barrier documentation [2].

I think it would be great if this document explained whether this is
currently taken care of by the API described above, or if there are
cases where people have to explicitly insert these barriers, and in
that case, which barriers they should use (if we know at this point
already).

> +
> +VCPU Requests and Guest Mode
> +
> +

I feel like an intro about the overall goal here is missing.  How about
something like this:

  When making requests to VCPUs, we want to avoid the receiving VCPU
  executing inside the guest for an arbitrary long time without handling
  the request.  The way we prevent this from happening is by keeping
  track of when a VCPU is running and sending an IPI to the physical CPU
  running the VCPU when that is the case.  However, each architecture
  implementation of KVM must take great care to ensure that requests are
  not missed when a VCPU stops running at the same time as a request
  is received.

Also, I'm not sure what th

Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Marc Zyngier
On 04/04/17 15:51, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 16:47, Andrew Jones wrote:
 -#define KVM_REQ_VCPU_EXIT 8
 +#define KVM_REQ_PAUSE 8
>>> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
>>> something along those lines?
>> Sounds good to me.  Should I even do something like
>>
>>  #define KVM_REQ_ARCH_BASE 8
>>
>>  #define KVM_ARCH_REQ(bit) ({ \
>>  BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \
> 
> Please make this 32 so that we don't fail on 32-bit machines.
> 
> or even
> 
> BUILD_BUG_ON((unsigned)(bit) >= BITS_PER_LONG - KVM_REQ_ARCH_BASE);
> 
> in case someone is crazy enough to pass a negative value!
> 
> Paolo
> 
>>  ((bit) + KVM_REQ_ARCH_BASE); \
>>  })
>>
>>  #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)
>>
>> or would that be overkill?  Also, whether we switch to just the base
>> define, or the macro, I guess it would be good to do for all
>> architectures.
> 

Both suggestions look good to me.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 16:47, Andrew Jones wrote:
>>> -#define KVM_REQ_VCPU_EXIT  8
>>> +#define KVM_REQ_PAUSE  8
>> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
>> something along those lines?
> Sounds good to me.  Should I even do something like
> 
>  #define KVM_REQ_ARCH_BASE 8
> 
>  #define KVM_ARCH_REQ(bit) ({ \
>  BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \

Please make this 32 so that we don't fail on 32-bit machines.

or even

BUILD_BUG_ON((unsigned)(bit) >= BITS_PER_LONG - KVM_REQ_ARCH_BASE);

in case someone is crazy enough to pass a negative value!

Paolo

>  ((bit) + KVM_REQ_ARCH_BASE); \
>  })
> 
>  #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)
> 
> or would that be overkill?  Also, whether we switch to just the base
> define, or the macro, I guess it would be good to do for all
> architectures.



Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 02:39:19PM +0100, Marc Zyngier wrote:
> On 31/03/17 17:06, Andrew Jones wrote:
> > This not only ensures visibility of changes to pause by using
> > atomic ops, but also plugs a small race where a vcpu could get its
> > pause state enabled just after its last check before entering the
> > guest. With this patch, while the vcpu will still initially enter
> > the guest, it will exit immediately due to the IPI sent by the vcpu
> > kick issued after making the vcpu request.
> > 
> > We use bitops, rather than kvm_make/check_request(), because we
> > don't need the barriers they provide, nor do we want the side-effect
> > of kvm_check_request() clearing the request. For pause, only the
> > requester should do the clearing.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  5 +
> >  arch/arm/kvm/arm.c| 45 +++
> >  arch/arm64/include/asm/kvm_host.h |  5 +
> >  3 files changed, 33 insertions(+), 22 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h 
> > b/arch/arm/include/asm/kvm_host.h
> > index 31ee468ce667..52c25536d254 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -45,7 +45,7 @@
> >  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
> >  #endif
> >  
> > -#define KVM_REQ_VCPU_EXIT  8
> > +#define KVM_REQ_PAUSE  8
> 
> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
> something along those lines?

Sounds good to me.  Should I even do something like

 #define KVM_REQ_ARCH_BASE 8

 #define KVM_ARCH_REQ(bit) ({ \
 BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \
 ((bit) + KVM_REQ_ARCH_BASE); \
 })

 #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)

or would that be overkill?  Also, whether we switch to just the base
define, or the macro, I guess it would be good to do for all
architectures.

Thanks,
drew

> 
> I've otherwise started hammering this series over a number of systems,
> looking good so far.
> 
> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Marc Zyngier
On 31/03/17 17:06, Andrew Jones wrote:
> This not only ensures visibility of changes to pause by using
> atomic ops, but also plugs a small race where a vcpu could get its
> pause state enabled just after its last check before entering the
> guest. With this patch, while the vcpu will still initially enter
> the guest, it will exit immediately due to the IPI sent by the vcpu
> kick issued after making the vcpu request.
> 
> We use bitops, rather than kvm_make/check_request(), because we
> don't need the barriers they provide, nor do we want the side-effect
> of kvm_check_request() clearing the request. For pause, only the
> requester should do the clearing.
> 
> Signed-off-by: Andrew Jones 
> ---
>  arch/arm/include/asm/kvm_host.h   |  5 +
>  arch/arm/kvm/arm.c| 45 +++
>  arch/arm64/include/asm/kvm_host.h |  5 +
>  3 files changed, 33 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 31ee468ce667..52c25536d254 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -45,7 +45,7 @@
>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>  #endif
>  
> -#define KVM_REQ_VCPU_EXIT8
> +#define KVM_REQ_PAUSE8

Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
something along those lines?

I've otherwise started hammering this series over a number of systems,
looking good so far.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 11:35:35AM +0100, Suzuki K Poulose wrote:
> Hi Christoffer,
> 
> On 04/04/17 11:13, Christoffer Dall wrote:
> >Hi Suzuki,
> >
> >On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
> >>In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> >>unmap_stage2_range() on the entire memory range for the guest. This could
> >>cause problems with other callers (e.g, munmap on a memslot) trying to
> >>unmap a range. And since we have to unmap the entire Guest memory range
> >>holding a spinlock, make sure we yield the lock if necessary, after we
> >>unmap each PUD range.
> >>
> >>Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> >>Cc: sta...@vger.kernel.org # v3.10+
> >>Cc: Paolo Bonzini 
> >>Cc: Marc Zyngier 
> >>Cc: Christoffer Dall 
> >>Cc: Mark Rutland 
> >>Signed-off-by: Suzuki K Poulose 
> >>[ Avoid vCPU starvation and lockup detector warnings ]
> >>Signed-off-by: Marc Zyngier 
> >>Signed-off-by: Suzuki K Poulose 
> >>
> >
> >This unfortunately fails to build on 32-bit ARM, and I also think we
> >intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.
> 
> > Sorry about that, I didn't test the patch with arm32. I am fine with
> > the patch below. And I agree that the name change does make things more
> > readable. See below for a hunk that I posted to the kbuild report.
> 
> >
> >How about adding this to your patch (which includes a rename of
> >S2_PGD_SIZE which is horribly confusing as it indicates the size of the
> >first level stage-2 table itself, where S2_PGDIR_SIZE indicates the size
> >of address space mapped by a single entry in the same table):
> >
> >diff --git a/arch/arm/include/asm/stage2_pgtable.h 
> >b/arch/arm/include/asm/stage2_pgtable.h
> >index 460d616..c997f2d 100644
> >--- a/arch/arm/include/asm/stage2_pgtable.h
> >+++ b/arch/arm/include/asm/stage2_pgtable.h
> >@@ -35,10 +35,13 @@
> >
> > #define stage2_pud_huge(pud)pud_huge(pud)
> >
> >+#define S2_PGDIR_SIZE   PGDIR_SIZE
> >+#define S2_PGDIR_MASK   PGDIR_MASK
> >+
> > /* Open coded p*d_addr_end that can deal with 64bit addresses */
> > static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t 
> > end)
> > {
> >-phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
> >+phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
> >
> > return (boundary - 1 < end - 1) ? boundary : end;
> > }
> >diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >index db94f3a..6e79a4c 100644
> >--- a/arch/arm/kvm/mmu.c
> >+++ b/arch/arm/kvm/mmu.c
> >@@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
> > static unsigned long hyp_idmap_end;
> > static phys_addr_t hyp_idmap_vector;
> >
> >-#define S2_PGD_SIZE (PTRS_PER_S2_PGD * sizeof(pgd_t))
> >+#define S2_PGD_TABLE_SIZE   (PTRS_PER_S2_PGD * sizeof(pgd_t))
> > #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
> >
> > #define KVM_S2PTE_FLAG_IS_IOMAP (1UL << 0)
> >@@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm, 
> >phys_addr_t start, u64 size)
> >  * If the range is too large, release the kvm->mmu_lock
> >  * to prevent starvation and lockup detector warnings.
> >  */
> >-if (size > S2_PUD_SIZE)
> >+if (size > S2_PGDIR_SIZE)
> > cond_resched_lock(&kvm->mmu_lock);
> > next = stage2_pgd_addr_end(addr, end);
> > if (!stage2_pgd_none(*pgd))
> >@@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
> > }
> >
> > /* Allocate the HW PGD, making sure that each page gets its own 
> > refcount */
> >-pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
> >+pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
> > if (!pgd)
> > return -ENOMEM;
> >
> >@@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
> > spin_unlock(&kvm->mmu_lock);
> >
> > /* Free the HW pgd, one page at a time */
> >-free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
> >+free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
> > kvm->arch.pgd = NULL;
> > }
> >
> 
> Btw, I have a different hunk to solve the problem, posted to the kbuild
> report. I will post it here for the sake of capturing the discussion in
> one place. The following hunk on top of the patch changes the lock
> release to happen after we process one PGDIR entry. Since the first time
> we enter the loop we haven't done much with the lock held, it makes more
> sense to yield it after the first round, when we have more work to do.
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index db94f3a..582a972 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm, 
> phys_addr_t start, u64 size)
> assert_spin_locked(&kvm->mmu_lock);
> pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>  

Re: [PATCH v5 00/34] arm/arm64: KVM: Rework the hyp-stub API

2017-04-04 Thread Christoffer Dall
Hi Marc,

On Mon, Apr 03, 2017 at 07:37:33PM +0100, Marc Zyngier wrote:
> As noticed by RMK in this thread[1], the hyp-stub API on 32bit ARM
> could do with some TLC (it cannot perform a soft-restart at HYP, and
> has holes in the hyp-stub support in a number of places). In general,
> it would be desirable for the 32bit behaviour to align on 64bit, if
> only to ease maintenance.
> 
> This series implements the following:
> - Add HVC_[GS]ET_VECTORS and HVC_SOFT_RESTART to the 32bit code
> - Add HVC_RESET_VECTORS to both arm and arm64, removing the need for
>   __hyp_reset_vectors
> - Implement add the stub entry points in the KVM init code, which
>   didn't implement any so far
> - Convert the HYP code to use the init code stubs directly
> - Some general cleanup as a result of these changes (which includes
>   killing HVC_GET_VECTORS)
> - Add some API documentation that covers the above
> 
> Patches 14 to 16 would be better squashed into 12 and 13, but I've
> kept them separate so that I can take the blame for everything I've
> broken.
> 
> I've tested this on arm (Cubietruck, Jetson TK1) and arm64 (Seattle),
> both as host and guest. Keerthy has been kind enough to test the 32bit
> code on DRA7-EVM, AM57XX-EVM and KEYSTONE-K2E-EVM.
> 
> [1] 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-December/473472.html
> 
> * From v4:
>   - Added a standard return value on stub hypercall failure
>   - Zero r0/x0 on successful hypercall
>   - Make 32bit ARM resistant to kvm_reboot while executing a guest
>   - Update documentation to reflect the calling convention expectations
>   - Added Acks from Catalin
>   - Rebased on 4.11-rc5

Thanks for doing the changes.

I have applied your patches to kvmarm/queue.

-Christoffer


Re: [kvmarm:master 1/3] arch/arm/kvm/mmu.c:302:14: error: 'S2_PUD_SIZE' undeclared

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 11:28:09AM +0100, Marc Zyngier wrote:
> On 04/04/17 11:14, Suzuki K Poulose wrote:
> > On 03/04/17 22:15, kbuild test robot wrote:
> >> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
> >> master
> >> head:   1f1c45c6f66a586ca420ca02cbd93a35690394f9
> >> commit: f9d9eb7f7a2c7e388861fe1cdb253f63e63555fe [1/3] kvm: arm/arm64: Fix 
> >> locking for kvm_free_stage2_pgd
> >> config: arm-axm55xx_defconfig (attached as .config)
> >> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
> >> reproduce:
> >> wget 
> >> https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross 
> >> -O ~/bin/make.cross
> >> chmod +x ~/bin/make.cross
> >> git checkout f9d9eb7f7a2c7e388861fe1cdb253f63e63555fe
> >> # save the attached .config to linux build tree
> >> make.cross ARCH=arm
> >>
> >> All errors (new ones prefixed by >>):
> >>
> >>arch/arm/kvm/mmu.c: In function 'unmap_stage2_range':
>  arch/arm/kvm/mmu.c:302:14: error: 'S2_PUD_SIZE' undeclared (first use in 
>  this function)
> >>   if (size > S2_PUD_SIZE)
> >>  ^~~
> > 
> > Thanks kbuild for catching this one !
> > 
> >>arch/arm/kvm/mmu.c:302:14: note: each undeclared identifier is reported 
> >> only once for each function it appears in
> >>
> >> vim +/S2_PUD_SIZE +302 arch/arm/kvm/mmu.c
> >>
> >>296 pgd = kvm->arch.pgd + stage2_pgd_index(addr);
> >>297 do {
> >>298 /*
> >>299  * If the range is too large, release the 
> >> kvm->mmu_lock
> >>300  * to prevent starvation and lockup detector 
> >> warnings.
> >>301  */
> >>  > 302 if (size > S2_PUD_SIZE)
> >>303 cond_resched_lock(&kvm->mmu_lock);
> >>304 next = stage2_pgd_addr_end(addr, end);
> >>305 if (!stage2_pgd_none(*pgd))
> >>
> > 
> > 
> > Marc, Christoffer,
> > 
> > Ah! I didn't test this on arm32. We have two options :
> > 
> > 1) Define S2_P{U,M}_SIZE for arm32 in asm/stage2_pgtable.h
> > 
> > or,
> > 
> > 2) use the following hunk on top of the patch, which changes the lock
> > release to happen after we process one PGDIR entry. Since the first
> > time we enter the loop we haven't done much with the lock held, it
> > makes more sense to yield it after the first round, when we have more
> > work to do.
> > 
> > Let me know what you think
> > 
> > 
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index db94f3a..582a972 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm, 
> > phys_addr_t start, u64 size)
> >  assert_spin_locked(&kvm->mmu_lock);
> >  pgd = kvm->arch.pgd + stage2_pgd_index(addr);
> >  do {
> > +   next = stage2_pgd_addr_end(addr, end);
> > +   if (!stage2_pgd_none(*pgd))
> > +   unmap_stage2_puds(kvm, pgd, addr, next);
> >  /*
> >   * If the range is too large, release the kvm->mmu_lock
> >   * to prevent starvation and lockup detector warnings.
> >   */
> > -   if (size > S2_PUD_SIZE)
> > +   if (next != end)
> >  cond_resched_lock(&kvm->mmu_lock);
> > -   next = stage2_pgd_addr_end(addr, end);
> > -   if (!stage2_pgd_none(*pgd))
> > -   unmap_stage2_puds(kvm, pgd, addr, next);
> >  } while (pgd++, addr = next, addr != end);
> >   }
> 
> Yup, I quite like this last option, as it doesn't rely on a particular
> size (or just implicitly that of the PGD). Can you respin this?
> 

Agreed, I prefer this over my suggestion (sent as a reply to the
original patch).

Thanks,
-Christoffer


Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-04-04 Thread Suzuki K Poulose

Hi Christoffer,

On 04/04/17 11:13, Christoffer Dall wrote:

Hi Suzuki,

On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:

In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
unmap_stage2_range() on the entire memory range for the guest. This could
cause problems with other callers (e.g, munmap on a memslot) trying to
unmap a range. And since we have to unmap the entire Guest memory range
holding a spinlock, make sure we yield the lock if necessary, after we
unmap each PUD range.

Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Cc: sta...@vger.kernel.org # v3.10+
Cc: Paolo Bonzini 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: Mark Rutland 
Signed-off-by: Suzuki K Poulose 
[ Avoid vCPU starvation and lockup detector warnings ]
Signed-off-by: Marc Zyngier 
Signed-off-by: Suzuki K Poulose 



This unfortunately fails to build on 32-bit ARM, and I also think we
intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.


Sorry about that, I didn't test the patch with arm32. I am fine with the
patch below. And I agree that the name change does make things more
readable. See below for a hunk that I posted to the kbuild report.



How about adding this to your patch (which includes a rename of
S2_PGD_SIZE which is horribly confusing as it indicates the size of the
first level stage-2 table itself, where S2_PGDIR_SIZE indicates the size
of address space mapped by a single entry in the same table):

diff --git a/arch/arm/include/asm/stage2_pgtable.h 
b/arch/arm/include/asm/stage2_pgtable.h
index 460d616..c997f2d 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -35,10 +35,13 @@

 #define stage2_pud_huge(pud)   pud_huge(pud)

+#define S2_PGDIR_SIZE  PGDIR_SIZE
+#define S2_PGDIR_MASK  PGDIR_MASK
+
 /* Open coded p*d_addr_end that can deal with 64bit addresses */
 static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t 
end)
 {
-   phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
+   phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;

return (boundary - 1 < end - 1) ? boundary : end;
 }
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index db94f3a..6e79a4c 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
 static unsigned long hyp_idmap_end;
 static phys_addr_t hyp_idmap_vector;

-#define S2_PGD_SIZE(PTRS_PER_S2_PGD * sizeof(pgd_t))
+#define S2_PGD_TABLE_SIZE  (PTRS_PER_S2_PGD * sizeof(pgd_t))
 #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))

 #define KVM_S2PTE_FLAG_IS_IOMAP(1UL << 0)
@@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t 
start, u64 size)
 * If the range is too large, release the kvm->mmu_lock
 * to prevent starvation and lockup detector warnings.
 */
-   if (size > S2_PUD_SIZE)
+   if (size > S2_PGDIR_SIZE)
cond_resched_lock(&kvm->mmu_lock);
next = stage2_pgd_addr_end(addr, end);
if (!stage2_pgd_none(*pgd))
@@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
}

/* Allocate the HW PGD, making sure that each page gets its own 
refcount */
-   pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
+   pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
if (!pgd)
return -ENOMEM;

@@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
spin_unlock(&kvm->mmu_lock);

/* Free the HW pgd, one page at a time */
-   free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
+   free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
kvm->arch.pgd = NULL;
 }



Btw, I have a different hunk to solve the problem, posted in reply to the
kbuild report. I will repost it here for the sake of capturing the discussion
in one place. The following hunk, applied on top of the patch, moves the lock
release to after we process one PGDIR entry. Since the first time we enter
the loop we haven't done much work with the lock held, it makes sense to
yield only after the first round, when there is more work left to do.

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index db94f3a..582a972 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm, 
phys_addr_t start, u64 size)
assert_spin_locked(&kvm->mmu_lock);
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
do {
+   next = stage2_pgd_addr_end(addr, end);
+   if (!stage2_pgd_none(*pgd))
+   unmap_stage2_puds(kvm, pgd, addr, next);
/*
 * If the range is too large, release the kvm->mmu_lock
 * to prevent starvation and lockup detector warnings.
 */
-   if (size > S2_PUD_SIZE)
+   if (next != end)
cond_resched_lock(&kvm->mmu_lock);
-   next = stage2_pgd_addr_end(addr, end);
-   if (!stage2_pgd_none(*pgd))
-   unmap_stage2_puds(kvm, pgd, addr, next);
} while (pgd++, addr = next, addr != end);
 }

Re: [kvmarm:master 1/3] arch/arm/kvm/mmu.c:302:14: error: 'S2_PUD_SIZE' undeclared

2017-04-04 Thread Marc Zyngier
On 04/04/17 11:14, Suzuki K Poulose wrote:
> On 03/04/17 22:15, kbuild test robot wrote:
>> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
>> master
>> head:   1f1c45c6f66a586ca420ca02cbd93a35690394f9
>> commit: f9d9eb7f7a2c7e388861fe1cdb253f63e63555fe [1/3] kvm: arm/arm64: Fix 
>> locking for kvm_free_stage2_pgd
>> config: arm-axm55xx_defconfig (attached as .config)
>> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
>> reproduce:
>> wget 
>> https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
>> ~/bin/make.cross
>> chmod +x ~/bin/make.cross
>> git checkout f9d9eb7f7a2c7e388861fe1cdb253f63e63555fe
>> # save the attached .config to linux build tree
>> make.cross ARCH=arm
>>
>> All errors (new ones prefixed by >>):
>>
>>arch/arm/kvm/mmu.c: In function 'unmap_stage2_range':
>> arch/arm/kvm/mmu.c:302:14: error: 'S2_PUD_SIZE' undeclared (first use in
>> this function)
>>   if (size > S2_PUD_SIZE)
>>  ^~~
> 
> Thanks kbuild for catching this one !
> 
>>arch/arm/kvm/mmu.c:302:14: note: each undeclared identifier is reported 
>> only once for each function it appears in
>>
>> vim +/S2_PUD_SIZE +302 arch/arm/kvm/mmu.c
>>
>>296   pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>297   do {
>>298   /*
>>299* If the range is too large, release the 
>> kvm->mmu_lock
>>300* to prevent starvation and lockup detector 
>> warnings.
>>301*/
>>  > 302   if (size > S2_PUD_SIZE)
>>303   cond_resched_lock(&kvm->mmu_lock);
>>304   next = stage2_pgd_addr_end(addr, end);
>>305   if (!stage2_pgd_none(*pgd))
>>
> 
> 
> Marc, Christoffer,
> 
> Ah! I didn't test this on arm32. We have two options :
> 
> 1) Define S2_P{U,M}_SIZE for arm32 in asm/stage2_pgtable.h
> 
> or,
> 
> 2)  use the following hunk on top of the patch, which changes the lock
> release after we process one PGDIR entry. As for the first time we enter
> the loop we haven't done much with the lock held, hence it may make
> sense to do it after the first round and we have more work to do.
> 
> Let me know what you think
> 
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index db94f3a..582a972 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm, 
> phys_addr_t start, u64 size)
>  assert_spin_locked(&kvm->mmu_lock);
>  pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>  do {
> +   next = stage2_pgd_addr_end(addr, end);
> +   if (!stage2_pgd_none(*pgd))
> +   unmap_stage2_puds(kvm, pgd, addr, next);
>  /*
>   * If the range is too large, release the kvm->mmu_lock
>   * to prevent starvation and lockup detector warnings.
>   */
> -   if (size > S2_PUD_SIZE)
> +   if (next != end)
>  cond_resched_lock(&kvm->mmu_lock);
> -   next = stage2_pgd_addr_end(addr, end);
> -   if (!stage2_pgd_none(*pgd))
> -   unmap_stage2_puds(kvm, pgd, addr, next);
>  } while (pgd++, addr = next, addr != end);
>   }

Yup, I quite like this last option, as it doesn't rely on a particular
size (or just implicitly that of the PGD). Can you respin this?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [kvmarm:master 1/3] arch/arm/kvm/mmu.c:302:14: error: 'S2_PUD_SIZE' undeclared

2017-04-04 Thread Suzuki K Poulose

On 03/04/17 22:15, kbuild test robot wrote:

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git master
head:   1f1c45c6f66a586ca420ca02cbd93a35690394f9
commit: f9d9eb7f7a2c7e388861fe1cdb253f63e63555fe [1/3] kvm: arm/arm64: Fix 
locking for kvm_free_stage2_pgd
config: arm-axm55xx_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout f9d9eb7f7a2c7e388861fe1cdb253f63e63555fe
# save the attached .config to linux build tree
make.cross ARCH=arm

All errors (new ones prefixed by >>):

   arch/arm/kvm/mmu.c: In function 'unmap_stage2_range':

arch/arm/kvm/mmu.c:302:14: error: 'S2_PUD_SIZE' undeclared (first use in this 
function)

  if (size > S2_PUD_SIZE)
 ^~~


Thanks kbuild for catching this one !


   arch/arm/kvm/mmu.c:302:14: note: each undeclared identifier is reported only 
once for each function it appears in

vim +/S2_PUD_SIZE +302 arch/arm/kvm/mmu.c

   296  pgd = kvm->arch.pgd + stage2_pgd_index(addr);
   297  do {
   298  /*
   299   * If the range is too large, release the kvm->mmu_lock
   300   * to prevent starvation and lockup detector warnings.
   301   */
 > 302   if (size > S2_PUD_SIZE)
   303  cond_resched_lock(&kvm->mmu_lock);
   304  next = stage2_pgd_addr_end(addr, end);
   305  if (!stage2_pgd_none(*pgd))




Marc, Christoffer,

Ah! I didn't test this on arm32. We have two options :

1) Define S2_P{U,M}_SIZE for arm32 in asm/stage2_pgtable.h

or,

2) Use the following hunk on top of the patch, which moves the lock
release to after we process one PGDIR entry. Since the first time we enter
the loop we haven't done much work with the lock held, it makes sense to
yield only after the first round, when there is more work left to do.

Let me know what you think


diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index db94f3a..582a972 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm, 
phys_addr_t start, u64 size)
assert_spin_locked(&kvm->mmu_lock);
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
do {
+   next = stage2_pgd_addr_end(addr, end);
+   if (!stage2_pgd_none(*pgd))
+   unmap_stage2_puds(kvm, pgd, addr, next);
/*
 * If the range is too large, release the kvm->mmu_lock
 * to prevent starvation and lockup detector warnings.
 */
-   if (size > S2_PUD_SIZE)
+   if (next != end)
cond_resched_lock(&kvm->mmu_lock);
-   next = stage2_pgd_addr_end(addr, end);
-   if (!stage2_pgd_none(*pgd))
-   unmap_stage2_puds(kvm, pgd, addr, next);
} while (pgd++, addr = next, addr != end);
 }
 


Suzuki


---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation





Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-04-04 Thread Christoffer Dall
Hi Suzuki,

On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> unmap_stage2_range() on the entire memory range for the guest. This could
> cause problems with other callers (e.g, munmap on a memslot) trying to
> unmap a range. And since we have to unmap the entire Guest memory range
> holding a spinlock, make sure we yield the lock if necessary, after we
> unmap each PUD range.
> 
> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> Cc: sta...@vger.kernel.org # v3.10+
> Cc: Paolo Bonzini 
> Cc: Marc Zyngier 
> Cc: Christoffer Dall 
> Cc: Mark Rutland 
> Signed-off-by: Suzuki K Poulose 
> [ Avoid vCPU starvation and lockup detector warnings ]
> Signed-off-by: Marc Zyngier 
> Signed-off-by: Suzuki K Poulose 
> 

This unfortunately fails to build on 32-bit ARM, and I also think we
intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.

How about adding this to your patch (which includes a rename of
S2_PGD_SIZE which is horribly confusing as it indicates the size of the
first level stage-2 table itself, where S2_PGDIR_SIZE indicates the size
of address space mapped by a single entry in the same table):

diff --git a/arch/arm/include/asm/stage2_pgtable.h 
b/arch/arm/include/asm/stage2_pgtable.h
index 460d616..c997f2d 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -35,10 +35,13 @@
 
 #define stage2_pud_huge(pud)   pud_huge(pud)
 
+#define S2_PGDIR_SIZE  PGDIR_SIZE
+#define S2_PGDIR_MASK  PGDIR_MASK
+
 /* Open coded p*d_addr_end that can deal with 64bit addresses */
 static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t 
end)
 {
-   phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
+   phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
 
return (boundary - 1 < end - 1) ? boundary : end;
 }
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index db94f3a..6e79a4c 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
 static unsigned long hyp_idmap_end;
 static phys_addr_t hyp_idmap_vector;
 
-#define S2_PGD_SIZE(PTRS_PER_S2_PGD * sizeof(pgd_t))
+#define S2_PGD_TABLE_SIZE  (PTRS_PER_S2_PGD * sizeof(pgd_t))
 #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
 
 #define KVM_S2PTE_FLAG_IS_IOMAP(1UL << 0)
@@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t 
start, u64 size)
 * If the range is too large, release the kvm->mmu_lock
 * to prevent starvation and lockup detector warnings.
 */
-   if (size > S2_PUD_SIZE)
+   if (size > S2_PGDIR_SIZE)
cond_resched_lock(&kvm->mmu_lock);
next = stage2_pgd_addr_end(addr, end);
if (!stage2_pgd_none(*pgd))
@@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
}
 
/* Allocate the HW PGD, making sure that each page gets its own 
refcount */
-   pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
+   pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
if (!pgd)
return -ENOMEM;
 
@@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
spin_unlock(&kvm->mmu_lock);
 
/* Free the HW pgd, one page at a time */
-   free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
+   free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
kvm->arch.pgd = NULL;
 }
 

Thanks,
-Christoffer

> ---
> Changes since V2:
>  - Restrict kvm->mmu_lock relaxation to bigger ranges in unmap_stage2_range(),
>to avoid possible issues like [0]
> 
>  [0] 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/498210.html
> 
> Changes since V1:
>  - Yield the kvm->mmu_lock if necessary in unmap_stage2_range to prevent
>vCPU starvation and lockup detector warnings.
> ---
>  arch/arm/kvm/mmu.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 13b9c1f..db94f3a 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -292,8 +292,15 @@ static void unmap_stage2_range(struct kvm *kvm, 
> phys_addr_t start, u64 size)
>   phys_addr_t addr = start, end = start + size;
>   phys_addr_t next;
>  
> + assert_spin_locked(&kvm->mmu_lock);
>   pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>   do {
> + /*
> +  * If the range is too large, release the kvm->mmu_lock
> +  * to prevent starvation and lockup detector warnings.
> +  */
> + if (size > S2_PUD_SIZE)
> + cond_resched_lock(&kvm->mmu_lock);
>   next = stage2_pgd_addr_end(addr, end);
>   if (!stage2_pgd_none(*pgd))
> 

Re: [PATCH v4 19/28] ARM: KVM: Allow the main HYP code to use the init hyp stub implementation

2017-04-04 Thread Christoffer Dall
On Mon, Apr 03, 2017 at 06:51:02PM +0100, Marc Zyngier wrote:
> On 03/04/17 18:32, Christoffer Dall wrote:
> > On Fri, Mar 24, 2017 at 03:01:23PM +, Marc Zyngier wrote:
> >> On 24/03/17 14:34, Christoffer Dall wrote:
> >>> On Tue, Mar 21, 2017 at 07:20:49PM +, Marc Zyngier wrote:
>  We now have a full hyp-stub implementation in the KVM init code,
>  but the main KVM code only supports HVC_GET_VECTORS, which is not
>  enough.
> 
>  Instead of reinventing the wheel, let's reuse the init implementation
>  by branching to the idmap page when called with a hyp-stub hypercall.
> 
>  Tested-by: Keerthy 
>  Acked-by: Russell King 
>  Signed-off-by: Marc Zyngier 
>  ---
>   arch/arm/kvm/hyp/hyp-entry.S | 29 -
>   1 file changed, 24 insertions(+), 5 deletions(-)
> 
>  diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
>  index 1f8db7d21fc5..a35baa81fd23 100644
>  --- a/arch/arm/kvm/hyp/hyp-entry.S
>  +++ b/arch/arm/kvm/hyp/hyp-entry.S
>  @@ -126,11 +126,30 @@ hyp_hvc:
>    */
>   pop {r0, r1, r2}
>   
>  -/* Check for __hyp_get_vectors */
>  -cmp r0, #HVC_GET_VECTORS
>  -mrceq   p15, 4, r0, c12, c0, 0  @ get HVBAR
>  -beq 1f
>  +/*
>  + * Check if we have a kernel function, which is guaranteed to be
>  + * bigger than the maximum hyp stub hypercall
>  + */
>  +cmp r0, #HVC_STUB_HCALL_NR
>  +bhs 1f
>   
>  +/*
>  + * Not a kernel function, treat it as a stub hypercall.
>  + * Compute the physical address for __kvm_handle_stub_hvc
>  + * (as the code lives in the idmaped page) and branch there.
>  + * We hijack ip (r12) as a tmp register.
>  + */
> >>>
> >>> How can we just clobber r12 and be sure we don't corrupt the caller?
> >>
> >> r12 (aka ip) is allowed to be clobbered by the linker (used by inserted
> >> code veneers, for example). Given that this is a standalone object, we
> >> can safely assume that r12 has been saved if it was used by the caller.
> >>
> >> Here is what the PCS says:
> >>
> >> "Register r12 (IP) may be used by a linker as a scratch register between
> >> a routine and any subroutine it calls (for details, see
> >> §5.3.1.1, Use of IP by the linker). It can also be used within a routine
> >> to hold intermediate values between subroutine calls."
> >>
> > 
> > So isn't this similar to my comment on the arm64 code, which relies on
> > this being called via a function call, as opposed to directly issuring
> > an HVC via inline assembly?
> 
> Indeed, this is the exact same thing.
> 
> > If so, documenting this limitation/restriction/feature would be nice.
> 
> I've added the following to the documentation:
> 
> "A stub hypercall is allowed to clobber any of the caller-saved
> registers (x0-x18 on arm64, r0-r3 and ip on arm). It is thus recommended
> to use a function call to perform the hypercall."
> 
> Does this work for you?
> 
It very much does.

Thanks,
-Christoffer


Re: [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests

2017-04-04 Thread Andrew Jones
On Mon, Apr 03, 2017 at 05:28:45PM +0200, Christoffer Dall wrote:
> Hi Drew,
> 
> On Fri, Mar 31, 2017 at 06:06:49PM +0200, Andrew Jones wrote:
> > This series fixes some hard to produce races by introducing the use of
> > vcpu requests.  It also fixes a couple easier to produce races, ones
> > that have been produced with the PSCI kvm-unit-test test.  The easy two
> > are addressed in two different ways: the first takes advantage of
> > power_off having been changed to a vcpu request, the second caches vcpu
> > MPIDRs in order to avoid extracting them from sys_regs.  I've tested the
> > series on a Mustang and a ThunderX and compile-tested the ARM bits.
> > 
> > Patch 2/9 adds documentation, as, at least for me, understanding vcpu
> > request interplay with vcpu kicks and vcpu mode and the memory barriers
> > that interplay implies, is exhausting.  Hopefully the document is useful
> > to others.  I'm not married to it though, so it can be deferred/dropped
> > as people like...
> 
> Sounds helpful, I'll have a look.
> 
> > 
> > v2:
> >   - No longer based on Radim's vcpu request API rework[1], except for
> > including "add kvm_request_pending" as patch 1/9 [drew]
> 
> I lost track here; did those patches get merged or dropped and why are
> we not basing this work on them anymore, and should patch 1/9 be applied
> here or is it expected to land in the KVM tree via some other path?

I think Radim still wants to rework the API, but, as his work doesn't
provide fixes or functional changes, his timeline may not be the same
as for this series.  He also wants to expand his rework with an API
that combines kicking with requesting.  I'm not sure how all that will
look yet, so, in the end, I decided I might as well just use the current
API for now.  kvm_request_pending() was too nice an addition to drop,
though.

> 
> >   - Added vcpu request documentation [drew]
> >   - Dropped the introduction of user settable MPIDRs [Christoffer]
> >   - Added vcpu requests to all request-less vcpu kicks [Christoffer]
> > 
> 
> Didn't we also have an issue with a missing barrier if the cmpxchg
> operation doesn't succeed?  Did that fall though the cracks or is it
> just missing in the changelog?

Just missing from the changelog. Sorry about that.

  - Ensure we have a read barrier (or equivalent) prior to issuing the
cmpxchg in kvm_vcpu_exiting_guest_mode(), as a failed cmpxchg does
not guarantee any barrier [Christoffer]

Thanks,
drew