irq_work's use of the DEC SPR is racy with guest<->host switch and guest
entry which flips the DEC interrupt to guest, which could lose a host
work interrupt.

This patch closes one race, and attempts to comment another class of
races.

Signed-off-by: Nicholas Piggin <npig...@gmail.com>
---
 arch/powerpc/kvm/book3s_hv.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6f3e3aed99aa..b7a88960ac49 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3704,6 +3704,18 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
        if (!(vcpu->arch.ctrl & 1))
                mtspr(SPRN_CTRLT, mfspr(SPRN_CTRLF) & ~1);
 
+       /*
+        * When setting DEC, we must always deal with irq_work_raise via NMI vs
+        * setting DEC. The problem occurs right as we switch into guest mode
+        * if a NMI hits and sets pending work and sets DEC, then that will
+        * apply to the guest and not bring us back to the host.
+        *
+        * irq_work_raise could check a flag (or possibly LPCR[HDICE] for
+        * example) and set HDEC to 1? That wouldn't solve the nested hv
+        * case which needs to abort the hcall or zero the time limit.
+        *
+        * XXX: Another day's problem.
+        */
        mtspr(SPRN_DEC, vcpu->arch.dec_expires - tb);
 
        if (kvmhv_on_pseries()) {
@@ -3838,7 +3850,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
        vc->entry_exit_map = 0x101;
        vc->in_guest = 0;
 
-       mtspr(SPRN_DEC, local_paca->kvm_hstate.dec_expires - tb);
+       set_dec_or_work(local_paca->kvm_hstate.dec_expires - tb);
+
        mtspr(SPRN_SPRG_VDSO_WRITE, local_paca->sprg_vdso);
 
        kvmhv_load_host_pmu();
-- 
2.23.0

Reply via email to