Re: [PATCH v4 00/17] arm64: Add SMCCC v1.1 support and CVE-2017-5715 (Spectre variant 2) mitigation

2018-02-15 Thread Marc Zyngier
On Thu, 15 Feb 2018 20:59:29 +,
Jon Masters wrote:
> 
> Hi Marc, all,
> 
> On 02/06/2018 12:56 PM, Marc Zyngier wrote:
> > ARM has recently published a SMC Calling Convention (SMCCC)
> > specification update[1] that provides an optimised calling convention
> > and optional, discoverable support for mitigating CVE-2017-5715. ARM
> > Trusted Firmware (ATF) has already gained such an implementation[2].
> 
> I'm probably just missing something, but does this end up reported
> somewhere conveniently user visible? In particular, if the new SMC is
> *not* provided, does the user end up easily seeing this?

What exactly do you want to make visible to userspace?

If you want the SMC implementation of the CVE workaround to be
reported, it wouldn't be very useful, as the SMC instruction is not
available at EL0. It also only covers part of the mitigation spectrum
(we have cores that implement the mitigation using different
mechanisms).

If what you're after is a userspace visible indication of a mitigation
for this CVE (by whatever method available), then this is still a work
in progress, and relies on this series[1] so that we can properly
handle systems containing a combination of affected and non-affected
CPUs. The plan is to expose the status as part of the sysfs interface,
à la x86 and covering all 3 known vulnerabilities.
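
For comparison, x86 already reports this under
/sys/devices/system/cpu/vulnerabilities/ via the generic cpu_show_*()
hooks in drivers/base/cpu.c, which an architecture can override.  Purely
as an illustration (the helper and the reporting strings below are made
up for the example, they are not part of any posted series), an arm64
override could look roughly like this:

	ssize_t cpu_show_spectre_v2(struct device *dev,
				    struct device_attribute *attr, char *buf)
	{
		/* arm64_bp_hardening_enabled() is a hypothetical helper */
		if (arm64_bp_hardening_enabled())
			return sprintf(buf, "Mitigation: Branch predictor hardening\n");

		return sprintf(buf, "Vulnerable\n");
	}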

Thanks,

M.

[1] https://lkml.org/lkml/2018/2/9/579

-- 
Jazz is not dead, it just smell funny.


[PATCH v4 40/40] KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs

2018-02-15 Thread Christoffer Dall
We can finally completely get rid of any calls to the VGICv3
save/restore functions when the AP lists are empty on VHE systems.  This
requires carefully factoring out trap configuration from saving and
restoring state, and carefully choosing what to do on the VHE and
non-VHE paths.

One of the challenges is that we cannot save/restore the VMCR lazily,
because when emulating a GICv2-on-GICv3 we can only write the VMCR while
ICC_SRE_EL1.SRE is cleared; otherwise all Group-0 interrupts end up
being delivered as FIQs.

To solve this problem, and still keep the fast path of exiting a VM with
no pending interrupts fast (which also improves the latency of actually
delivering virtual interrupts coming from physical interrupts), we only
do the activate/deactivate trap configuration in vgic load/put for VHE
systems (which can have ICC_SRE_EL1.SRE cleared when running in the
host), and do the configuration on every round trip on non-VHE systems.
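
The vgic-v3.c and vgic.c hunks that wire this up are cut off below, so
here is a rough sketch of the resulting split; the function names follow
the diffstat and the kvm_hyp.h hunks, but the bodies are illustrative
rather than the literal patch:

	/* VHE: trap configuration is done once per vcpu_load()/vcpu_put() */
	void vgic_v3_load(struct kvm_vcpu *vcpu)
	{
		/* ... existing VMCR handling ... */
		if (has_vhe())
			__vgic_v3_activate_traps(vcpu);
	}

	void vgic_v3_put(struct kvm_vcpu *vcpu)
	{
		/* ... existing VMCR handling ... */
		if (has_vhe())
			__vgic_v3_deactivate_traps(vcpu);
	}

	/*
	 * non-VHE: the traps are still configured on every round trip,
	 * from the __vgic_save_state()/__vgic_restore_state() helpers
	 * in the switch.c hunks below.
	 */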

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v3:
 - Removed extra blank line

 arch/arm/include/asm/kvm_hyp.h   |   2 +
 arch/arm/kvm/hyp/switch.c|   8 ++-
 arch/arm64/include/asm/kvm_hyp.h |   2 +
 arch/arm64/kvm/hyp/switch.c  |   8 ++-
 virt/kvm/arm/hyp/vgic-v3-sr.c| 120 +--
 virt/kvm/arm/vgic/vgic-v3.c  |   6 ++
 virt/kvm/arm/vgic/vgic.c |   7 +--
 7 files changed, 102 insertions(+), 51 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 530a3c1cfe6f..e93a0cac9add 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -110,6 +110,8 @@ void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
+void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
+void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
 void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index 882b9b9e0077..acf1c37fa49c 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -90,14 +90,18 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu 
*vcpu)
 
 static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
 {
-   if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+   if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
__vgic_v3_save_state(vcpu);
+   __vgic_v3_deactivate_traps(vcpu);
+   }
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
-   if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+   if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
+   __vgic_v3_activate_traps(vcpu);
__vgic_v3_restore_state(vcpu);
+   }
 }
 
 static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 6f3929b2fcf7..384c34397619 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -124,6 +124,8 @@ int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
+void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
+void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
 void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
 int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 466cfcdbcaf3..7d8a41e3c9ac 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -193,14 +193,18 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu 
*vcpu)
 
 static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
 {
-   if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+   if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
__vgic_v3_save_state(vcpu);
+   __vgic_v3_deactivate_traps(vcpu);
+   }
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
-   if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+   if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
+   __vgic_v3_activate_traps(vcpu);
__vgic_v3_restore_state(vcpu);
+   }
 }
 
 static bool __hyp_text __true_value(void)
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index 437d7af08683..b13cbd41dbc3 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -209,15 +209,15 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu 
*vcpu)
 {
struct vgic_v3_cpu_if *cpu_if = 

[PATCH v4 39/40] KVM: arm/arm64: Move VGIC APR save/restore to vgic put/load

2018-02-15 Thread Christoffer Dall
The APRs can only have bits set when the guest acknowledges an interrupt
in the LR and can only have a bit cleared when the guest EOIs an
interrupt in the LR.  Therefore, if we have no LRs with any
pending/active interrupts, the APR cannot change value and there is no
need to clear it on every exit from the VM (hint: it will have already
been cleared when we exited the guest the last time with the LRs all
EOIed).

The only case we need to take care of is when we migrate the VCPU away
from a CPU or migrate a new VCPU onto a CPU, or when we return to
userspace to capture the state of the VCPU for migration.  To make sure
this works, factor out the APR save/restore functionality into separate
functions called from the VCPU (and by extension VGIC) put/load hooks.
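
The vgic-v2.c/vgic-v3.c hunks are cut off below; roughly sketched, the
intent is the following (the __vgic_v3_{save,restore}_aprs names come
from the kvm_hyp.h hunks, the bodies are illustrative, and the GICv2
side does the equivalent with plain MMIO accesses):

	void vgic_v3_load(struct kvm_vcpu *vcpu)
	{
		/* ... VMCR restore ... */
		kvm_call_hyp(__vgic_v3_restore_aprs, vcpu);
	}

	void vgic_v3_put(struct kvm_vcpu *vcpu)
	{
		/* ... VMCR save ... */
		kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
	}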

Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_hyp.h   |   2 +
 arch/arm64/include/asm/kvm_hyp.h |   2 +
 virt/kvm/arm/hyp/vgic-v3-sr.c| 124 +--
 virt/kvm/arm/vgic/vgic-v2.c  |   7 +--
 virt/kvm/arm/vgic/vgic-v3.c  |   5 ++
 5 files changed, 78 insertions(+), 62 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 1ab8329e9ff7..530a3c1cfe6f 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -110,6 +110,8 @@ void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
+void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
+void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
 
 asmlinkage void __vfp_save_state(struct vfp_hard_struct *vfp);
 asmlinkage void __vfp_restore_state(struct vfp_hard_struct *vfp);
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index febe417b8b4e..6f3929b2fcf7 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -124,6 +124,8 @@ int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
+void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
+void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
 int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __timer_enable_traps(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index 9abf2f3c12b5..437d7af08683 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -21,6 +21,7 @@
 
 #include 
 #include 
+#include 
 
 #define vtr_to_max_lr_idx(v)   ((v) & 0xf)
 #define vtr_to_nr_pre_bits(v)  ((((u32)(v) >> 26) & 7) + 1)
@@ -221,14 +222,11 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu 
*vcpu)
 
if (used_lrs) {
int i;
-   u32 nr_pre_bits;
u32 elrsr;
 
elrsr = read_gicreg(ICH_ELSR_EL2);
 
write_gicreg(0, ICH_HCR_EL2);
-   val = read_gicreg(ICH_VTR_EL2);
-   nr_pre_bits = vtr_to_nr_pre_bits(val);
 
for (i = 0; i < used_lrs; i++) {
if (elrsr & (1 << i))
@@ -238,39 +236,10 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu 
*vcpu)
 
__gic_v3_set_lr(0, i);
}
-
-   switch (nr_pre_bits) {
-   case 7:
-   cpu_if->vgic_ap0r[3] = __vgic_v3_read_ap0rn(3);
-   cpu_if->vgic_ap0r[2] = __vgic_v3_read_ap0rn(2);
-   case 6:
-   cpu_if->vgic_ap0r[1] = __vgic_v3_read_ap0rn(1);
-   default:
-   cpu_if->vgic_ap0r[0] = __vgic_v3_read_ap0rn(0);
-   }
-
-   switch (nr_pre_bits) {
-   case 7:
-   cpu_if->vgic_ap1r[3] = __vgic_v3_read_ap1rn(3);
-   cpu_if->vgic_ap1r[2] = __vgic_v3_read_ap1rn(2);
-   case 6:
-   cpu_if->vgic_ap1r[1] = __vgic_v3_read_ap1rn(1);
-   default:
-   cpu_if->vgic_ap1r[0] = __vgic_v3_read_ap1rn(0);
-   }
} else {
		if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
cpu_if->its_vpe.its_vm)
write_gicreg(0, ICH_HCR_EL2);
-
-   cpu_if->vgic_ap0r[0] = 0;
-   cpu_if->vgic_ap0r[1] = 0;
-   cpu_if->vgic_ap0r[2] = 0;
-   cpu_if->vgic_ap0r[3] = 0;
-   cpu_if->vgic_ap1r[0] = 0;
-   cpu_if->vgic_ap1r[1] = 0;
-   cpu_if->vgic_ap1r[2] = 0;
-   cpu_if->vgic_ap1r[3] = 0;
}
 
val = read_gicreg(ICC_SRE_EL2);
@@ -287,8 +256,6 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu 
*vcpu)
 {
	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
-   u64 val;
-   

[PATCH v4 38/40] KVM: arm/arm64: Handle VGICv3 save/restore from the main VGIC code on VHE

2018-02-15 Thread Christoffer Dall
Just like we can program the GICv2 hypervisor control interface directly
from the core vgic code, we can do the same for the GICv3 hypervisor
control interface on VHE systems.

We do this by simply calling the save/restore functions when we have VHE
and we can then get rid of the save/restore function calls from the VHE
world switch function.

One caveat is that we now write GICv3 system register state before the
potential early exit path in the run loop, and because we sync back
state in the early exit path, we have to ensure that we read a
consistent GIC state from the sync path, even though we have never
actually run the guest with the newly written GIC state.  We solve this
by inserting an ISB in the early exit path.

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v2:
 - Added ISB in the early exit path in the run loop as explained
   in the commit message.

 arch/arm64/kvm/hyp/switch.c | 3 ---
 virt/kvm/arm/arm.c  | 1 +
 virt/kvm/arm/vgic/vgic.c| 5 +
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index cbafc27a617b..466cfcdbcaf3 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -399,8 +399,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
__activate_traps(vcpu);
__activate_vm(vcpu->kvm);
 
-   __vgic_restore_state(vcpu);
-
sysreg_restore_guest_state_vhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
 
@@ -414,7 +412,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
fp_enabled = fpsimd_enabled_vhe();
 
sysreg_save_guest_state_vhe(guest_ctxt);
-   __vgic_save_state(vcpu);
 
__deactivate_traps(vcpu);
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 5bd879c78951..6de7641f3ff2 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -717,6 +717,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
kvm_request_pending(vcpu)) {
vcpu->mode = OUTSIDE_GUEST_MODE;
+   isb(); /* Ensure work in x_flush_hwstate is committed */
kvm_pmu_sync_hwstate(vcpu);
		if (static_branch_unlikely(&userspace_irqchip_in_use))
kvm_timer_sync_hwstate(vcpu);
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 12e2a28f437e..d0a19a8c196a 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vgic.h"
 
@@ -753,6 +754,8 @@ static inline void vgic_save_state(struct kvm_vcpu *vcpu)
 {
	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
vgic_v2_save_state(vcpu);
+   else if (has_vhe())
+   __vgic_v3_save_state(vcpu);
 }
 
 /* Sync back the hardware VGIC state into our emulation after a guest's run. */
@@ -777,6 +780,8 @@ static inline void vgic_restore_state(struct kvm_vcpu *vcpu)
 {
	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
vgic_v2_restore_state(vcpu);
+   else if (has_vhe())
+   __vgic_v3_restore_state(vcpu);
 }
 
 /* Flush our emulation state into the GIC hardware before entering the guest. 
*/
-- 
2.14.2



[PATCH v4 37/40] KVM: arm/arm64: Move arm64-only vgic-v2-sr.c file to arm64

2018-02-15 Thread Christoffer Dall
The vgic-v2-sr.c file now only contains the logic to replay unaligned
accesses to the virtual CPU interface on 16K and 64K page systems, which
is only relevant on 64-bit platforms.  Therefore move this file to the
arm64 KVM tree, remove the compile directive from the 32-bit side
makefile, and remove the ifdef in the C file.

Reviewed-by: Andre Przywara 
Signed-off-by: Christoffer Dall 
---
 arch/arm/kvm/hyp/Makefile | 1 -
 arch/arm64/kvm/hyp/Makefile   | 2 +-
 {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c | 2 --
 3 files changed, 1 insertion(+), 4 deletions(-)
 rename {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c (98%)

diff --git a/arch/arm/kvm/hyp/Makefile b/arch/arm/kvm/hyp/Makefile
index 5638ce0c9524..1964111c984a 100644
--- a/arch/arm/kvm/hyp/Makefile
+++ b/arch/arm/kvm/hyp/Makefile
@@ -7,7 +7,6 @@ ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
 
 KVM=../../../../virt/kvm
 
-obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o
 
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index f04400d494b7..7e8d41210288 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -7,10 +7,10 @@ ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
 
 KVM=../../../../virt/kvm
 
-obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o
 
+obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += entry.o
diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
similarity index 98%
rename from virt/kvm/arm/hyp/vgic-v2-sr.c
rename to arch/arm64/kvm/hyp/vgic-v2-sr.c
index 0bbafdfd4adb..97f357ea9c72 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_ARM64
 /*
  * __vgic_v2_perform_cpuif_access -- perform a GICV access on behalf of the
  *  guest.
@@ -77,4 +76,3 @@ int __hyp_text __vgic_v2_perform_cpuif_access(struct kvm_vcpu 
*vcpu)
 
return 1;
 }
-#endif
-- 
2.14.2



[PATCH v4 36/40] KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC code

2018-02-15 Thread Christoffer Dall
We can program the GICv2 hypervisor control interface logic directly
from the core vgic code and can then do the save/restore directly from
the flush/sync functions, which can lead to a number of future
optimizations.
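
The vgic-v2.c side of the move is cut off below.  Since the GICv2 GICH
interface is ordinary MMIO, the core vgic code can reach it without a
hyp call; the new helper ends up looking roughly like this (sketch, not
the literal patch body):

	void vgic_v2_save_state(struct kvm_vcpu *vcpu)
	{
		struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
		void __iomem *base = kvm_vgic_global_state.vctrl_base;
		u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;

		if (!base)
			return;

		if (used_lrs) {
			cpu_if->vgic_apr = readl_relaxed(base + GICH_APR);
			save_lrs(vcpu, base);	/* moved over from vgic-v2-sr.c */
			writel_relaxed(0, base + GICH_HCR);
		} else {
			cpu_if->vgic_apr = 0;
		}
	}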

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v1:
 - Removed unnecessary kvm_hyp.h include
 - Adapted the patch based on having gotten rid of storing the elrsr
   prior to this patch.
 - No longer change the interrupt handling of the maintenance interrupt
   handler.  That seems to have been a leftover from an earlier version
   of the timer patches where we were syncing the vgic state after
   having enabled interrupts, leading to the maintenance interrupt firing.

   It may be possible to move the vgic sync function out to an
   interrupts enabled section later on, which would require
   re-introducing logic to disable the VGIC maintenance interrupt in the
   maintenance interrupt handler, but we leave this for future work as
   the immediate benefit is not clear.

 arch/arm/kvm/hyp/switch.c|  4 ---
 arch/arm64/include/asm/kvm_hyp.h |  2 --
 arch/arm64/kvm/hyp/switch.c  |  4 ---
 virt/kvm/arm/hyp/vgic-v2-sr.c| 65 
 virt/kvm/arm/vgic/vgic-v2.c  | 63 ++
 virt/kvm/arm/vgic/vgic.c | 19 +++-
 virt/kvm/arm/vgic/vgic.h |  3 ++
 7 files changed, 84 insertions(+), 76 deletions(-)

diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index aac025783ee8..882b9b9e0077 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -92,16 +92,12 @@ static void __hyp_text __vgic_save_state(struct kvm_vcpu 
*vcpu)
 {
	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
__vgic_v3_save_state(vcpu);
-   else
-   __vgic_v2_save_state(vcpu);
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
__vgic_v3_restore_state(vcpu);
-   else
-   __vgic_v2_restore_state(vcpu);
 }
 
 static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 949f2e77ae58..febe417b8b4e 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -120,8 +120,6 @@ typeof(orig) * __hyp_text fname(void)   
\
return val; \
 }
 
-void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
-void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
 int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 0e54fe2aab1c..cbafc27a617b 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -195,16 +195,12 @@ static void __hyp_text __vgic_save_state(struct kvm_vcpu 
*vcpu)
 {
	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
__vgic_v3_save_state(vcpu);
-   else
-   __vgic_v2_save_state(vcpu);
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
__vgic_v3_restore_state(vcpu);
-   else
-   __vgic_v2_restore_state(vcpu);
 }
 
 static bool __hyp_text __true_value(void)
diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c
index a91b0d2b9249..0bbafdfd4adb 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v2-sr.c
@@ -23,71 +23,6 @@
 #include 
 #include 
 
-static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
-{
-   struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-   u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
-   u64 elrsr;
-   int i;
-
-   elrsr = readl_relaxed(base + GICH_ELRSR0);
-   if (unlikely(used_lrs > 32))
-   elrsr |= ((u64)readl_relaxed(base + GICH_ELRSR1)) << 32;
-
-   for (i = 0; i < used_lrs; i++) {
-   if (elrsr & (1UL << i))
-   cpu_if->vgic_lr[i] &= ~GICH_LR_STATE;
-   else
-   cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i 
* 4));
-
-   writel_relaxed(0, base + GICH_LR0 + (i * 4));
-   }
-}
-
-/* vcpu is already in the HYP VA space */
-void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
-{
-   struct kvm *kvm = kern_hyp_va(vcpu->kvm);
-   struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-   struct vgic_dist *vgic = &kvm->arch.vgic;
-   void __iomem *base = kern_hyp_va(vgic->vctrl_base);
-   u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
-
-   if (!base)
-   return;
-

[PATCH v4 35/40] KVM: arm/arm64: Get rid of vgic_elrsr

2018-02-15 Thread Christoffer Dall
There is really no need to store the vgic_elrsr on the VGIC data
structures, as the only thing we need the elrsr for is to figure out if
an LR is inactive when we save the VGIC state upon returning from the
guest.  We might just as well store this in a temporary local variable.

This also gets rid of the endianness conversion in the VGIC save
function, which was completely unnecessary and would actually have
resulted in incorrect functionality on big-endian systems, because we
only deal with typed values here and never convert pointers and read
them back as different types.

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v1:
 - Moved patch up the queue before we start moving code around to avoid 
moving
   potentially broken code.

 include/kvm/arm_vgic.h|  2 --
 virt/kvm/arm/hyp/vgic-v2-sr.c | 28 +++-
 virt/kvm/arm/hyp/vgic-v3-sr.c |  6 +++---
 virt/kvm/arm/vgic/vgic-v2.c   |  1 -
 virt/kvm/arm/vgic/vgic-v3.c   |  1 -
 5 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index cdbd142ca7f2..ac98ae46bfb7 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -263,7 +263,6 @@ struct vgic_dist {
 struct vgic_v2_cpu_if {
u32 vgic_hcr;
u32 vgic_vmcr;
-   u64 vgic_elrsr; /* Saved only */
u32 vgic_apr;
u32 vgic_lr[VGIC_V2_MAX_LRS];
 };
@@ -272,7 +271,6 @@ struct vgic_v3_cpu_if {
u32 vgic_hcr;
u32 vgic_vmcr;
u32 vgic_sre;   /* Restored only, change ignored */
-   u32 vgic_elrsr; /* Saved only */
u32 vgic_ap0r[4];
u32 vgic_ap1r[4];
u64 vgic_lr[VGIC_V3_MAX_LRS];
diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c
index 4fe6e797e8b3..a91b0d2b9249 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v2-sr.c
@@ -23,29 +23,19 @@
 #include 
 #include 
 
-static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base)
-{
-   struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-   int nr_lr = (kern_hyp_va(&kvm_vgic_global_state))->nr_lr;
-   u32 elrsr0, elrsr1;
-
-   elrsr0 = readl_relaxed(base + GICH_ELRSR0);
-   if (unlikely(nr_lr > 32))
-   elrsr1 = readl_relaxed(base + GICH_ELRSR1);
-   else
-   elrsr1 = 0;
-
-   cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
-}
-
 static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
 {
	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-   int i;
u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+   u64 elrsr;
+   int i;
+
+   elrsr = readl_relaxed(base + GICH_ELRSR0);
+   if (unlikely(used_lrs > 32))
+   elrsr |= ((u64)readl_relaxed(base + GICH_ELRSR1)) << 32;
 
for (i = 0; i < used_lrs; i++) {
-   if (cpu_if->vgic_elrsr & (1UL << i))
+   if (elrsr & (1UL << i))
cpu_if->vgic_lr[i] &= ~GICH_LR_STATE;
else
cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i 
* 4));
@@ -68,13 +58,9 @@ void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
 
if (used_lrs) {
cpu_if->vgic_apr = readl_relaxed(base + GICH_APR);
-
-   save_elrsr(vcpu, base);
save_lrs(vcpu, base);
-
writel_relaxed(0, base + GICH_HCR);
} else {
-   cpu_if->vgic_elrsr = ~0UL;
cpu_if->vgic_apr = 0;
}
 }
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index f5c3d6d7019e..9abf2f3c12b5 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -222,15 +222,16 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu 
*vcpu)
if (used_lrs) {
int i;
u32 nr_pre_bits;
+   u32 elrsr;
 
-   cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
+   elrsr = read_gicreg(ICH_ELSR_EL2);
 
write_gicreg(0, ICH_HCR_EL2);
val = read_gicreg(ICH_VTR_EL2);
nr_pre_bits = vtr_to_nr_pre_bits(val);
 
for (i = 0; i < used_lrs; i++) {
-   if (cpu_if->vgic_elrsr & (1 << i))
+   if (elrsr & (1 << i))
cpu_if->vgic_lr[i] &= ~ICH_LR_STATE;
else
cpu_if->vgic_lr[i] = __gic_v3_get_lr(i);
@@ -262,7 +263,6 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
cpu_if->its_vpe.its_vm)
write_gicreg(0, ICH_HCR_EL2);
 
-   cpu_if->vgic_elrsr = 0xffff;
cpu_if->vgic_ap0r[0] = 0;
cpu_if->vgic_ap0r[1] = 0;

[PATCH v4 34/40] KVM: arm64: Cleanup __activate_traps and __deactive_traps for VHE and non-VHE

2018-02-15 Thread Christoffer Dall
To make the code more readable and to avoid the overhead of a function
call, let's get rid of a pair of the alternative function selectors and
explicitly call the VHE and non-VHE functions using the has_vhe() static
key based selector instead, telling the compiler to try to inline the
static function if it can.

Signed-off-by: Christoffer Dall 
---
 arch/arm64/kvm/hyp/switch.c | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 5e94955b89ea..0e54fe2aab1c 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -85,7 +85,7 @@ static void __hyp_text __deactivate_traps_common(void)
write_sysreg(0, pmuserenr_el0);
 }
 
-static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
+static inline void activate_traps_vhe(struct kvm_vcpu *vcpu)
 {
u64 val;
 
@@ -97,7 +97,7 @@ static void __hyp_text __activate_traps_vhe(struct kvm_vcpu 
*vcpu)
write_sysreg(kvm_get_hyp_vector(), vbar_el1);
 }
 
-static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
+static inline void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
u64 val;
 
@@ -108,11 +108,7 @@ static void __hyp_text __activate_traps_nvhe(struct 
kvm_vcpu *vcpu)
write_sysreg(val, cptr_el2);
 }
 
-static hyp_alternate_select(__activate_traps_arch,
-   __activate_traps_nvhe, __activate_traps_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
-
-static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
+static inline void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
u64 hcr = vcpu->arch.hcr_el2;
 
@@ -122,10 +118,13 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
*vcpu)
write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
 
__activate_traps_fpsimd32(vcpu);
-   __activate_traps_arch()(vcpu);
+   if (has_vhe())
+   activate_traps_vhe(vcpu);
+   else
+   __activate_traps_nvhe(vcpu);
 }
 
-static void __hyp_text __deactivate_traps_vhe(void)
+static inline void deactivate_traps_vhe(void)
 {
extern char vectors[];  /* kernel exception vectors */
write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
@@ -133,7 +132,7 @@ static void __hyp_text __deactivate_traps_vhe(void)
write_sysreg(vectors, vbar_el1);
 }
 
-static void __hyp_text __deactivate_traps_nvhe(void)
+static inline void __hyp_text __deactivate_traps_nvhe(void)
 {
u64 mdcr_el2 = read_sysreg(mdcr_el2);
 
@@ -147,11 +146,7 @@ static void __hyp_text __deactivate_traps_nvhe(void)
write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
 }
 
-static hyp_alternate_select(__deactivate_traps_arch,
-   __deactivate_traps_nvhe, __deactivate_traps_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
-
-static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
+static inline void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 {
/*
 * If we pended a virtual abort, preserve it until it gets
@@ -162,7 +157,10 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu 
*vcpu)
if (vcpu->arch.hcr_el2 & HCR_VSE)
vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
 
-   __deactivate_traps_arch()();
+   if (has_vhe())
+   deactivate_traps_vhe();
+   else
+   __deactivate_traps_nvhe();
 }
 
 void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
-- 
2.14.2



[PATCH v4 33/40] KVM: arm64: Configure c15, PMU, and debug register traps on cpu load/put for VHE

2018-02-15 Thread Christoffer Dall
We do not have to change the c15 trap setting on each switch to/from the
guest on VHE systems, because this setting only affects EL0.

The PMU and debug trap configuration can also be done on vcpu load/put
instead, because they don't affect how the host kernel can access the
debug registers while executing KVM kernel code.

Signed-off-by: Christoffer Dall 
---
 arch/arm64/include/asm/kvm_hyp.h |  3 +++
 arch/arm64/kvm/hyp/switch.c  | 31 ++-
 arch/arm64/kvm/hyp/sysreg-sr.c   |  4 
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 2b1fda90dde4..949f2e77ae58 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -147,6 +147,9 @@ void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
 bool __fpsimd_enabled(void);
 
+void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
+void deactivate_traps_vhe_put(void);
+
 u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
 void __noreturn __hyp_do_panic(unsigned long, ...);
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 9c40e203bd09..5e94955b89ea 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -101,6 +101,8 @@ static void __hyp_text __activate_traps_nvhe(struct 
kvm_vcpu *vcpu)
 {
u64 val;
 
+   __activate_traps_common(vcpu);
+
val = CPTR_EL2_DEFAULT;
val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
write_sysreg(val, cptr_el2);
@@ -120,20 +122,12 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
*vcpu)
write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
 
__activate_traps_fpsimd32(vcpu);
-   __activate_traps_common(vcpu);
__activate_traps_arch()(vcpu);
 }
 
 static void __hyp_text __deactivate_traps_vhe(void)
 {
extern char vectors[];  /* kernel exception vectors */
-   u64 mdcr_el2 = read_sysreg(mdcr_el2);
-
-   mdcr_el2 &= MDCR_EL2_HPMN_MASK |
-   MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
-   MDCR_EL2_TPMS;
-
-   write_sysreg(mdcr_el2, mdcr_el2);
write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
write_sysreg(vectors, vbar_el1);
@@ -143,6 +137,8 @@ static void __hyp_text __deactivate_traps_nvhe(void)
 {
u64 mdcr_el2 = read_sysreg(mdcr_el2);
 
+   __deactivate_traps_common();
+
mdcr_el2 &= MDCR_EL2_HPMN_MASK;
mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
 
@@ -166,10 +162,27 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu 
*vcpu)
if (vcpu->arch.hcr_el2 & HCR_VSE)
vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
 
-   __deactivate_traps_common();
__deactivate_traps_arch()();
 }
 
+void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
+{
+   __activate_traps_common(vcpu);
+}
+
+void deactivate_traps_vhe_put(void)
+{
+   u64 mdcr_el2 = read_sysreg(mdcr_el2);
+
+   mdcr_el2 &= MDCR_EL2_HPMN_MASK |
+   MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
+   MDCR_EL2_TPMS;
+
+   write_sysreg(mdcr_el2, mdcr_el2);
+
+   __deactivate_traps_common();
+}
+
 static void __hyp_text __activate_vm(struct kvm *kvm)
 {
write_sysreg(kvm->arch.vttbr, vttbr_el2);
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index aacba4636871..b3894df6bf1a 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -254,6 +254,8 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
__sysreg_restore_el1_state(guest_ctxt);
 
vcpu->arch.sysregs_loaded_on_cpu = true;
+
+   activate_traps_vhe_load(vcpu);
 }
 
 /**
@@ -275,6 +277,8 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
if (!has_vhe())
return;
 
+   deactivate_traps_vhe_put();
+
__sysreg_save_el1_state(guest_ctxt);
__sysreg_save_user_state(guest_ctxt);
__sysreg32_save_state(vcpu);
-- 
2.14.2



[PATCH v4 32/40] KVM: arm64: Directly call VHE and non-VHE FPSIMD enabled functions

2018-02-15 Thread Christoffer Dall
There is no longer a need for an alternative to choose the right
function to tell us whether or not FPSIMD was enabled for the VM,
because we can simply call the appropriate functions directly from
within the _vhe and _nvhe run functions.

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v3:
 - New patch since we no longer defer FPSIMD handling to load/put

 arch/arm64/kvm/hyp/switch.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 17e3c6f26a34..9c40e203bd09 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -33,20 +33,11 @@ static bool __hyp_text __fpsimd_enabled_nvhe(void)
return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
 }
 
-static bool __hyp_text __fpsimd_enabled_vhe(void)
+static bool fpsimd_enabled_vhe(void)
 {
return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
 }
 
-static hyp_alternate_select(__fpsimd_is_enabled,
-   __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
-
-bool __hyp_text __fpsimd_enabled(void)
-{
-   return __fpsimd_is_enabled()();
-}
-
 /* Save the 32-bit only FPSIMD system register state */
 static inline void __hyp_text __fpsimd_save_fpexc32(struct kvm_vcpu *vcpu)
 {
@@ -413,7 +404,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
/* And we're baaack! */
} while (fixup_guest_exit(vcpu, _code));
 
-   fp_enabled = __fpsimd_enabled();
+   fp_enabled = fpsimd_enabled_vhe();
 
sysreg_save_guest_state_vhe(guest_ctxt);
__vgic_save_state(vcpu);
@@ -478,7 +469,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
__qcom_hyp_sanitize_btac_predictors();
}
 
-   fp_enabled = __fpsimd_enabled();
+   fp_enabled = __fpsimd_enabled_nvhe();
 
__sysreg_save_state_nvhe(guest_ctxt);
__sysreg32_save_state(vcpu);
-- 
2.14.2



[PATCH v4 31/40] KVM: arm64: Move common VHE/non-VHE trap config in separate functions

2018-02-15 Thread Christoffer Dall
As we are about to be more lazy with some of the trap configuration
register read/writes for VHE systems, move the logic that is currently
shared between VHE and non-VHE into a separate function which can be
called from either the world-switch path or from vcpu_load/vcpu_put.

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v3:
 - Separate fpsimd32 trap configuration into a separate function
   which is still called from __activate_traps, because we no longer
   defer saving/restoring of VFP registers to load/put.

 arch/arm64/kvm/hyp/switch.c | 76 +++--
 1 file changed, 45 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 909aa3fe9196..17e3c6f26a34 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -56,7 +56,45 @@ static inline void __hyp_text __fpsimd_save_fpexc32(struct 
kvm_vcpu *vcpu)
vcpu->arch.ctxt.sys_regs[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
 }
 
-static void __hyp_text __activate_traps_vhe(void)
+static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
+{
+   /*
+* We are about to set CPTR_EL2.TFP to trap all floating point
+* register accesses to EL2, however, the ARM ARM clearly states that
+* traps are only taken to EL2 if the operation would not otherwise
+* trap to EL1.  Therefore, always make sure that for 32-bit guests,
+* we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
+* If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
+* it will cause an exception.
+*/
+   if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd()) {
+   write_sysreg(1 << 30, fpexc32_el2);
+   isb();
+   }
+}
+
+static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
+{
+   /* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
+   write_sysreg(1 << 15, hstr_el2);
+   /*
+* Make sure we trap PMU access from EL0 to EL2. Also sanitize
+* PMSELR_EL0 to make sure it never contains the cycle
+* counter, which could make a PMXEVCNTR_EL0 access UNDEF at
+* EL1 instead of being trapped to EL2.
+*/
+   write_sysreg(0, pmselr_el0);
+   write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
+   write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+}
+
+static void __hyp_text __deactivate_traps_common(void)
+{
+   write_sysreg(0, hstr_el2);
+   write_sysreg(0, pmuserenr_el0);
+}
+
+static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
 {
u64 val;
 
@@ -68,7 +106,7 @@ static void __hyp_text __activate_traps_vhe(void)
write_sysreg(kvm_get_hyp_vector(), vbar_el1);
 }
 
-static void __hyp_text __activate_traps_nvhe(void)
+static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
u64 val;
 
@@ -85,37 +123,14 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
*vcpu)
 {
u64 hcr = vcpu->arch.hcr_el2;
 
-   /*
-* We are about to set CPTR_EL2.TFP to trap all floating point
-* register accesses to EL2, however, the ARM ARM clearly states that
-* traps are only taken to EL2 if the operation would not otherwise
-* trap to EL1.  Therefore, always make sure that for 32-bit guests,
-* we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
-* If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
-* it will cause an exception.
-*/
-   if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd()) {
-   write_sysreg(1 << 30, fpexc32_el2);
-   isb();
-   }
+   write_sysreg(hcr, hcr_el2);
 
if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
 
-   write_sysreg(hcr, hcr_el2);
-
-   /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
-   write_sysreg(1 << 15, hstr_el2);
-   /*
-* Make sure we trap PMU access from EL0 to EL2. Also sanitize
-* PMSELR_EL0 to make sure it never contains the cycle
-* counter, which could make a PMXEVCNTR_EL0 access UNDEF at
-* EL1 instead of being trapped to EL2.
-*/
-   write_sysreg(0, pmselr_el0);
-   write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
-   write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
-   __activate_traps_arch()();
+   __activate_traps_fpsimd32(vcpu);
+   __activate_traps_common(vcpu);
+   __activate_traps_arch()(vcpu);
 }
 
 static void __hyp_text __deactivate_traps_vhe(void)
@@ -160,9 +175,8 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu 
*vcpu)
if (vcpu->arch.hcr_el2 & HCR_VSE)
vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
 
+   __deactivate_traps_common();

[PATCH v4 30/40] KVM: arm64: Defer saving/restoring 32-bit sysregs to vcpu load/put

2018-02-15 Thread Christoffer Dall
When running a 32-bit VM (EL1 in AArch32), the AArch32 system registers
can be deferred to vcpu load/put on VHE systems because neither
the host kernel nor host userspace uses these registers.

Note that we cannot defer saving DBGVCR32_EL2 conditionally based
on the state of the debug dirty flag on VHE, but since we do the
load/put pretty rarely, this comes out as a win anyway.

We also cannot defer saving FPEXC32_EL2, because this register only
holds a guest-valid value for 32-bit guests during the exit path, when
the guest has used FPSIMD registers and the early assembly handler has
restored the register after taking the EL2 fault.  We therefore have to
check whether FPSIMD is enabled for the guest in the exit path and save
the register there, for both VHE and non-VHE guests.

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v3:
 - Rework the FPEXC32 save/restore logic to no longer attempt to
   save/restore this register lazily.

Changes since v2:
 - New patch (deferred register handling has been reworked)

 arch/arm64/kvm/hyp/switch.c| 17 +++--
 arch/arm64/kvm/hyp/sysreg-sr.c | 15 ++-
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 22e77deb8e2e..909aa3fe9196 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -47,6 +47,15 @@ bool __hyp_text __fpsimd_enabled(void)
return __fpsimd_is_enabled()();
 }
 
+/* Save the 32-bit only FPSIMD system register state */
+static inline void __hyp_text __fpsimd_save_fpexc32(struct kvm_vcpu *vcpu)
+{
+   if (!vcpu_el1_is_32bit(vcpu))
+   return;
+
+   vcpu->arch.ctxt.sys_regs[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
+}
+
 static void __hyp_text __activate_traps_vhe(void)
 {
u64 val;
@@ -380,11 +389,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
__vgic_restore_state(vcpu);
 
-   /*
-* We must restore the 32-bit state before the sysregs, thanks
-* to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
-*/
-   __sysreg32_restore_state(vcpu);
sysreg_restore_guest_state_vhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
 
@@ -398,7 +402,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
fp_enabled = __fpsimd_enabled();
 
sysreg_save_guest_state_vhe(guest_ctxt);
-   __sysreg32_save_state(vcpu);
__vgic_save_state(vcpu);
 
__deactivate_traps(vcpu);
@@ -408,6 +411,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
if (fp_enabled) {
		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
+   __fpsimd_save_fpexc32(vcpu);
}
 
__debug_switch_to_host(vcpu);
@@ -475,6 +479,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
if (fp_enabled) {
		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
+   __fpsimd_save_fpexc32(vcpu);
}
 
/*
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 9c60b8062724..aacba4636871 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -196,10 +196,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu 
*vcpu)
sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
 
-   if (__fpsimd_enabled())
-   sysreg[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
-
-   if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
+   if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
 }
 
@@ -221,7 +218,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu 
*vcpu)
write_sysreg(sysreg[DACR32_EL2], dacr32_el2);
write_sysreg(sysreg[IFSR32_EL2], ifsr32_el2);
 
-   if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
+   if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
 }
 
@@ -246,6 +243,13 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
 
__sysreg_save_user_state(host_ctxt);
 
+   /*
+* Load guest EL1 and user state
+*
+* We must restore the 32-bit state before the sysregs, thanks
+* to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
+*/
+   __sysreg32_restore_state(vcpu);
__sysreg_restore_user_state(guest_ctxt);
__sysreg_restore_el1_state(guest_ctxt);
 
@@ -273,6 +277,7 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
 
__sysreg_save_el1_state(guest_ctxt);
__sysreg_save_user_state(guest_ctxt);
+   __sysreg32_save_state(vcpu);
 
/* Restore host user state */
__sysreg_restore_user_state(host_ctxt);
-- 

[PATCH v4 29/40] KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers

2018-02-15 Thread Christoffer Dall
32-bit registers are not used by a 64-bit host kernel and can be
deferred, but we need to rework the accesses to these registers to
access the latest value, depending on whether or not guest system
registers are loaded on the CPU or only reside in memory.
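
The diff below is truncated before the write side; the counterpart
presumably mirrors the read accessor, roughly like this (sketch, not the
literal patch body):

	void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
	{
		int spsr_idx = vcpu_spsr32_mode(vcpu);

		if (!vcpu->arch.sysregs_loaded_on_cpu) {
			vcpu_gp_regs(vcpu)->spsr[spsr_idx] = v;
			return;
		}

		switch (spsr_idx) {
		case KVM_SPSR_SVC:
			write_sysreg_el1(v, spsr);
			break;
		case KVM_SPSR_ABT:
			write_sysreg(v, spsr_abt);
			break;
		case KVM_SPSR_UND:
			write_sysreg(v, spsr_und);
			break;
		case KVM_SPSR_IRQ:
			write_sysreg(v, spsr_irq);
			break;
		case KVM_SPSR_FIQ:
			write_sysreg(v, spsr_fiq);
			break;
		}
	}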

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v3:
 - Don't also try to write hardware spsr when sysregs are not loaded
 - Adapted patch to use switch-based sysreg save/restore approach
 - (Kept additional BUG_ON() in vcpu_read_spsr32() to keep the compiler 
happy)

Changes since v2:
 - New patch (deferred register handling has been reworked)

 arch/arm64/include/asm/kvm_emulate.h | 32 +
 arch/arm64/kvm/regmap.c  | 67 +++-
 arch/arm64/kvm/sys_regs.c|  6 
 3 files changed, 65 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 9cb13b23c7a1..23b33e8ea03a 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -33,7 +33,8 @@
 #include 
 
 unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
-unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
+unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
+void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
 
 bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
 void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr);
@@ -162,41 +163,26 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 
reg_num,
 
 static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 {
-   unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
-
-   if (vcpu_mode_is_32bit(vcpu)) {
-   unsigned long *p_32bit = vcpu_spsr32(vcpu);
-
-   /* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
-   if (p_32bit != (unsigned long *)p)
-   return *p_32bit;
-   }
+   if (vcpu_mode_is_32bit(vcpu))
+   return vcpu_read_spsr32(vcpu);
 
if (vcpu->arch.sysregs_loaded_on_cpu)
return read_sysreg_el1(spsr);
else
-   return *p;
+   return vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
 }
 
-static inline void vcpu_write_spsr(const struct kvm_vcpu *vcpu, unsigned long 
v)
+static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
 {
-   unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
-
-   /* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
if (vcpu_mode_is_32bit(vcpu)) {
-   unsigned long *p_32bit = vcpu_spsr32(vcpu);
-
-   /* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
-   if (p_32bit != (unsigned long *)p) {
-   *p_32bit = v;
-   return;
-   }
+   vcpu_write_spsr32(vcpu, v);
+   return;
}
 
if (vcpu->arch.sysregs_loaded_on_cpu)
write_sysreg_el1(v, spsr);
else
-   *p = v;
+   vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1] = v;
 }
 
 static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
index bbc6ae32e4af..eefe403a2e63 100644
--- a/arch/arm64/kvm/regmap.c
+++ b/arch/arm64/kvm/regmap.c
@@ -141,28 +141,61 @@ unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 
reg_num)
 /*
  * Return the SPSR for the current mode of the virtual CPU.
  */
-unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu)
+static int vcpu_spsr32_mode(const struct kvm_vcpu *vcpu)
 {
unsigned long mode = *vcpu_cpsr(vcpu) & COMPAT_PSR_MODE_MASK;
switch (mode) {
-   case COMPAT_PSR_MODE_SVC:
-   mode = KVM_SPSR_SVC;
-   break;
-   case COMPAT_PSR_MODE_ABT:
-   mode = KVM_SPSR_ABT;
-   break;
-   case COMPAT_PSR_MODE_UND:
-   mode = KVM_SPSR_UND;
-   break;
-   case COMPAT_PSR_MODE_IRQ:
-   mode = KVM_SPSR_IRQ;
-   break;
-   case COMPAT_PSR_MODE_FIQ:
-   mode = KVM_SPSR_FIQ;
-   break;
+   case COMPAT_PSR_MODE_SVC: return KVM_SPSR_SVC;
+   case COMPAT_PSR_MODE_ABT: return KVM_SPSR_ABT;
+   case COMPAT_PSR_MODE_UND: return KVM_SPSR_UND;
+   case COMPAT_PSR_MODE_IRQ: return KVM_SPSR_IRQ;
+   case COMPAT_PSR_MODE_FIQ: return KVM_SPSR_FIQ;
+   default: BUG();
+   }
+}
+
+unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
+{
+   int spsr_idx = vcpu_spsr32_mode(vcpu);
+
+   if (!vcpu->arch.sysregs_loaded_on_cpu)
+   return vcpu_gp_regs(vcpu)->spsr[spsr_idx];
+
+   switch (spsr_idx) {
+   case KVM_SPSR_SVC:
+   return read_sysreg_el1(spsr);
+   case KVM_SPSR_ABT:
+   return read_sysreg(spsr_abt);
+   case 

[PATCH v4 28/40] KVM: arm64: Defer saving/restoring 64-bit sysregs to vcpu load/put on VHE

2018-02-15 Thread Christoffer Dall
Some system registers do not affect the host kernel's execution and can
therefore be loaded when we are about to run a VCPU and we don't have to
restore the host state to the hardware before the time when we are
actually about to return to userspace or schedule out the VCPU thread.

The EL1 system registers and the userspace state registers only
affecting EL0 execution do not need to be saved and restored on every
switch between the VM and the host, because they don't affect the host
kernel's execution.

We mark all registers which are now deferred as such in the
vcpu_{read,write}_sys_reg accessors in sys_regs.c to ensure the most
up-to-date copy is always accessed.

Note MPIDR_EL1 (controlled via VMPIDR_EL2) is accessed from other vcpu
threads, for example via the GIC emulation, and therefore must be
declared as immediate, which is fine as the guest cannot modify this
value.

The 32-bit sysregs can also be deferred but we do this in a separate
patch as it requires a bit more infrastructure.
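
The sys_regs.c hunk showing the accessors is truncated below; the
pattern they implement is roughly the following (sketch with only a
couple of the deferred registers shown, the case labels being taken from
the visible part of the diff):

	u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
	{
		if (vcpu->arch.sysregs_loaded_on_cpu) {
			switch (reg) {
			case SCTLR_EL1: return read_sysreg_s(sctlr_EL12);
			case CPACR_EL1: return read_sysreg_s(cpacr_EL12);
			/* ... one case per register deferred to load/put ... */
			}
		}

		/* Not loaded on the CPU (or not deferred): use the memory copy */
		return vcpu->arch.ctxt.sys_regs[reg];
	}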

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v3:
 - Changed to switch-based sysreg approach

 arch/arm64/kvm/hyp/sysreg-sr.c | 39 +++
 arch/arm64/kvm/sys_regs.c  | 40 
 2 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 906606dc4e2c..9c60b8062724 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -25,8 +25,12 @@
 /*
  * Non-VHE: Both host and guest must save everything.
  *
- * VHE: Host must save tpidr*_el0, mdscr_el1, sp_el0,
- * and guest must save everything.
+ * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and pstate,
+ * which are handled as part of the el2 return state) on every switch.
+ * tpidr_el0 and tpidrro_el0 only need to be switched when going
+ * to host userspace or a different VCPU.  EL1 registers only need to be
+ * switched when potentially going to run a different VCPU.  The latter two
+ * classes are handled as part of kvm_arch_vcpu_load and kvm_arch_vcpu_put.
  */
 
 static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
@@ -93,14 +97,11 @@ void __hyp_text __sysreg_save_state_nvhe(struct 
kvm_cpu_context *ctxt)
 void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt)
 {
__sysreg_save_common_state(ctxt);
-   __sysreg_save_user_state(ctxt);
 }
 
 void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt)
 {
-   __sysreg_save_el1_state(ctxt);
__sysreg_save_common_state(ctxt);
-   __sysreg_save_user_state(ctxt);
__sysreg_save_el2_return_state(ctxt);
 }
 
@@ -169,14 +170,11 @@ void __hyp_text __sysreg_restore_state_nvhe(struct 
kvm_cpu_context *ctxt)
 void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt)
 {
__sysreg_restore_common_state(ctxt);
-   __sysreg_restore_user_state(ctxt);
 }
 
 void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
 {
-   __sysreg_restore_el1_state(ctxt);
__sysreg_restore_common_state(ctxt);
-   __sysreg_restore_user_state(ctxt);
__sysreg_restore_el2_return_state(ctxt);
 }
 
@@ -240,6 +238,18 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu 
*vcpu)
  */
 void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
 {
+   struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
+   struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
+
+   if (!has_vhe())
+   return;
+
+   __sysreg_save_user_state(host_ctxt);
+
+   __sysreg_restore_user_state(guest_ctxt);
+   __sysreg_restore_el1_state(guest_ctxt);
+
+   vcpu->arch.sysregs_loaded_on_cpu = true;
 }
 
 /**
@@ -255,6 +265,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
  */
 void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
 {
+   struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
+   struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
+
+   if (!has_vhe())
+   return;
+
+   __sysreg_save_el1_state(guest_ctxt);
+   __sysreg_save_user_state(guest_ctxt);
+
+   /* Restore host user state */
+   __sysreg_restore_user_state(host_ctxt);
+
+   vcpu->arch.sysregs_loaded_on_cpu = false;
 }
 
 void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index b3c3f014aa61..f060309337aa 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -87,6 +87,26 @@ u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
 * exit from the guest but are only saved on vcpu_put.
 */
switch (reg) {
+   case CSSELR_EL1:return read_sysreg_s(SYS_CSSELR_EL1);
+   case SCTLR_EL1: return read_sysreg_s(sctlr_EL12);
+   case ACTLR_EL1: return read_sysreg_s(SYS_ACTLR_EL1);
+   case CPACR_EL1: return read_sysreg_s(cpacr_EL12);
+ 

[PATCH v4 26/40] KVM: arm/arm64: Prepare to handle deferred save/restore of SPSR_EL1

2018-02-15 Thread Christoffer Dall
SPSR_EL1 is not used by a VHE host kernel and can be deferred, but we
need to rework the accesses to this register to access the latest value
depending on whether or not guest system registers are loaded on the CPU
or only reside in memory.

The handling of accessing the various banked SPSRs for 32-bit VMs is a
bit clunky, but this will be improved in following patches which will
first prepare and subsequently implement deferred save/restore of the
32-bit registers, including the 32-bit SPSRs.

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v2:
 - New patch (deferred register handling has been reworked)

 arch/arm/include/asm/kvm_emulate.h   | 12 ++-
 arch/arm/kvm/emulate.c   |  2 +-
 arch/arm64/include/asm/kvm_emulate.h | 41 +++-
 arch/arm64/kvm/inject_fault.c|  4 ++--
 virt/kvm/arm/aarch32.c   |  2 +-
 5 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h 
b/arch/arm/include/asm/kvm_emulate.h
index e27caa4b47a1..6493bd479ddc 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -41,7 +41,17 @@ static inline unsigned long *vcpu_reg32(struct kvm_vcpu 
*vcpu, u8 reg_num)
return vcpu_reg(vcpu, reg_num);
 }
 
-unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
+unsigned long *__vcpu_spsr(struct kvm_vcpu *vcpu);
+
+static inline unsigned long vpcu_read_spsr(struct kvm_vcpu *vcpu)
+{
+   return *__vcpu_spsr(vcpu);
+}
+
+static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
+{
+   *__vcpu_spsr(vcpu) = v;
+}
 
 static inline unsigned long vcpu_get_reg(struct kvm_vcpu *vcpu,
 u8 reg_num)
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index fa501bf437f3..9046b53d87c1 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -142,7 +142,7 @@ unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num)
 /*
  * Return the SPSR for the current mode of the virtual CPU.
  */
-unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu)
+unsigned long *__vcpu_spsr(struct kvm_vcpu *vcpu)
 {
unsigned long mode = *vcpu_cpsr(vcpu) & MODE_MASK;
switch (mode) {
diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index d313aaae5c38..47c2406755fa 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -26,6 +26,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -143,13 +144,43 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 
reg_num,
vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
 }
 
-/* Get vcpu SPSR for current mode */
-static inline unsigned long *vcpu_spsr(const struct kvm_vcpu *vcpu)
+static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 {
-   if (vcpu_mode_is_32bit(vcpu))
-   return vcpu_spsr32(vcpu);
+   unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
+
+   if (vcpu_mode_is_32bit(vcpu)) {
+   unsigned long *p_32bit = vcpu_spsr32(vcpu);
+
+   /* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
+   if (p_32bit != (unsigned long *)p)
+   return *p_32bit;
+   }
+
+   if (vcpu->arch.sysregs_loaded_on_cpu)
+   return read_sysreg_el1(spsr);
+   else
+   return *p;
+}
 
-   return (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
+static inline void vcpu_write_spsr(const struct kvm_vcpu *vcpu, unsigned long 
v)
+{
+   unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
+
+   /* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
+   if (vcpu_mode_is_32bit(vcpu)) {
+   unsigned long *p_32bit = vcpu_spsr32(vcpu);
+
+   /* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
+   if (p_32bit != (unsigned long *)p) {
+   *p_32bit = v;
+   return;
+   }
+   }
+
+   if (vcpu->arch.sysregs_loaded_on_cpu)
+   write_sysreg_el1(v, spsr);
+   else
+   *p = v;
 }
 
 static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index e08db2f2dd75..8dda1edae727 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -71,7 +71,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, 
unsigned long addr
*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
-   *vcpu_spsr(vcpu) = cpsr;
+   vcpu_write_spsr(vcpu, cpsr);
 
vcpu_write_sys_reg(vcpu, FAR_EL1, addr);
 
@@ -106,7 +106,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
-  

[PATCH v4 27/40] KVM: arm64: Prepare to handle deferred save/restore of ELR_EL1

2018-02-15 Thread Christoffer Dall
ELR_EL1 is not used by a VHE host kernel and can be deferred, but we
need to rework the accesses to this register to access the latest value
depending on whether or not guest system registers are loaded on the CPU
or only reside in memory.

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v2:
 - New patch (deferred register handling has been reworked)

 arch/arm64/include/asm/kvm_emulate.h | 18 +-
 arch/arm64/kvm/inject_fault.c|  4 ++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 47c2406755fa..9cb13b23c7a1 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -90,11 +90,27 @@ static inline unsigned long *vcpu_pc(const struct kvm_vcpu 
*vcpu)
return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pc;
 }
 
-static inline unsigned long *vcpu_elr_el1(const struct kvm_vcpu *vcpu)
+static inline unsigned long *__vcpu_elr_el1(const struct kvm_vcpu *vcpu)
 {
return (unsigned long *)&vcpu_gp_regs(vcpu)->elr_el1;
 }
 
+static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
+{
+   if (vcpu->arch.sysregs_loaded_on_cpu)
+   return read_sysreg_el1(elr);
+   else
+   return *__vcpu_elr_el1(vcpu);
+}
+
+static inline void vcpu_write_elr_el1(const struct kvm_vcpu *vcpu, unsigned 
long v)
+{
+   if (vcpu->arch.sysregs_loaded_on_cpu)
+   write_sysreg_el1(v, elr);
+   else
+   *__vcpu_elr_el1(vcpu) = v;
+}
+
 static inline unsigned long *vcpu_cpsr(const struct kvm_vcpu *vcpu)
 {
return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pstate;
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 8dda1edae727..cc13b6f5ad11 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -67,7 +67,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, 
unsigned long addr
bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
u32 esr = 0;
 
-   *vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
+   vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
@@ -102,7 +102,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
unsigned long cpsr = *vcpu_cpsr(vcpu);
u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
 
-   *vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
+   vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
-- 
2.14.2



[PATCH v4 23/40] KVM: arm64: Change 32-bit handling of VM system registers

2018-02-15 Thread Christoffer Dall
We currently handle 32-bit accesses to trapped VM system registers using
the 32-bit index into the coproc array on the vcpu structure, which is a
union of the coproc array and the sysreg array.

Since all the 32-bit coproc indices are created to correspond to the
architectural mapping between 64-bit system registers and 32-bit
coprocessor registers, and because the AArch64 system registers are the
double in size of the AArch32 coprocessor registers, we can always find
the system register entry that we must update by dividing the 32-bit
coproc index by 2.

This is going to make our lives much easier when we have to start
accessing system registers that use deferred save/restore and might
have to be read directly from the physical CPU.
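
As a stand-alone sketch (not kernel code; it only assumes the usual u32/u64
types and the lower_32_bits()/upper_32_bits() helpers), the index and merge
rule described above looks like this:

    /* fold a 32-bit coproc write into its 64-bit sysreg backing */
    static u64 fold_cp15_write(u64 old64, u32 new32, int coproc_idx)
    {
            /* the sysreg entry to update is coproc_idx / 2 */
            if (coproc_idx % 2)     /* odd index: upper 32 bits */
                    return ((u64)new32 << 32) | lower_32_bits(old64);
            /* even index: lower 32 bits */
            return ((u64)upper_32_bits(old64) << 32) | new32;
    }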

Reviewed-by: Andrew Jones 
Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm64/include/asm/kvm_host.h |  8 
 arch/arm64/kvm/sys_regs.c | 20 +++-
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index c30fc96992df..f2a6f39aec87 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -295,14 +295,6 @@ struct kvm_vcpu_arch {
 #define vcpu_cp14(v,r) ((v)->arch.ctxt.copro[(r)])
 #define vcpu_cp15(v,r) ((v)->arch.ctxt.copro[(r)])
 
-#ifdef CONFIG_CPU_BIG_ENDIAN
-#define vcpu_cp15_64_high(v,r) vcpu_cp15((v),(r))
-#define vcpu_cp15_64_low(v,r)  vcpu_cp15((v),(r) + 1)
-#else
-#define vcpu_cp15_64_high(v,r) vcpu_cp15((v),(r) + 1)
-#define vcpu_cp15_64_low(v,r)  vcpu_cp15((v),(r))
-#endif
-
 struct kvm_vm_stat {
ulong remote_tlb_flush;
 };
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 50a43c7b97ca..b48af790615e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -121,16 +121,26 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
  const struct sys_reg_desc *r)
 {
bool was_enabled = vcpu_has_cache_enabled(vcpu);
+   u64 val;
+   int reg = r->reg;
 
BUG_ON(!p->is_write);
 
-   if (!p->is_aarch32) {
-   vcpu_sys_reg(vcpu, r->reg) = p->regval;
+   /* See the 32bit mapping in kvm_host.h */
+   if (p->is_aarch32)
+   reg = r->reg / 2;
+
+   if (!p->is_aarch32 || !p->is_32bit) {
+   val = p->regval;
} else {
-   if (!p->is_32bit)
-   vcpu_cp15_64_high(vcpu, r->reg) = 
upper_32_bits(p->regval);
-   vcpu_cp15_64_low(vcpu, r->reg) = lower_32_bits(p->regval);
+   val = vcpu_sys_reg(vcpu, reg);
+   if (r->reg % 2)
+   val = (p->regval << 32) | (u64)lower_32_bits(val);
+   else
+   val = ((u64)upper_32_bits(val) << 32) |
+   lower_32_bits(p->regval);
}
+   vcpu_sys_reg(vcpu, reg) = val;
 
kvm_toggle_cache(vcpu, was_enabled);
return true;
-- 
2.14.2



[PATCH v4 25/40] KVM: arm64: Introduce framework for accessing deferred sysregs

2018-02-15 Thread Christoffer Dall
We are about to defer saving and restoring some groups of system
registers to vcpu_put and vcpu_load on supported systems.  This means
that we need some infrastructure to access system registers which
supports either accessing the memory backing of the register or directly
accessing the hardware register, depending on the state of the system
when we access the register.

We do this by defining read/write accessor functions, which can handle
both "immediate" and "deferrable" system registers.  Immediate registers
are always saved/restored in the world-switch path, but deferrable
registers are only saved/restored in vcpu_put/vcpu_load when supported
and sysregs_loaded_on_cpu will be set in that case.

Note that we don't use the deferred mechanism yet in this patch, but only
introduce infrastructure.  This is to improve convenience of review in
the subsequent patches where it is clear which registers become
deferred.
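
As an illustration of where this is headed (sketch only; the actual cases
are added by later patches in this series), a deferred register ends up
with an entry in the switch along these lines:

    switch (reg) {
    case CSSELR_EL1:        return read_sysreg_s(SYS_CSSELR_EL1);
    case TPIDR_EL0:         return read_sysreg_s(SYS_TPIDR_EL0);
    /* ... */
    }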

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v3:
 - Changed to a switch-statement based approach to improve
   readability.

Changes since v2:
 - New patch (deferred register handling has been reworked)

 arch/arm64/include/asm/kvm_host.h |  8 ++--
 arch/arm64/kvm/sys_regs.c | 33 +
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 68398bf7882f..b463b5e28959 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -284,6 +284,10 @@ struct kvm_vcpu_arch {
 
/* Virtual SError ESR to restore when HCR_EL2.VSE is set */
u64 vsesr_el2;
+
+   /* True when deferrable sysregs are loaded on the physical CPU,
+* see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
+   bool sysregs_loaded_on_cpu;
 };
 
 #define vcpu_gp_regs(v)(&(v)->arch.ctxt.gp_regs)
@@ -296,8 +300,8 @@ struct kvm_vcpu_arch {
  */
 #define __vcpu_sys_reg(v,r)((v)->arch.ctxt.sys_regs[(r)])
 
-#define vcpu_read_sys_reg(v,r) __vcpu_sys_reg(v,r)
-#define vcpu_write_sys_reg(v,r,n)  do { __vcpu_sys_reg(v,r) = n; } while 
(0)
+u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg);
+void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val);
 
 /*
  * CP14 and CP15 live in the same array, as they are backed by the
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index a05d2c01c786..b3c3f014aa61 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,6 +77,38 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
return false;
 }
 
+u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
+{
+   if (!vcpu->arch.sysregs_loaded_on_cpu)
+   goto immediate_read;
+
+   /*
+* All system registers listed in the switch are not saved on every
+* exit from the guest but are only saved on vcpu_put.
+*/
+   switch (reg) {
+   }
+
+immediate_read:
+   return __vcpu_sys_reg(vcpu, reg);
+}
+
+void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
+{
+   if (!vcpu->arch.sysregs_loaded_on_cpu)
+   goto immediate_write;
+
+   /*
+* All system registers listed in the switch are not restored on every
+* entry to the guest but are only restored on vcpu_load.
+*/
+   switch (reg) {
+   }
+
+immediate_write:
+__vcpu_sys_reg(vcpu, reg) = val;
+}
+
 /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
 static u32 cache_levels;
 
-- 
2.14.2



[PATCH v4 24/40] KVM: arm64: Rewrite system register accessors to read/write functions

2018-02-15 Thread Christoffer Dall
From: Christoffer Dall 

Currently we access the system registers array via the vcpu_sys_reg()
macro.  However, we are about to change the behavior to some times
modify the register file directly, so let's change this to two
primitives:

 * Accessor macros vcpu_write_sys_reg() and vcpu_read_sys_reg()
 * Direct array access macro __vcpu_sys_reg()

The first primitive should be used in places where the code needs to
access the currently loaded VCPU's state as observed by the guest.  For
example, when trapping on cache related registers, a write to a system
register should go directly to the VCPU version of the register.

The second primitive can be used in places where the VCPU is known to
never be running (for example userspace access) or for registers which
are never context switched (for example all the PMU system registers).

This rewrites all users of vcpu_sys_regs to one of the two primitives
above.

No functional change.
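
A short usage sketch of the intended split (example only, not taken from
the patch itself):

    /* the VCPU may be loaded on a physical CPU: go through the accessors */
    u64 sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
    vcpu_write_sys_reg(vcpu, SCTLR_EL1, sctlr | (1 << 25));

    /* userspace access or never-switched registers: direct array access */
    __vcpu_sys_reg(vcpu, PMCR_EL0) = val;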

Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v2:
 - New patch (deferred register handling has been reworked)

 arch/arm64/include/asm/kvm_emulate.h | 13 ---
 arch/arm64/include/asm/kvm_host.h| 13 ++-
 arch/arm64/include/asm/kvm_mmu.h |  2 +-
 arch/arm64/kvm/debug.c   | 27 +-
 arch/arm64/kvm/inject_fault.c|  8 ++--
 arch/arm64/kvm/sys_regs.c| 71 ++--
 arch/arm64/kvm/sys_regs.h|  4 +-
 arch/arm64/kvm/sys_regs_generic_v8.c |  4 +-
 virt/kvm/arm/pmu.c   | 37 ++-
 9 files changed, 102 insertions(+), 77 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 3cc535591bdf..d313aaae5c38 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -290,15 +290,18 @@ static inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu 
*vcpu)
 
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
-   return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
+   return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
 }
 
 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
 {
-   if (vcpu_mode_is_32bit(vcpu))
+   if (vcpu_mode_is_32bit(vcpu)) {
*vcpu_cpsr(vcpu) |= COMPAT_PSR_E_BIT;
-   else
-   vcpu_sys_reg(vcpu, SCTLR_EL1) |= (1 << 25);
+   } else {
+   u64 sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+   sctlr |= (1 << 25);
+   vcpu_write_sys_reg(vcpu, SCTLR_EL1, sctlr);
+   }
 }
 
 static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
@@ -306,7 +309,7 @@ static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
if (vcpu_mode_is_32bit(vcpu))
return !!(*vcpu_cpsr(vcpu) & COMPAT_PSR_E_BIT);
 
-   return !!(vcpu_sys_reg(vcpu, SCTLR_EL1) & (1 << 25));
+   return !!(vcpu_read_sys_reg(vcpu, SCTLR_EL1) & (1 << 25));
 }
 
 static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index f2a6f39aec87..68398bf7882f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -287,7 +287,18 @@ struct kvm_vcpu_arch {
 };
 
 #define vcpu_gp_regs(v)(&(v)->arch.ctxt.gp_regs)
-#define vcpu_sys_reg(v,r)  ((v)->arch.ctxt.sys_regs[(r)])
+
+/*
+ * Only use __vcpu_sys_reg if you know you want the memory backed version of a
+ * register, and not the one most recently accessed by a running VCPU.  For
+ * example, for userspace access or for system registers that are never context
+ * switched, but only emulated.
+ */
+#define __vcpu_sys_reg(v,r)((v)->arch.ctxt.sys_regs[(r)])
+
+#define vcpu_read_sys_reg(v,r) __vcpu_sys_reg(v,r)
+#define vcpu_write_sys_reg(v,r,n)  do { __vcpu_sys_reg(v,r) = n; } while 
(0)
+
 /*
  * CP14 and CP15 live in the same array, as they are backed by the
  * same system registers.
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 9679067a1574..95f46e73c4dc 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -249,7 +249,7 @@ struct kvm;
 
 static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 {
-   return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
+   return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
 static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index feedb877cff8..db32d10a56a1 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -46,7 +46,8 @@ static DEFINE_PER_CPU(u32, mdcr_el2);
  */
 static void save_guest_debug_regs(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.guest_debug_preserved.mdscr_el1 = vcpu_sys_reg(vcpu, 
MDSCR_EL1);
+   

[PATCH v4 21/40] KVM: arm64: Unify non-VHE host/guest sysreg save and restore functions

2018-02-15 Thread Christoffer Dall
There is no need to have multiple identical functions with different
names for saving host and guest state.  When saving and restoring state
for the host and guest, the state is the same for both contexts, and
that's why we have the kvm_cpu_context structure.  Delete one
version and rename the other to simply save/restore.

Reviewed-by: Andrew Jones 
Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm64/include/asm/kvm_hyp.h |  6 ++
 arch/arm64/kvm/hyp/switch.c  | 10 +-
 arch/arm64/kvm/hyp/sysreg-sr.c   | 18 ++
 3 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 23c09d9af343..2b1fda90dde4 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -131,10 +131,8 @@ int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 void __timer_enable_traps(struct kvm_vcpu *vcpu);
 void __timer_disable_traps(struct kvm_vcpu *vcpu);
 
-void __sysreg_save_host_state_nvhe(struct kvm_cpu_context *ctxt);
-void __sysreg_restore_host_state_nvhe(struct kvm_cpu_context *ctxt);
-void __sysreg_save_guest_state_nvhe(struct kvm_cpu_context *ctxt);
-void __sysreg_restore_guest_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt);
 void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt);
 void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt);
 void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index a99224996e33..22e77deb8e2e 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -429,7 +429,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
 
-   __sysreg_save_host_state_nvhe(host_ctxt);
+   __sysreg_save_state_nvhe(host_ctxt);
 
__activate_traps(vcpu);
__activate_vm(kern_hyp_va(vcpu->kvm));
@@ -442,7 +442,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
 */
__sysreg32_restore_state(vcpu);
-   __sysreg_restore_guest_state_nvhe(guest_ctxt);
+   __sysreg_restore_state_nvhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
 
do {
@@ -462,7 +462,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 
fp_enabled = __fpsimd_enabled();
 
-   __sysreg_save_guest_state_nvhe(guest_ctxt);
+   __sysreg_save_state_nvhe(guest_ctxt);
__sysreg32_save_state(vcpu);
__timer_disable_traps(vcpu);
__vgic_save_state(vcpu);
@@ -470,7 +470,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);
 
-   __sysreg_restore_host_state_nvhe(host_ctxt);
+   __sysreg_restore_state_nvhe(host_ctxt);
 
if (fp_enabled) {
__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
@@ -500,7 +500,7 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 
elr, u64 par,
__timer_disable_traps(vcpu);
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);
-   __sysreg_restore_host_state_nvhe(__host_ctxt);
+   __sysreg_restore_state_nvhe(__host_ctxt);
}
 
/*
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 18801ab56e8b..d35b3aa680ab 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -78,14 +78,7 @@ static void __hyp_text __sysreg_save_el1_state(struct 
kvm_cpu_context *ctxt)
ctxt->sys_regs[DISR_EL1] = read_sysreg_s(SYS_VDISR_EL2);
 }
 
-void __hyp_text __sysreg_save_host_state_nvhe(struct kvm_cpu_context *ctxt)
-{
-   __sysreg_save_el1_state(ctxt);
-   __sysreg_save_common_state(ctxt);
-   __sysreg_save_user_state(ctxt);
-}
-
-void __hyp_text __sysreg_save_guest_state_nvhe(struct kvm_cpu_context *ctxt)
+void __hyp_text __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
 {
__sysreg_save_el1_state(ctxt);
__sysreg_save_common_state(ctxt);
@@ -154,14 +147,7 @@ static void __hyp_text __sysreg_restore_el1_state(struct 
kvm_cpu_context *ctxt)
write_sysreg_s(ctxt->sys_regs[DISR_EL1], SYS_VDISR_EL2);
 }
 
-void __hyp_text __sysreg_restore_host_state_nvhe(struct kvm_cpu_context *ctxt)
-{
-   __sysreg_restore_el1_state(ctxt);
-   __sysreg_restore_common_state(ctxt);
-   __sysreg_restore_user_state(ctxt);
-}
-
-void __hyp_text __sysreg_restore_guest_state_nvhe(struct kvm_cpu_context *ctxt)
+void __hyp_text __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
 {
__sysreg_restore_el1_state(ctxt);

[PATCH v4 20/40] KVM: arm/arm64: Remove leftover comment from kvm_vcpu_run_vhe

2018-02-15 Thread Christoffer Dall
The comment only applied to SPE on non-VHE systems, so we simply remove
it.

Suggested-by: Andrew Jones 
Acked-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm64/kvm/hyp/switch.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 717c8f5c6be7..a99224996e33 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -410,10 +410,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
__fpsimd_restore_state(_ctxt->gp_regs.fp_regs);
}
 
-   /*
-* This must come after restoring the host sysregs, since a non-VHE
-* system may enable SPE here and make use of the TTBRs.
-*/
__debug_switch_to_host(vcpu);
 
return exit_code;
-- 
2.14.2



[PATCH v4 19/40] KVM: arm64: Introduce separate VHE/non-VHE sysreg save/restore functions

2018-02-15 Thread Christoffer Dall
We are about to handle system registers quite differently between VHE
and non-VHE systems.  In preparation for that, we need to split some of
the handling functions between VHE and non-VHE functionality.

For now, we simply copy the non-VHE functions, but we do change the use
of static keys for VHE and non-VHE functionality now that we have
separate functions.

Reviewed-by: Andrew Jones 
Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm64/include/asm/kvm_hyp.h | 12 
 arch/arm64/kvm/hyp/switch.c  | 20 ++--
 arch/arm64/kvm/hyp/sysreg-sr.c   | 40 
 3 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index aeda2a777365..23c09d9af343 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -131,10 +131,14 @@ int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 void __timer_enable_traps(struct kvm_vcpu *vcpu);
 void __timer_disable_traps(struct kvm_vcpu *vcpu);
 
-void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
-void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
-void __sysreg_save_guest_state(struct kvm_cpu_context *ctxt);
-void __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt);
+void __sysreg_save_host_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_host_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_save_guest_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_guest_state_nvhe(struct kvm_cpu_context *ctxt);
+void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt);
+void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt);
+void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt);
+void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt);
 void __sysreg32_save_state(struct kvm_vcpu *vcpu);
 void __sysreg32_restore_state(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index f0fae320edc0..717c8f5c6be7 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -373,7 +373,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
 
-   __sysreg_save_host_state(host_ctxt);
+   sysreg_save_host_state_vhe(host_ctxt);
 
__activate_traps(vcpu);
__activate_vm(vcpu->kvm);
@@ -385,7 +385,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
 */
__sysreg32_restore_state(vcpu);
-   __sysreg_restore_guest_state(guest_ctxt);
+   sysreg_restore_guest_state_vhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
 
do {
@@ -397,13 +397,13 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
fp_enabled = __fpsimd_enabled();
 
-   __sysreg_save_guest_state(guest_ctxt);
+   sysreg_save_guest_state_vhe(guest_ctxt);
__sysreg32_save_state(vcpu);
__vgic_save_state(vcpu);
 
__deactivate_traps(vcpu);
 
-   __sysreg_restore_host_state(host_ctxt);
+   sysreg_restore_host_state_vhe(host_ctxt);
 
if (fp_enabled) {
__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
@@ -433,7 +433,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
 
-   __sysreg_save_host_state(host_ctxt);
+   __sysreg_save_host_state_nvhe(host_ctxt);
 
__activate_traps(vcpu);
__activate_vm(kern_hyp_va(vcpu->kvm));
@@ -446,7 +446,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
 */
__sysreg32_restore_state(vcpu);
-   __sysreg_restore_guest_state(guest_ctxt);
+   __sysreg_restore_guest_state_nvhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
 
do {
@@ -466,7 +466,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 
fp_enabled = __fpsimd_enabled();
 
-   __sysreg_save_guest_state(guest_ctxt);
+   __sysreg_save_guest_state_nvhe(guest_ctxt);
__sysreg32_save_state(vcpu);
__timer_disable_traps(vcpu);
__vgic_save_state(vcpu);
@@ -474,7 +474,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);
 
-   __sysreg_restore_host_state(host_ctxt);
+   __sysreg_restore_host_state_nvhe(host_ctxt);
 
if (fp_enabled) {
__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
@@ -504,7 +504,7 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 
elr, u64 par,
__timer_disable_traps(vcpu);
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);
-   

[PATCH v4 18/40] KVM: arm64: Rewrite sysreg alternatives to static keys

2018-02-15 Thread Christoffer Dall
As we are about to move calls around in the sysreg save/restore logic,
let's first rewrite the alternative function callers, because it is
going to make the next patches much easier to read.
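
In short, the callers go from an alternative-selected indirect call to a
plain static-key test, as in the hunks below:

    /* before: indirect call chosen via hyp_alternate_select() */
    __sysreg_call_save_host_state()(ctxt);

    /* after: direct branch on the VHE capability static key */
    if (!has_vhe())
            __sysreg_save_el1_state(ctxt);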

Acked-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm64/kvm/hyp/sysreg-sr.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index d5a5145b4e7c..51b557226170 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -22,9 +22,6 @@
 #include 
 #include 
 
-/* Yes, this does nothing, on purpose */
-static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
-
 /*
  * Non-VHE: Both host and guest must save everything.
  *
@@ -81,13 +78,10 @@ static void __hyp_text __sysreg_save_el1_state(struct 
kvm_cpu_context *ctxt)
ctxt->sys_regs[DISR_EL1] = read_sysreg_s(SYS_VDISR_EL2);
 }
 
-static hyp_alternate_select(__sysreg_call_save_host_state,
-   __sysreg_save_el1_state, __sysreg_do_nothing,
-   ARM64_HAS_VIRT_HOST_EXTN);
-
 void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
 {
-   __sysreg_call_save_host_state()(ctxt);
+   if (!has_vhe())
+   __sysreg_save_el1_state(ctxt);
__sysreg_save_common_state(ctxt);
__sysreg_save_user_state(ctxt);
 }
@@ -148,13 +142,10 @@ static void __hyp_text __sysreg_restore_el1_state(struct 
kvm_cpu_context *ctxt)
write_sysreg_s(ctxt->sys_regs[DISR_EL1], SYS_VDISR_EL2);
 }
 
-static hyp_alternate_select(__sysreg_call_restore_host_state,
-   __sysreg_restore_el1_state, __sysreg_do_nothing,
-   ARM64_HAS_VIRT_HOST_EXTN);
-
 void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
 {
-   __sysreg_call_restore_host_state()(ctxt);
+   if (!has_vhe())
+   __sysreg_restore_el1_state(ctxt);
__sysreg_restore_common_state(ctxt);
__sysreg_restore_user_state(ctxt);
 }
-- 
2.14.2



[PATCH v4 17/40] KVM: arm64: Move userspace system registers into separate function

2018-02-15 Thread Christoffer Dall
There's a semantic difference between the EL1 registers that control
operation of a kernel running in EL1 and EL1 registers that only control
userspace execution in EL0.  Since we can defer saving/restoring the
latter, move them into their own function.

ACTLR_EL1 is not used by a VHE host, so we can move this register into
the EL1 state which is not saved/restored for a VHE host.

We also take this chance to rename the function saving/restoring the
remaining system registers to make it clear this function deals with
the EL1 system registers.
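
The resulting grouping, sketched from the hunks below:

    __sysreg_save_common_state(ctxt);   /* mdscr_el1, sp_el0: always saved */
    __sysreg_save_user_state(ctxt);     /* tpidr_el0, tpidrro_el0: EL0 only */
    __sysreg_save_el1_state(ctxt);      /* remaining EL1 state, incl. actlr_el1;
                                           skipped for the host on VHE */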

Reviewed-by: Andrew Jones 
Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v3:
 - Correct the comment about ACTLR_EL1 and adjust commit text.

Changes since v2:
 - Save restore ACTLR_EL1 as part of the EL1 registers state instead of
   the user register state, as ACTLR_EL1 can't affect the host's execution
   on VHE systems.

Changes since v1:
 - Added comment about sp_el0 to common save sysreg save/restore functions

 arch/arm64/kvm/hyp/sysreg-sr.c | 48 ++
 1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 99fc60516103..d5a5145b4e7c 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -28,24 +28,33 @@ static void __hyp_text __sysreg_do_nothing(struct 
kvm_cpu_context *ctxt) { }
 /*
  * Non-VHE: Both host and guest must save everything.
  *
- * VHE: Host must save tpidr*_el0, actlr_el1, mdscr_el1, sp_el0,
+ * VHE: Host must save tpidr*_el0, mdscr_el1, sp_el0,
  * and guest must save everything.
  */
 
 static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
-   ctxt->sys_regs[ACTLR_EL1]   = read_sysreg(actlr_el1);
-   ctxt->sys_regs[TPIDR_EL0]   = read_sysreg(tpidr_el0);
-   ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
ctxt->sys_regs[MDSCR_EL1]   = read_sysreg(mdscr_el1);
+
+   /*
+* The host arm64 Linux uses sp_el0 to point to 'current' and it must
+* therefore be saved/restored on every entry/exit to/from the guest.
+*/
ctxt->gp_regs.regs.sp   = read_sysreg(sp_el0);
 }
 
-static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
+static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
+{
+   ctxt->sys_regs[TPIDR_EL0]   = read_sysreg(tpidr_el0);
+   ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
+}
+
+static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 {
ctxt->sys_regs[MPIDR_EL1]   = read_sysreg(vmpidr_el2);
ctxt->sys_regs[CSSELR_EL1]  = read_sysreg(csselr_el1);
ctxt->sys_regs[SCTLR_EL1]   = read_sysreg_el1(sctlr);
+   ctxt->sys_regs[ACTLR_EL1]   = read_sysreg(actlr_el1);
ctxt->sys_regs[CPACR_EL1]   = read_sysreg_el1(cpacr);
ctxt->sys_regs[TTBR0_EL1]   = read_sysreg_el1(ttbr0);
ctxt->sys_regs[TTBR1_EL1]   = read_sysreg_el1(ttbr1);
@@ -73,35 +82,46 @@ static void __hyp_text __sysreg_save_state(struct 
kvm_cpu_context *ctxt)
 }
 
 static hyp_alternate_select(__sysreg_call_save_host_state,
-   __sysreg_save_state, __sysreg_do_nothing,
+   __sysreg_save_el1_state, __sysreg_do_nothing,
ARM64_HAS_VIRT_HOST_EXTN);
 
 void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
 {
__sysreg_call_save_host_state()(ctxt);
__sysreg_save_common_state(ctxt);
+   __sysreg_save_user_state(ctxt);
 }
 
 void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
 {
-   __sysreg_save_state(ctxt);
+   __sysreg_save_el1_state(ctxt);
__sysreg_save_common_state(ctxt);
+   __sysreg_save_user_state(ctxt);
 }
 
 static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context 
*ctxt)
 {
-   write_sysreg(ctxt->sys_regs[ACTLR_EL1],   actlr_el1);
-   write_sysreg(ctxt->sys_regs[TPIDR_EL0],   tpidr_el0);
-   write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
write_sysreg(ctxt->sys_regs[MDSCR_EL1],   mdscr_el1);
+
+   /*
+* The host arm64 Linux uses sp_el0 to point to 'current' and it must
+* therefore be saved/restored on every entry/exit to/from the guest.
+*/
write_sysreg(ctxt->gp_regs.regs.sp,   sp_el0);
 }
 
-static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
+static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context 
*ctxt)
+{
+   write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
+   write_sysreg(ctxt->sys_regs[TPIDRRO_EL0],   tpidrro_el0);
+}
+
+static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 {
write_sysreg(ctxt->sys_regs[MPIDR_EL1], 

[PATCH v4 16/40] KVM: arm64: Remove noop calls to timer save/restore from VHE switch

2018-02-15 Thread Christoffer Dall
The VHE switch function calls __timer_enable_traps and
__timer_disable_traps which don't do anything on VHE systems.
Therefore, simply remove these calls from the VHE switch function and
make the functions non-conditional as they are now only called from the
non-VHE switch path.

Acked-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v2:
 - Added comment explaining the timer enable/disable functions
   are for !VHE only.

 arch/arm64/kvm/hyp/switch.c |  2 --
 virt/kvm/arm/hyp/timer-sr.c | 44 ++--
 2 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 734dac7f808f..f0fae320edc0 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -379,7 +379,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
__activate_vm(vcpu->kvm);
 
__vgic_restore_state(vcpu);
-   __timer_enable_traps(vcpu);
 
/*
 * We must restore the 32-bit state before the sysregs, thanks
@@ -400,7 +399,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
__sysreg_save_guest_state(guest_ctxt);
__sysreg32_save_state(vcpu);
-   __timer_disable_traps(vcpu);
__vgic_save_state(vcpu);
 
__deactivate_traps(vcpu);
diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
index f24404b3c8df..77754a62eb0c 100644
--- a/virt/kvm/arm/hyp/timer-sr.c
+++ b/virt/kvm/arm/hyp/timer-sr.c
@@ -27,34 +27,34 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, 
u32 cntvoff_high)
write_sysreg(cntvoff, cntvoff_el2);
 }
 
+/*
+ * Should only be called on non-VHE systems.
+ * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
+ */
 void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
 {
-   /*
-* We don't need to do this for VHE since the host kernel runs in EL2
-* with HCR_EL2.TGE ==1, which makes those bits have no impact.
-*/
-   if (!has_vhe()) {
-   u64 val;
+   u64 val;
 
-   /* Allow physical timer/counter access for the host */
-   val = read_sysreg(cnthctl_el2);
-   val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
-   write_sysreg(val, cnthctl_el2);
-   }
+   /* Allow physical timer/counter access for the host */
+   val = read_sysreg(cnthctl_el2);
+   val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
+   write_sysreg(val, cnthctl_el2);
 }
 
+/*
+ * Should only be called on non-VHE systems.
+ * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
+ */
 void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
 {
-   if (!has_vhe()) {
-   u64 val;
+   u64 val;
 
-   /*
-* Disallow physical timer access for the guest
-* Physical counter access is allowed
-*/
-   val = read_sysreg(cnthctl_el2);
-   val &= ~CNTHCTL_EL1PCEN;
-   val |= CNTHCTL_EL1PCTEN;
-   write_sysreg(val, cnthctl_el2);
-   }
+   /*
+* Disallow physical timer access for the guest
+* Physical counter access is allowed
+*/
+   val = read_sysreg(cnthctl_el2);
+   val &= ~CNTHCTL_EL1PCEN;
+   val |= CNTHCTL_EL1PCTEN;
+   write_sysreg(val, cnthctl_el2);
 }
-- 
2.14.2



[PATCH v4 15/40] KVM: arm64: Don't deactivate VM on VHE systems

2018-02-15 Thread Christoffer Dall
There is no need to reset the VTTBR to zero when exiting the guest on
VHE systems.  VHE systems don't use stage 2 translations for the EL2&0
translation regime used by the host.

Reviewed-by: Andrew Jones 
Acked-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v1:
 - Changed __activate_vm to take a kvm pointer
 - No longer adding inline attributes to functions

 arch/arm64/kvm/hyp/switch.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 85dae7b94a0f..734dac7f808f 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -156,9 +156,8 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu 
*vcpu)
write_sysreg(0, pmuserenr_el0);
 }
 
-static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
+static void __hyp_text __activate_vm(struct kvm *kvm)
 {
-   struct kvm *kvm = kern_hyp_va(vcpu->kvm);
write_sysreg(kvm->arch.vttbr, vttbr_el2);
 }
 
@@ -377,7 +376,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
__sysreg_save_host_state(host_ctxt);
 
__activate_traps(vcpu);
-   __activate_vm(vcpu);
+   __activate_vm(vcpu->kvm);
 
__vgic_restore_state(vcpu);
__timer_enable_traps(vcpu);
@@ -405,7 +404,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
__vgic_save_state(vcpu);
 
__deactivate_traps(vcpu);
-   __deactivate_vm(vcpu);
 
__sysreg_restore_host_state(host_ctxt);
 
@@ -440,7 +438,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
__sysreg_save_host_state(host_ctxt);
 
__activate_traps(vcpu);
-   __activate_vm(vcpu);
+   __activate_vm(kern_hyp_va(vcpu->kvm));
 
__vgic_restore_state(vcpu);
__timer_enable_traps(vcpu);
-- 
2.14.2



[PATCH v4 12/40] KVM: arm64: Factor out fault info population and gic workarounds

2018-02-15 Thread Christoffer Dall
The current world-switch function has functionality to detect a number
of cases where we need to fix up some part of the exit condition and
possibly run the guest again, before having restored the host state.

This includes populating missing fault info, emulating GICv2 CPU
interface accesses when mapped at unaligned addresses, and emulating
the GICv3 CPU interface on systems that need it.

As we are about to have an alternative switch function for VHE systems,
but VHE systems still need the same early fixup logic, factor out this
logic into a separate function that can be shared by both switch
functions.

No functional change.
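
With the fixup logic factored out, a switch function can run the guest in a
simple loop, which is exactly what the following patches do:

    do {
            /* Jump in the fire! */
            exit_code = __guest_enter(vcpu, host_ctxt);

            /* And we're baaack! */
    } while (fixup_guest_exit(vcpu, &exit_code));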

Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm64/kvm/hyp/switch.c | 104 
 1 file changed, 57 insertions(+), 47 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 0c80d6a577ca..d2c0b1ae3216 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -291,53 +291,27 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
}
 }
 
-int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
+/*
+ * Return true when we were able to fixup the guest exit and should return to
+ * the guest, false when we should restore the host state and return to the
+ * main run loop.
+ */
+static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
-   struct kvm_cpu_context *host_ctxt;
-   struct kvm_cpu_context *guest_ctxt;
-   bool fp_enabled;
-   u64 exit_code;
-
-   vcpu = kern_hyp_va(vcpu);
-
-   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-   host_ctxt->__hyp_running_vcpu = vcpu;
-   guest_ctxt = &vcpu->arch.ctxt;
-
-   __sysreg_save_host_state(host_ctxt);
-
-   __activate_traps(vcpu);
-   __activate_vm(vcpu);
-
-   __vgic_restore_state(vcpu);
-   __timer_enable_traps(vcpu);
-
-   /*
-* We must restore the 32-bit state before the sysregs, thanks
-* to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
-*/
-   __sysreg32_restore_state(vcpu);
-   __sysreg_restore_guest_state(guest_ctxt);
-   __debug_switch_to_guest(vcpu);
-
-   /* Jump in the fire! */
-again:
-   exit_code = __guest_enter(vcpu, host_ctxt);
-   /* And we're baaack! */
-
-   if (ARM_EXCEPTION_CODE(exit_code) != ARM_EXCEPTION_IRQ)
+   if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
+
/*
 * We're using the raw exception code in order to only process
 * the trap if no SError is pending. We will come back to the
 * same PC once the SError has been injected, and replay the
 * trapping instruction.
 */
-   if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
-   goto again;
+   if (*exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
+   return true;
 
if (static_branch_unlikely(&vgic_v2_cpuif_trap) &&
-   exit_code == ARM_EXCEPTION_TRAP) {
+   *exit_code == ARM_EXCEPTION_TRAP) {
bool valid;
 
valid = kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_DABT_LOW &&
@@ -351,9 +325,9 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
if (ret == 1) {
if (__skip_instr(vcpu))
-   goto again;
+   return true;
else
-   exit_code = ARM_EXCEPTION_TRAP;
+   *exit_code = ARM_EXCEPTION_TRAP;
}
 
if (ret == -1) {
@@ -365,29 +339,65 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 */
if (!__skip_instr(vcpu))
*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
-   exit_code = ARM_EXCEPTION_EL1_SERROR;
+   *exit_code = ARM_EXCEPTION_EL1_SERROR;
}
-
-   /* 0 falls through to be handler out of EL2 */
}
}
 
if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
-   exit_code == ARM_EXCEPTION_TRAP &&
+   *exit_code == ARM_EXCEPTION_TRAP &&
(kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
 kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
int ret = __vgic_v3_perform_cpuif_access(vcpu);
 
if (ret == 1) {
if (__skip_instr(vcpu))
-   goto again;
+   return true;
else
-   exit_code = ARM_EXCEPTION_TRAP;
+  

[PATCH v4 14/40] KVM: arm64: Remove kern_hyp_va() use in VHE switch function

2018-02-15 Thread Christoffer Dall
VHE kernels run completely in EL2 and therefore don't have a notion of
kernel and hyp addresses, they are all just kernel addresses.  Therefore
don't call kern_hyp_va() in the VHE switch function.

Reviewed-by: Andrew Jones 
Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm64/kvm/hyp/switch.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index b6126af539b6..85dae7b94a0f 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -370,9 +370,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
bool fp_enabled;
u64 exit_code;
 
-   vcpu = kern_hyp_va(vcpu);
-
-   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+   host_ctxt = vcpu->arch.host_cpu_context;
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = >arch.ctxt;
 
-- 
2.14.2



[PATCH v4 13/40] KVM: arm64: Introduce VHE-specific kvm_vcpu_run

2018-02-15 Thread Christoffer Dall
So far this is mostly (see below) a copy of the legacy non-VHE switch
function, but we will start reworking these functions in separate
directions to work optimally on VHE and non-VHE in later
patches.

The only difference after this patch between the VHE and non-VHE run
functions is that we omit the branch-predictor variant-2 hardening for
QC Falkor CPUs, because this workaround is specific to a series of
non-VHE ARMv8.0 CPUs.
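
For reference, the hardening that remains in the non-VHE path only looks
roughly like this (a sketch of the existing non-VHE code, not visible in
the hunks below):

    if (cpus_have_const_cap(ARM64_HARDEN_BP_POST_GUEST_EXIT)) {
            u32 midr = read_cpuid_id();

            /* Apply BTAC predictors mitigation to all Falkor chips */
            if ((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR_V1)
                    __qcom_hyp_sanitize_btac_predictors();
    }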

Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v3:
 - Added BUG() to 32-bit ARM VHE run function
 - Omitted QC Falkor BP Hardening functionality from VHE-specific
   function

Changes since v2:
 - Reworded commit message

Changes since v1:
 - Rename kvm_vcpu_run to kvm_vcpu_run_vhe and rename __kvm_vcpu_run to
   __kvm_vcpu_run_nvhe
 - Removed stray whitespace line

 arch/arm/include/asm/kvm_asm.h   |  5 ++-
 arch/arm/kvm/hyp/switch.c|  2 +-
 arch/arm64/include/asm/kvm_asm.h |  4 ++-
 arch/arm64/kvm/hyp/switch.c  | 66 +++-
 virt/kvm/arm/arm.c   |  5 ++-
 5 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 36dd2962a42d..5a953ecb0d78 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -70,7 +70,10 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu 
*vcpu);
 
 extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
 
-extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+/* no VHE on 32-bit :( */
+static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { BUG(); return 0; }
+
+extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
 
 extern void __init_stage2_translation(void);
 
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index e86679daddff..aac025783ee8 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -154,7 +154,7 @@ static bool __hyp_text __populate_fault_info(struct 
kvm_vcpu *vcpu)
return true;
 }
 
-int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
+int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 {
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 6b626750b0a1..0be2747a6c5f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -58,7 +58,9 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
 
 extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
 
-extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+extern int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu);
+
+extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
 
 extern u64 __vgic_v3_get_ich_vtr_el2(void);
 extern u64 __vgic_v3_read_vmcr(void);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d2c0b1ae3216..b6126af539b6 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -362,7 +362,71 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu 
*vcpu, u64 *exit_code)
return false;
 }
 
-int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
+/* Switch to the guest for VHE systems running in EL2 */
+int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpu_context *host_ctxt;
+   struct kvm_cpu_context *guest_ctxt;
+   bool fp_enabled;
+   u64 exit_code;
+
+   vcpu = kern_hyp_va(vcpu);
+
+   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+   host_ctxt->__hyp_running_vcpu = vcpu;
+   guest_ctxt = &vcpu->arch.ctxt;
+
+   __sysreg_save_host_state(host_ctxt);
+
+   __activate_traps(vcpu);
+   __activate_vm(vcpu);
+
+   __vgic_restore_state(vcpu);
+   __timer_enable_traps(vcpu);
+
+   /*
+* We must restore the 32-bit state before the sysregs, thanks
+* to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
+*/
+   __sysreg32_restore_state(vcpu);
+   __sysreg_restore_guest_state(guest_ctxt);
+   __debug_switch_to_guest(vcpu);
+
+   do {
+   /* Jump in the fire! */
+   exit_code = __guest_enter(vcpu, host_ctxt);
+
+   /* And we're baaack! */
+   } while (fixup_guest_exit(vcpu, &exit_code));
+
+   fp_enabled = __fpsimd_enabled();
+
+   __sysreg_save_guest_state(guest_ctxt);
+   __sysreg32_save_state(vcpu);
+   __timer_disable_traps(vcpu);
+   __vgic_save_state(vcpu);
+
+   __deactivate_traps(vcpu);
+   __deactivate_vm(vcpu);
+
+   __sysreg_restore_host_state(host_ctxt);
+
+   if (fp_enabled) {
+   __fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
+   __fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
+   }
+
+   /*
+* This must come after restoring the host sysregs, since a 

[PATCH v4 10/40] KVM: arm64: Slightly improve debug save/restore functions

2018-02-15 Thread Christoffer Dall
The debug save/restore functions can be improved by using the has_vhe()
static key instead of the instruction alternative.  Using the static key
uses the same paradigm as we're going to use elsewhere, it makes the
code more readable, and it generates slightly better code (no
stack setups and function calls unless necessary).

We also use a static key on the restore path, because it will be
marginally faster than loading a value from memory.

Finally, we don't have to conditionally clear the debug dirty flag if
it's set; we can just clear it unconditionally.

Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v1:
 - Change dot to comma in comment
 - Rename __debug_restore_spe to __debug_restore_spe_nvhe

 arch/arm64/kvm/hyp/debug-sr.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index d958cd63a547..74f71fb5e36d 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -66,11 +66,6 @@
default:write_debug(ptr[0], reg, 0);\
}
 
-static void __hyp_text __debug_save_spe_vhe(u64 *pmscr_el1)
-{
-   /* The vcpu can run. but it can't hide. */
-}
-
 static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
 {
u64 reg;
@@ -103,11 +98,7 @@ static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
dsb(nsh);
 }
 
-static hyp_alternate_select(__debug_save_spe,
-   __debug_save_spe_nvhe, __debug_save_spe_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
-
-static void __hyp_text __debug_restore_spe(u64 pmscr_el1)
+static void __hyp_text __debug_restore_spe_nvhe(u64 pmscr_el1)
 {
if (!pmscr_el1)
return;
@@ -168,17 +159,24 @@ void __hyp_text __debug_cond_save_host_state(struct 
kvm_vcpu *vcpu)
 {
__debug_save_state(vcpu, &vcpu->arch.host_debug_state.regs,
   kern_hyp_va(vcpu->arch.host_cpu_context));
-   __debug_save_spe()(&vcpu->arch.host_debug_state.pmscr_el1);
+
+   /*
+* Non-VHE: Disable and flush SPE data generation
+* VHE: The vcpu can run, but it can't hide.
+*/
+   if (!has_vhe())
+   __debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
 }
 
 void __hyp_text __debug_cond_restore_host_state(struct kvm_vcpu *vcpu)
 {
-   __debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
+   if (!has_vhe())
+   __debug_restore_spe_nvhe(vcpu->arch.host_debug_state.pmscr_el1);
+
__debug_restore_state(vcpu, &vcpu->arch.host_debug_state.regs,
  kern_hyp_va(vcpu->arch.host_cpu_context));
 
-   if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
-   vcpu->arch.debug_flags &= ~KVM_ARM64_DEBUG_DIRTY;
+   vcpu->arch.debug_flags &= ~KVM_ARM64_DEBUG_DIRTY;
 }
 
 u32 __hyp_text __kvm_get_mdcr_el2(void)
-- 
2.14.2



[PATCH v4 11/40] KVM: arm64: Improve debug register save/restore flow

2018-02-15 Thread Christoffer Dall
Instead of having multiple calls from the world switch path to the debug
logic, each figuring out if the dirty bit is set and if we should
save/restore the debug registers, let's just provide two hooks to the
debug save/restore functionality: one for switching to the guest
context, and one for switching to the host context.  This way we only
have to evaluate the dirty flag once on each path, and we give the
compiler some more room to inline some of this functionality.
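
The world-switch paths then need only one call on each side of running the
guest, roughly (a sketch of how switch.c uses the hooks):

    __debug_switch_to_guest(vcpu);
    /* ... __guest_enter() / fixup loop ... */
    __debug_switch_to_host(vcpu);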

Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v1:
 - Remove leading underscores from local variables

 arch/arm64/include/asm/kvm_hyp.h | 10 ++-
 arch/arm64/kvm/hyp/debug-sr.c| 56 +++-
 arch/arm64/kvm/hyp/switch.c  |  6 ++---
 3 files changed, 42 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index f26f9cd70c72..aeda2a777365 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -138,14 +138,8 @@ void __sysreg_restore_guest_state(struct kvm_cpu_context 
*ctxt);
 void __sysreg32_save_state(struct kvm_vcpu *vcpu);
 void __sysreg32_restore_state(struct kvm_vcpu *vcpu);
 
-void __debug_save_state(struct kvm_vcpu *vcpu,
-   struct kvm_guest_debug_arch *dbg,
-   struct kvm_cpu_context *ctxt);
-void __debug_restore_state(struct kvm_vcpu *vcpu,
-  struct kvm_guest_debug_arch *dbg,
-  struct kvm_cpu_context *ctxt);
-void __debug_cond_save_host_state(struct kvm_vcpu *vcpu);
-void __debug_cond_restore_host_state(struct kvm_vcpu *vcpu);
+void __debug_switch_to_guest(struct kvm_vcpu *vcpu);
+void __debug_switch_to_host(struct kvm_vcpu *vcpu);
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 74f71fb5e36d..3e717f66f011 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -110,16 +110,13 @@ static void __hyp_text __debug_restore_spe_nvhe(u64 
pmscr_el1)
write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
 }
 
-void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
-  struct kvm_guest_debug_arch *dbg,
-  struct kvm_cpu_context *ctxt)
+static void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
+ struct kvm_guest_debug_arch *dbg,
+ struct kvm_cpu_context *ctxt)
 {
u64 aa64dfr0;
int brps, wrps;
 
-   if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
-   return;
-
aa64dfr0 = read_sysreg(id_aa64dfr0_el1);
brps = (aa64dfr0 >> 12) & 0xf;
wrps = (aa64dfr0 >> 20) & 0xf;
@@ -132,16 +129,13 @@ void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
ctxt->sys_regs[MDCCINT_EL1] = read_sysreg(mdccint_el1);
 }
 
-void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
- struct kvm_guest_debug_arch *dbg,
- struct kvm_cpu_context *ctxt)
+static void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
+struct kvm_guest_debug_arch *dbg,
+struct kvm_cpu_context *ctxt)
 {
u64 aa64dfr0;
int brps, wrps;
 
-   if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
-   return;
-
aa64dfr0 = read_sysreg(id_aa64dfr0_el1);
 
brps = (aa64dfr0 >> 12) & 0xf;
@@ -155,10 +149,12 @@ void __hyp_text __debug_restore_state(struct kvm_vcpu 
*vcpu,
write_sysreg(ctxt->sys_regs[MDCCINT_EL1], mdccint_el1);
 }
 
-void __hyp_text __debug_cond_save_host_state(struct kvm_vcpu *vcpu)
+void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 {
-   __debug_save_state(vcpu, &vcpu->arch.host_debug_state.regs,
-  kern_hyp_va(vcpu->arch.host_cpu_context));
+   struct kvm_cpu_context *host_ctxt;
+   struct kvm_cpu_context *guest_ctxt;
+   struct kvm_guest_debug_arch *host_dbg;
+   struct kvm_guest_debug_arch *guest_dbg;
 
/*
 * Non-VHE: Disable and flush SPE data generation
@@ -166,15 +162,39 @@ void __hyp_text __debug_cond_save_host_state(struct 
kvm_vcpu *vcpu)
 */
if (!has_vhe())
__debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
+
+   if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
+   return;
+
+   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+   guest_ctxt = &vcpu->arch.ctxt;
+   host_dbg = &vcpu->arch.host_debug_state.regs;
+   guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
+
+   

[PATCH v4 08/40] KVM: arm/arm64: Introduce vcpu_el1_is_32bit

2018-02-15 Thread Christoffer Dall
We have numerous checks around that check whether HCR_EL2 has the RW bit
set to figure out if we're running an AArch64 or AArch32 VM.  In some
cases, directly checking the RW bit (given its unintuitive name) is a
bit confusing, and that's not going to improve as we move logic around
for the following patches that optimize KVM on AArch64 hosts with VHE.

Therefore, introduce a helper, vcpu_el1_is_32bit, and replace existing
direct checks of HCR_EL2.RW with the helper.
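
The conversion is mechanical; a typical before/after pair from the hunks
below:

    /* before */
    if (!(vcpu->arch.hcr_el2 & HCR_RW))
            kvm_inject_dabt32(vcpu, addr);

    /* after */
    if (vcpu_el1_is_32bit(vcpu))
            kvm_inject_dabt32(vcpu, addr);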

Reviewed-by: Julien Grall 
Reviewed-by: Julien Thierry 
Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v2:
 - New patch

Changes since v1:
 - Reworded comments as suggested by Drew

 arch/arm64/include/asm/kvm_emulate.h |  7 ++-
 arch/arm64/kvm/hyp/switch.c  | 11 +--
 arch/arm64/kvm/hyp/sysreg-sr.c   |  5 +++--
 arch/arm64/kvm/inject_fault.c|  6 +++---
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 9ee316b962c8..3cc535591bdf 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -45,6 +45,11 @@ void kvm_inject_undef32(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
 
+static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
+{
+   return !(vcpu->arch.hcr_el2 & HCR_RW);
+}
+
 static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 {
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
@@ -65,7 +70,7 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 * For now this is conditional, since no AArch32 feature regs
 * are currently virtualised.
 */
-   if (vcpu->arch.hcr_el2 & HCR_RW)
+   if (!vcpu_el1_is_32bit(vcpu))
vcpu->arch.hcr_el2 |= HCR_TID3;
 }
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index b51638490d85..fbab9752a9f4 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -74,7 +74,7 @@ static hyp_alternate_select(__activate_traps_arch,
 
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
-   u64 val;
+   u64 hcr = vcpu->arch.hcr_el2;
 
/*
 * We are about to set CPTR_EL2.TFP to trap all floating point
@@ -85,17 +85,16 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
*vcpu)
 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
 * it will cause an exception.
 */
-   val = vcpu->arch.hcr_el2;
-
-   if (!(val & HCR_RW) && system_supports_fpsimd()) {
+   if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd()) {
write_sysreg(1 << 30, fpexc32_el2);
isb();
}
-   write_sysreg(val, hcr_el2);
 
-   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (val & HCR_VSE))
+   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
 
+   write_sysreg(hcr, hcr_el2);
+
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
/*
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 434f0fc9cfb3..99fc60516103 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -19,6 +19,7 @@
 #include 
 
 #include 
+#include 
 #include 
 
 /* Yes, this does nothing, on purpose */
@@ -147,7 +148,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 {
u64 *spsr, *sysreg;
 
-   if (read_sysreg(hcr_el2) & HCR_RW)
+   if (!vcpu_el1_is_32bit(vcpu))
return;
 
spsr = vcpu->arch.ctxt.gp_regs.spsr;
@@ -172,7 +173,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu 
*vcpu)
 {
u64 *spsr, *sysreg;
 
-   if (read_sysreg(hcr_el2) & HCR_RW)
+   if (!vcpu_el1_is_32bit(vcpu))
return;
 
spsr = vcpu->arch.ctxt.gp_regs.spsr;
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index c1e179d34e6a..30a3f58cdb7b 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -128,7 +128,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
  */
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
 {
-   if (!(vcpu->arch.hcr_el2 & HCR_RW))
+   if (vcpu_el1_is_32bit(vcpu))
kvm_inject_dabt32(vcpu, addr);
else
inject_abt64(vcpu, false, addr);
@@ -144,7 +144,7 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long 
addr)
  */
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
 {
-   if (!(vcpu->arch.hcr_el2 & HCR_RW))
+   if (vcpu_el1_is_32bit(vcpu))
kvm_inject_pabt32(vcpu, addr);
else

[PATCH v4 09/40] KVM: arm64: Move debug dirty flag calculation out of world switch

2018-02-15 Thread Christoffer Dall
There is no need to figure out inside the world-switch if we should
save/restore the debug registers or not, we might as well do that in the
higher level debug setup code, making it easier to optimize down the
line.

Reviewed-by: Julien Thierry 
Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm64/kvm/debug.c| 5 +
 arch/arm64/kvm/hyp/debug-sr.c | 6 --
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index fa63b28c65e0..feedb877cff8 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -193,6 +193,11 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
if (trap_debug)
vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
 
+   /* If KDE or MDE are set, perform a full save/restore cycle. */
+   if ((vcpu_sys_reg(vcpu, MDSCR_EL1) & DBG_MDSCR_KDE) ||
+   (vcpu_sys_reg(vcpu, MDSCR_EL1) & DBG_MDSCR_MDE))
+   vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
+
trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_sys_reg(vcpu, MDSCR_EL1));
 }
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index dabb5cc7b087..d958cd63a547 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -166,12 +166,6 @@ void __hyp_text __debug_restore_state(struct kvm_vcpu 
*vcpu,
 
 void __hyp_text __debug_cond_save_host_state(struct kvm_vcpu *vcpu)
 {
-   /* If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY is set, perform
-* a full save/restore cycle. */
-   if ((vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_KDE) ||
-   (vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_MDE))
-   vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
-
__debug_save_state(vcpu, &vcpu->arch.host_debug_state.regs,
   kern_hyp_va(vcpu->arch.host_cpu_context));
__debug_save_spe()(&vcpu->arch.host_debug_state.pmscr_el1);
-- 
2.14.2

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 06/40] KVM: arm/arm64: Get rid of vcpu->arch.irq_lines

2018-02-15 Thread Christoffer Dall
We currently have a separate read-modify-write of the HCR_EL2 on entry
to the guest for the sole purpose of setting the VF and VI bits, if set.
Since this is rarely the case (only when using a userspace IRQ chip
and interrupts are in flight), let's get rid of this operation and
instead modify the bits in the vcpu->arch.hcr[_el2] directly when
needed.
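
For illustration only (this is not part of the patch below): with the new
vcpu_hcr() accessor, raising or lowering a virtual IRQ line from the
userspace-irqchip path in virt/kvm/arm/arm.c boils down to something like
the following sketch, where HCR_VI is the virtual IRQ bit and the updated
value gets written to HCR_EL2 on the next guest entry:

    /* Sketch: update the shadowed HCR value directly. */
    if (level)
        *vcpu_hcr(vcpu) |= HCR_VI;
    else
        *vcpu_hcr(vcpu) &= ~HCR_VI;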

Acked-by: Marc Zyngier 
Reviewed-by: Andrew Jones 
Reviewed-by: Julien Thierry 
Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_emulate.h   |  9 ++---
 arch/arm/include/asm/kvm_host.h  |  3 ---
 arch/arm/kvm/emulate.c   |  2 +-
 arch/arm/kvm/hyp/switch.c|  2 +-
 arch/arm64/include/asm/kvm_emulate.h |  9 ++---
 arch/arm64/include/asm/kvm_host.h|  3 ---
 arch/arm64/kvm/hyp/switch.c  |  6 --
 arch/arm64/kvm/inject_fault.c|  2 +-
 virt/kvm/arm/arm.c   | 11 ++-
 virt/kvm/arm/mmu.c   |  6 +++---
 10 files changed, 16 insertions(+), 37 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h 
b/arch/arm/include/asm/kvm_emulate.h
index 9003bd19cb70..e27caa4b47a1 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -92,14 +92,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
vcpu->arch.hcr = HCR_GUEST_MASK;
 }
 
-static inline unsigned long vcpu_get_hcr(const struct kvm_vcpu *vcpu)
+static inline unsigned long *vcpu_hcr(const struct kvm_vcpu *vcpu)
 {
-   return vcpu->arch.hcr;
-}
-
-static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
-{
-   vcpu->arch.hcr = hcr;
+   return (unsigned long *)&vcpu->arch.hcr;
 }
 
 static inline bool vcpu_mode_is_32bit(const struct kvm_vcpu *vcpu)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 248b930563e5..6137195ab815 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -155,9 +155,6 @@ struct kvm_vcpu_arch {
/* HYP trapping configuration */
u32 hcr;
 
-   /* Interrupt related fields */
-   u32 irq_lines;  /* IRQ and FIQ levels */
-
/* Exception Information */
struct kvm_vcpu_fault_info fault;
 
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index cdff963f133a..fa501bf437f3 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -174,5 +174,5 @@ unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu)
  */
 void kvm_inject_vabt(struct kvm_vcpu *vcpu)
 {
-   vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VA);
+   *vcpu_hcr(vcpu) |= HCR_VA;
 }
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index ae45ae96aac2..e86679daddff 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -44,7 +44,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
*vcpu, u32 *fpexc_host)
isb();
}
 
-   write_sysreg(vcpu->arch.hcr | vcpu->arch.irq_lines, HCR);
+   write_sysreg(vcpu->arch.hcr, HCR);
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(HSTR_T(15), HSTR);
write_sysreg(HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11), HCPTR);
diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 4610bc818097..9ee316b962c8 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -69,14 +69,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
vcpu->arch.hcr_el2 |= HCR_TID3;
 }
 
-static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
+static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
 {
-   return vcpu->arch.hcr_el2;
-}
-
-static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
-{
-   vcpu->arch.hcr_el2 = hcr;
+   return (unsigned long *)&vcpu->arch.hcr_el2;
 }
 
 static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 618cfee7206a..b027a7f025d4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -272,9 +272,6 @@ struct kvm_vcpu_arch {
/* IO related fields */
struct kvm_decode mmio_decode;
 
-   /* Interrupt related fields */
-   u64 irq_lines;  /* IRQ and FIQ levels */
-
/* Cache some mmu pages needed inside spinlock regions */
struct kvm_mmu_memory_cache mmu_page_cache;
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 53f3e3486a19..b51638490d85 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -178,12 +178,6 @@ static void __hyp_text __vgic_save_state(struct kvm_vcpu 
*vcpu)
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
-   u64 val;
-
-   val = read_sysreg(hcr_el2);
-   val |= 

[PATCH v4 07/40] KVM: arm/arm64: Add kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs

2018-02-15 Thread Christoffer Dall
As we are about to move a bunch of save/restore logic for VHE kernels to
the load and put functions, we need some infrastructure to do this.

Reviewed-by: Andrew Jones 
Acked-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v1:
 - Reworded comments as suggested by Drew

 arch/arm/include/asm/kvm_host.h   |  3 +++
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/hyp/sysreg-sr.c| 30 ++
 virt/kvm/arm/arm.c|  2 ++
 4 files changed, 38 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6137195ab815..c6a749568dd6 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -312,4 +312,7 @@ static inline bool kvm_arm_harden_branch_predictor(void)
return false;
 }
 
+static inline void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu) {}
+static inline void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu) {}
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index b027a7f025d4..c30fc96992df 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -435,4 +435,7 @@ static inline bool kvm_arm_harden_branch_predictor(void)
return cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR);
 }
 
+void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu);
+void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 43b7dd65e3e6..434f0fc9cfb3 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -190,6 +190,36 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu 
*vcpu)
write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
 }
 
+/**
+ * kvm_vcpu_load_sysregs - Load guest system registers to the physical CPU
+ *
+ * @vcpu: The VCPU pointer
+ *
+ * Load system registers that do not affect the host's execution, for
+ * example EL1 system registers on a VHE system where the host kernel
+ * runs at EL2.  This function is called from KVM's vcpu_load() function
+ * and loading system register state early avoids having to load them on
+ * every entry to the VM.
+ */
+void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
+{
+}
+
+/**
+ * kvm_vcpu_put_sysregs - Restore host system registers to the physical CPU
+ *
+ * @vcpu: The VCPU pointer
+ *
+ * Save guest system registers that do not affect the host's execution, for
+ * example EL1 system registers on a VHE system where the host kernel
+ * runs at EL2.  This function is called from KVM's vcpu_put() function
+ * and deferring saving system register state until we're no longer running the
+ * VCPU avoids having to save them on every exit from the VM.
+ */
+void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
+{
+}
+
 void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2)
 {
asm("msr tpidr_el2, %0": : "r" (tpidr_el2));
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 49d13510e9c2..2062d9357971 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -362,10 +362,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_arm_set_running_vcpu(vcpu);
kvm_vgic_load(vcpu);
kvm_timer_vcpu_load(vcpu);
+   kvm_vcpu_load_sysregs(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   kvm_vcpu_put_sysregs(vcpu);
kvm_timer_vcpu_put(vcpu);
kvm_vgic_put(vcpu);
 
-- 
2.14.2

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 05/40] KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag

2018-02-15 Thread Christoffer Dall
From: Shih-Wei Li 

We always set the IMO and FMO bits in the HCR_EL2 when running the
guest, regardless if we use the vgic or not.  By moving these flags to
HCR_GUEST_FLAGS we can avoid one of the extra save/restore operations of
HCR_EL2 in the world switch code, and we can also soon get rid of the
other one.

This is safe, because even though the IMO and FMO bits control both
taking the interrupts to EL2 and remapping ICC_*_EL1 to ICV_*_EL1 when
executed at EL1, as long as we ensure that these bits are clear when
running the EL1 host, we're OK, because we reset the HCR_EL2 to only
have the HCR_RW bit set when returning to EL1 on non-VHE systems.

Reviewed-by: Marc Zyngier 
Signed-off-by: Shih-Wei Li 
Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v3:
 - Slightly reworded the commit message

 arch/arm64/include/asm/kvm_arm.h | 4 ++--
 arch/arm64/kvm/hyp/switch.c  | 3 ---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index b0c84171e6a3..f055be241f4d 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -81,9 +81,9 @@
  */
 #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
 HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
-HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW)
+HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | \
+HCR_FMO | HCR_IMO)
 #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
-#define HCR_INT_OVERRIDE   (HCR_FMO | HCR_IMO)
 #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
 
 /* TCR_EL2 Registers bits */
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 079bb5243f0c..53f3e3486a19 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -174,8 +174,6 @@ static void __hyp_text __vgic_save_state(struct kvm_vcpu 
*vcpu)
__vgic_v3_save_state(vcpu);
else
__vgic_v2_save_state(vcpu);
-
-   write_sysreg(read_sysreg(hcr_el2) & ~HCR_INT_OVERRIDE, hcr_el2);
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
@@ -183,7 +181,6 @@ static void __hyp_text __vgic_restore_state(struct kvm_vcpu 
*vcpu)
u64 val;
 
val = read_sysreg(hcr_el2);
-   val |=  HCR_INT_OVERRIDE;
val |= vcpu->arch.irq_lines;
write_sysreg(val, hcr_el2);
 
-- 
2.14.2

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 04/40] KVM: arm64: Rework hyp_panic for VHE and non-VHE

2018-02-15 Thread Christoffer Dall
VHE actually doesn't rely on clearing the VTTBR when returning to the
host kernel, and that is the current key mechanism of hyp_panic to
figure out how to attempt to return to a state good enough to print a
panic statement.

Therefore, we split the hyp_panic function into two functions, a VHE and
a non-VHE, keeping the non-VHE version intact, but changing the VHE
behavior.

The vttbr_el2 check on VHE doesn't really make that much sense, because
the only situation where we can get here on VHE is when the hypervisor
assembly code actually called into hyp_panic, which only happens when
VBAR_EL2 has been set to the KVM exception vectors.  On VHE, we can
always safely disable the traps and restore the host registers at this
point, so we simply do that unconditionally and call into the panic
function directly.

Acked-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v1:
 - Fixed typos in the commit message
 - Still use the generic __deactivte_traps() function in the hyp panic
   code until we rework that logic later.

 arch/arm64/kvm/hyp/switch.c | 42 +++---
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d1749fa0bfc3..079bb5243f0c 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -437,10 +437,20 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx 
ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
 
 static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
-struct kvm_vcpu *vcpu)
+struct kvm_cpu_context 
*__host_ctxt)
 {
+   struct kvm_vcpu *vcpu;
unsigned long str_va;
 
+   vcpu = __host_ctxt->__hyp_running_vcpu;
+
+   if (read_sysreg(vttbr_el2)) {
+   __timer_disable_traps(vcpu);
+   __deactivate_traps(vcpu);
+   __deactivate_vm(vcpu);
+   __sysreg_restore_host_state(__host_ctxt);
+   }
+
/*
 * Force the panic string to be loaded from the literal pool,
 * making sure it is a kernel address and not a PC-relative
@@ -454,37 +464,31 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, 
u64 elr, u64 par,
   read_sysreg(hpfar_el2), par, vcpu);
 }
 
-static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
-   struct kvm_vcpu *vcpu)
+static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
+struct kvm_cpu_context *host_ctxt)
 {
+   struct kvm_vcpu *vcpu;
+   vcpu = host_ctxt->__hyp_running_vcpu;
+
+   __deactivate_traps(vcpu);
+   __sysreg_restore_host_state(host_ctxt);
+
panic(__hyp_panic_string,
  spsr,  elr,
  read_sysreg_el2(esr),   read_sysreg_el2(far),
  read_sysreg(hpfar_el2), par, vcpu);
 }
 
-static hyp_alternate_select(__hyp_call_panic,
-   __hyp_call_panic_nvhe, __hyp_call_panic_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
-
 void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt)
 {
-   struct kvm_vcpu *vcpu = NULL;
-
u64 spsr = read_sysreg_el2(spsr);
u64 elr = read_sysreg_el2(elr);
u64 par = read_sysreg(par_el1);
 
-   if (read_sysreg(vttbr_el2)) {
-   vcpu = host_ctxt->__hyp_running_vcpu;
-   __timer_disable_traps(vcpu);
-   __deactivate_traps(vcpu);
-   __deactivate_vm(vcpu);
-   __sysreg_restore_host_state(host_ctxt);
-   }
-
-   /* Call panic for real */
-   __hyp_call_panic()(spsr, elr, par, vcpu);
+   if (!has_vhe())
+   __hyp_call_panic_nvhe(spsr, elr, par, host_ctxt);
+   else
+   __hyp_call_panic_vhe(spsr, elr, par, host_ctxt);
 
unreachable();
 }
-- 
2.14.2

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 03/40] KVM: arm64: Avoid storing the vcpu pointer on the stack

2018-02-15 Thread Christoffer Dall
We already have the percpu area for the host cpu state, which points to
the VCPU, so there's no need to store the VCPU pointer on the stack on
every context switch.  We can be a little more clever and just use
tpidr_el2 for the percpu offset and load the VCPU pointer from the host
context.

This does require us to calculate the percpu offset without including
the offset from the kernel mapping of the percpu array to the linear
mapping of the array (which is what we store in tpidr_el1), because a
PC-relative generated address in EL2 is already giving us the hyp alias
of the linear mapping of a kernel address.  We do this in
__cpu_init_hyp_mode() by using kvm_ksym_ref().

This change also requires us to have a scratch register, so we take the
chance to rearrange some of the el1_sync code to only look at the
vttbr_el2 to determine if this is a trap from the guest or an HVC from
the host.  We do add an extra check to call the panic code if the kernel
is configured with debugging enabled and we saw a trap from the host
which wasn't an HVC, indicating that we left some EL2 trap configured by
mistake.

The code that accesses ESR_EL2 was previously using an alternative to
use the _EL1 accessor on VHE systems, but this was actually unnecessary
as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2
accessor does the same thing on both systems.
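
For readers skimming the diff, a rough C rendering of what the new
get_host_ctxt/get_vcpu assembly macros (added below) compute is the
following sketch; it only illustrates the address arithmetic, the real
code is the assembly:

    /*
     * adr_l yields the hyp alias of kvm_host_cpu_state, and tpidr_el2
     * holds the raw per-cpu offset, so adding the two gives this CPU's
     * host context, which in turn points at the running vcpu.
     */
    struct kvm_cpu_context *host_ctxt;
    struct kvm_vcpu *vcpu;

    host_ctxt = (void *)&kvm_host_cpu_state + read_sysreg(tpidr_el2);
    vcpu = kern_hyp_va(host_ctxt->__hyp_running_vcpu);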

Cc: Ard Biesheuvel 
Signed-off-by: Christoffer Dall 
---

Notes:
Changes since v3:
 - Reworked the assembly part of the patch after rebasing on v4.16-rc1
   which created a conflict with the variant 2 mitigations.
 - Removed Marc's reviewed-by due to the rework.
 - Removed unneeded extern keyword in declaration in header file

Changes since v1:
 - Use PC-relative addressing to access per-cpu variables instead of
   using a load from the literal pool.
 - Remove stale comments as pointed out by Marc
 - Reworded the commit message as suggested by Drew

 arch/arm64/include/asm/kvm_asm.h  | 14 ++
 arch/arm64/include/asm/kvm_host.h | 15 +++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp/entry.S|  6 +-
 arch/arm64/kvm/hyp/hyp-entry.S| 31 +--
 arch/arm64/kvm/hyp/switch.c   |  5 +
 arch/arm64/kvm/hyp/sysreg-sr.c|  5 +
 7 files changed, 50 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 24961b732e65..6b626750b0a1 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -33,6 +33,7 @@
 #define KVM_ARM64_DEBUG_DIRTY_SHIFT0
 #define KVM_ARM64_DEBUG_DIRTY  (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
 
+/* Translate a kernel address of @sym into its equivalent linear mapping */
 #define kvm_ksym_ref(sym)  \
({  \
void *val =\
@@ -70,6 +71,19 @@ extern u32 __init_stage2_translation(void);
 
 extern void __qcom_hyp_sanitize_btac_predictors(void);
 
+#else /* __ASSEMBLY__ */
+
+.macro get_host_ctxt reg, tmp
+   adr_l   \reg, kvm_host_cpu_state
+   mrs \tmp, tpidr_el2
+   add \reg, \reg, \tmp
+.endm
+
+.macro get_vcpu vcpu, ctxt
+   ldr \vcpu, [\ctxt, #HOST_CONTEXT_VCPU]
+   kern_hyp_va \vcpu
+.endm
+
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 596f8e414a4c..618cfee7206a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -358,10 +358,15 @@ int kvm_perf_teardown(void);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
+void __kvm_set_tpidr_el2(u64 tpidr_el2);
+DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);
+
 static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
   unsigned long hyp_stack_ptr,
   unsigned long vector_ptr)
 {
+   u64 tpidr_el2;
+
/*
 * Call initialization code, and switch to the full blown HYP code.
 * If the cpucaps haven't been finalized yet, something has gone very
@@ -370,6 +375,16 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 */
BUG_ON(!static_branch_likely(&arm64_const_caps_ready));
__kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr);
+
+   /*
+* Calculate the raw per-cpu offset without a translation from the
+* kernel's mapping to the linear mapping, and store it in tpidr_el2
+* so that we can use adr_l to access per-cpu variables in EL2.
+*/
+   tpidr_el2 = (u64)this_cpu_ptr(&kvm_host_cpu_state)
+   - (u64)kvm_ksym_ref(kvm_host_cpu_state);
+
+   kvm_call_hyp(__kvm_set_tpidr_el2, tpidr_el2);
 }
 
 

[PATCH v4 02/40] KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init

2018-02-15 Thread Christoffer Dall
Moving the call to vcpu_load() in kvm_arch_vcpu_ioctl_run() to after
we've called kvm_vcpu_first_run_init() simplifies some of the vgic and
timer initialization code, and there is also no need to do vcpu_load()
for things such as handling the immediate_exit flag.

Reviewed-by: Julien Grall 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/arch_timer.c |  4 
 virt/kvm/arm/arm.c| 22 --
 virt/kvm/arm/vgic/vgic-init.c | 11 ---
 3 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 70f4c30918eb..b744e6935a3f 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -852,11 +852,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
return ret;
 
 no_vgic:
-   preempt_disable();
timer->enabled = 1;
-   kvm_timer_vcpu_load(vcpu);
-   preempt_enable();
-
return 0;
 }
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 53572304843b..932e61858c55 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -632,27 +632,22 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
if (unlikely(!kvm_vcpu_initialized(vcpu)))
return -ENOEXEC;
 
-   vcpu_load(vcpu);
-
ret = kvm_vcpu_first_run_init(vcpu);
if (ret)
-   goto out;
+   return ret;
 
if (run->exit_reason == KVM_EXIT_MMIO) {
ret = kvm_handle_mmio_return(vcpu, vcpu->run);
if (ret)
-   goto out;
-   if (kvm_arm_handle_step_debug(vcpu, vcpu->run)) {
-   ret = 0;
-   goto out;
-   }
-
+   return ret;
+   if (kvm_arm_handle_step_debug(vcpu, vcpu->run))
+   return 0;
}
 
-   if (run->immediate_exit) {
-   ret = -EINTR;
-   goto out;
-   }
+   if (run->immediate_exit)
+   return -EINTR;
+
+   vcpu_load(vcpu);
 
kvm_sigset_activate(vcpu);
 
@@ -811,7 +806,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 
kvm_sigset_deactivate(vcpu);
 
-out:
vcpu_put(vcpu);
return ret;
 }
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index 743ca5cb05ef..3e8209a07585 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -302,17 +302,6 @@ int vgic_init(struct kvm *kvm)
 
dist->initialized = true;
 
-   /*
-* If we're initializing GICv2 on-demand when first running the VCPU
-* then we need to load the VGIC state onto the CPU.  We can detect
-* this easily by checking if we are in between vcpu_load and vcpu_put
-* when we just initialized the VGIC.
-*/
-   preempt_disable();
-   vcpu = kvm_arm_get_running_vcpu();
-   if (vcpu)
-   kvm_vgic_load(vcpu);
-   preempt_enable();
 out:
return ret;
 }
-- 
2.14.2

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 01/40] KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN

2018-02-15 Thread Christoffer Dall
Calling vcpu_load() registers preempt notifiers for this vcpu and calls
kvm_arch_vcpu_load().  The latter will soon be doing a lot of heavy
lifting on arm/arm64 and will try to do things such as enabling the
virtual timer and setting us up to handle interrupts from the timer
hardware.

Loading state onto hardware registers and enabling hardware to signal
interrupts can be problematic when we're not actually about to run the
VCPU, because it makes it difficult to establish the right context when
handling interrupts from the timer, and it makes the register access
code difficult to reason about.

Luckily, now when we call vcpu_load in each ioctl implementation, we can
simply remove the call from the non-KVM_RUN vcpu ioctls, and our
kvm_arch_vcpu_load() is only used for loading vcpu content to the
physical CPU when we're actually going to run the vcpu.

Reviewed-by: Julien Grall 
Signed-off-by: Christoffer Dall 
---
 arch/arm64/kvm/guest.c | 3 ---
 virt/kvm/arm/arm.c | 9 -
 2 files changed, 12 deletions(-)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index d7e3299a7734..959e50d2588c 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -363,8 +363,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
*vcpu,
 {
int ret = 0;
 
-   vcpu_load(vcpu);
-
trace_kvm_set_guest_debug(vcpu, dbg->control);
 
if (dbg->control & ~KVM_GUESTDBG_VALID_MASK) {
@@ -386,7 +384,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
*vcpu,
}
 
 out:
-   vcpu_put(vcpu);
return ret;
 }
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 86941f6181bb..53572304843b 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -384,14 +384,11 @@ static void vcpu_power_off(struct kvm_vcpu *vcpu)
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
 {
-   vcpu_load(vcpu);
-
if (vcpu->arch.power_off)
mp_state->mp_state = KVM_MP_STATE_STOPPED;
else
mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
 
-   vcpu_put(vcpu);
return 0;
 }
 
@@ -400,8 +397,6 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 {
int ret = 0;
 
-   vcpu_load(vcpu);
-
switch (mp_state->mp_state) {
case KVM_MP_STATE_RUNNABLE:
vcpu->arch.power_off = false;
@@ -413,7 +408,6 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
ret = -EINVAL;
}
 
-   vcpu_put(vcpu);
return ret;
 }
 
@@ -1036,8 +1030,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
struct kvm_device_attr attr;
long r;
 
-   vcpu_load(vcpu);
-
switch (ioctl) {
case KVM_ARM_VCPU_INIT: {
struct kvm_vcpu_init init;
@@ -1114,7 +1106,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EINVAL;
}
 
-   vcpu_put(vcpu);
return r;
 }
 
-- 
2.14.2

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 00/40] Optimize KVM/ARM for VHE systems

2018-02-15 Thread Christoffer Dall
This series redesigns parts of KVM/ARM to optimize the performance on
VHE systems.  The general approach is to try to do as little work as
possible when transitioning between the VM and the hypervisor.  This has
the benefit of lower latency when waiting for interrupts and delivering
virtual interrupts, and reduces the overhead of emulating behavior and
I/O in the host kernel.

Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
that can be generally improved.  We then add infrastructure to move more
logic into vcpu_load and vcpu_put, and we improve handling of VFP and debug
registers.

We then introduce a new world-switch function for VHE systems, which we
can tweak and optimize for VHE systems.  To do that, we rework a lot of
the system register save/restore handling and emulation code that may
need access to system registers, so that we can defer as many system
register save/restore operations to vcpu_load and vcpu_put, and move
this logic out of the VHE world switch function.

We then optimize the configuration of traps.  On non-VHE systems, both
the host and VM kernels run in EL1, but because the host kernel should
have full access to the underlying hardware, while the VM kernel should
not, we essentially make the host kernel more privileged than the VM
kernel despite them both running at the same privilege level by enabling
VE traps when entering the VM and disabling those traps when exiting the
VM.  On VHE systems, the host kernel runs in EL2 and has full access to
the hardware (as much as allowed by secure side software), and is
unaffected by the trap configuration.  That means we can configure the
traps for VMs running in EL1 once, and don't have to switch them on and
off for every entry/exit to/from the VM.

Finally, we improve our VGIC handling by moving all save/restore logic
out of the VHE world-switch, and we make it possible to truly only
evaluate if the AP list is empty and not do *any* VGIC work if that is
the case, and only do the minimal amount of work required in the course
of the VGIC processing when we have virtual interrupts in flight.

The patches are based on v4.16-rc1 with a fix for userspace irqchips and a
patch from Dave Martin that moves CPU ID register traps out of the world
switch.

I've given the patches a fair amount of testing on Thunder-X, Mustang,
Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
functionality on TX2.

The patches are also available in the vhe-optimize-v4 branch on my
kernel.org repository [1].  The vhe-optimize-v4-base branch contains
prerequisites of this series.

Changes since v3:
 - Rebased on v4.16-rc1 (fun!)
 - Removed VFP optimizations because it was discovered that the deferred
   approach taken in v3 was buggy
   (https://lists.cs.columbia.edu/pipermail/kvmarm/2018-February/029838.html)
   This causes a fair amount of changes throughout and I've removed
   affected reviewed-by and other tags as best I could to take this into
   account.
 - Used a switch-statement to handle deferred system registers instead
   of the macro approach taken in v3.
 - Addressed other review coments (tried to keep track of this with logs
   of changes in individual patches).
 - I ran Yury's IPI benchark test and other tests on TX2 and could not
   observe a performance regression, but rather an improvement of around
   65%.  I suspect the previous regression was due to the timer WFI
   problem solved for v4.16-rc1.
 - I haven't included Tomasz' reviewed-by, because I figured too much of
   the series has changed since v3.

Changes since v2:
 - Rebased on v4.15-rc3.
 - Includes two additional patches that only does vcpu_load after
   kvm_vcpu_first_run_init and only for KVM_RUN.
 - Addressed review comments from v2 (detailed changelogs are in the
   individual patches).

Thanks,
-Christoffer

[1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git 
vhe-optimize-v4

Christoffer Dall (39):
  KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN
  KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init
  KVM: arm64: Avoid storing the vcpu pointer on the stack
  KVM: arm64: Rework hyp_panic for VHE and non-VHE
  KVM: arm/arm64: Get rid of vcpu->arch.irq_lines
  KVM: arm/arm64: Add kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs
  KVM: arm/arm64: Introduce vcpu_el1_is_32bit
  KVM: arm64: Move debug dirty flag calculation out of world switch
  KVM: arm64: Slightly improve debug save/restore functions
  KVM: arm64: Improve debug register save/restore flow
  KVM: arm64: Factor out fault info population and gic workarounds
  KVM: arm64: Introduce VHE-specific kvm_vcpu_run
  KVM: arm64: Remove kern_hyp_va() use in VHE switch function
  KVM: arm64: Don't deactivate VM on VHE systems
  KVM: arm64: Remove noop calls to timer save/restore from VHE switch
  KVM: arm64: Move userspace system registers into separate function
  KVM: arm64: Rewrite sysreg alternatives to static keys
  KVM: arm64: Introduce separate VHE/non-VHE 

Re: [PATCH v4 00/17] arm64: Add SMCCC v1.1 support and CVE-2017-5715 (Spectre variant 2) mitigation

2018-02-15 Thread Jon Masters
Hi Marc, all,

On 02/06/2018 12:56 PM, Marc Zyngier wrote:
> ARM has recently published a SMC Calling Convention (SMCCC)
> specification update[1] that provides an optimised calling convention
> and optional, discoverable support for mitigating CVE-2017-5715. ARM
> Trusted Firmware (ATF) has already gained such an implementation[2].

I'm probably just missing something, but does this end up reported
somewhere conveniently user visible? In particular, if the new SMC is
*not* provided, does the user end up easily seeing this?

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 10/11] mm/memory-failure: increase queued recovery work's priority

2018-02-15 Thread James Morse
arm64 can take an NMI-like error notification when user-space steps in
some corrupt memory. APEI's GHES code will call memory_failure_queue()
to schedule the recovery work. We then return to user-space, possibly
taking the fault again.

Currently the arch code unconditionally signals user-space from this
path, so we don't get stuck in this loop, but the affected process
never benefits from memory_failure()s recovery work. To fix this we
need to know the recovery work will run before we get back to user-space.

Increase the priority of the recovery work by scheduling it on the
system_highpri_wq, then try to bump the current task off this CPU
so that the recovery work starts immediately.

Reported-by: Xie XiuQi 
Signed-off-by: James Morse 
CC: Xie XiuQi 
CC: gengdongjiu 
---
 mm/memory-failure.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 4b80ccee4535..14f44d841e8b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -55,6 +55,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "internal.h"
@@ -1319,6 +1320,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, 
memory_failure_cpu);
  */
 void memory_failure_queue(unsigned long pfn, int flags)
 {
+   int cpu = smp_processor_id();
struct memory_failure_cpu *mf_cpu;
unsigned long proc_flags;
struct memory_failure_entry entry = {
@@ -1328,11 +1330,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
 
mf_cpu = &get_cpu_var(memory_failure_cpu);
spin_lock_irqsave(&mf_cpu->lock, proc_flags);
-   if (kfifo_put(&mf_cpu->fifo, entry))
-   schedule_work_on(smp_processor_id(), &mf_cpu->work);
-   else
+   if (kfifo_put(&mf_cpu->fifo, entry)) {
+   queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
+   set_tsk_need_resched(current);
+   preempt_set_need_resched();
+   } else {
pr_err("Memory failure: buffer overflow when queuing memory 
failure at %#lx\n",
   pfn);
+   }
spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
put_cpu_var(memory_failure_cpu);
 }
-- 
2.15.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 09/11] ACPI / APEI: Add support for the SDEI GHES Notification type

2018-02-15 Thread James Morse
If the GHES notification type is SDEI, register the provided event
number and point the callback at ghes_sdei_callback().

Signed-off-by: James Morse 
---
 drivers/acpi/apei/ghes.c | 66 ++--
 include/linux/arm_sdei.h |  3 +++
 2 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 280c0a58c700..5eaf0a772be7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -25,6 +25,7 @@
  * GNU General Public License for more details.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -59,7 +60,7 @@
 
 #define GHES_PFX   "GHES: "
 
-#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA)
+#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA) || 
defined(CONFIG_ARM_SDE_INTERFACE)
 #define WANT_NMI_ESTATUS_QUEUE 1
 #endif
 
@@ -753,7 +754,7 @@ static int _in_nmi_notify_one(struct ghes *ghes)
return ret;
 }
 
-static int ghes_estatus_queue_notified(struct list_head *rcu_list)
+static int __maybe_unused ghes_estatus_queue_notified(struct list_head 
*rcu_list)
 {
int ret = -ENOENT;
struct ghes *ghes;
@@ -1047,6 +1048,49 @@ static inline void ghes_nmi_add(struct ghes *ghes) { }
 static inline void ghes_nmi_remove(struct ghes *ghes) { }
 #endif /* CONFIG_HAVE_ACPI_APEI_NMI */
 
+static int ghes_sdei_callback(u32 event_num, struct pt_regs *regs, void *arg)
+{
+   struct ghes *ghes = arg;
+
+   if (!_in_nmi_notify_one(ghes)) {
+   if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG))
+   irq_work_queue(&ghes_proc_irq_work);
+
+   return 0;
+   }
+
+   return -ENOENT;
+}
+
+static int apei_sdei_register_ghes(struct ghes *ghes)
+{
+   int err = -EINVAL;
+
+   if (IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+   ghes_estatus_queue_grow_pool(ghes);
+
+   err = sdei_register_ghes(ghes, ghes_sdei_callback);
+   if (err)
+   ghes_estatus_queue_shrink_pool(ghes);
+   }
+
+   return err;
+}
+
+static int apei_sdei_unregister_ghes(struct ghes *ghes)
+{
+   int err = -EINVAL;
+
+   if (IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+   err = sdei_unregister_ghes(ghes);
+
+   if (!err)
+   ghes_estatus_queue_shrink_pool(ghes);
+   }
+
+   return err;
+}
+
 static int ghes_probe(struct platform_device *ghes_dev)
 {
struct acpi_hest_generic *generic;
@@ -1081,6 +1125,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
goto err;
}
break;
+   case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+   if (!IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+   pr_warn(GHES_PFX "Generic hardware error source: %d 
notified via SDE Interface is not supported!\n",
+   generic->header.source_id);
+   goto err;
+   }
+   break;
case ACPI_HEST_NOTIFY_LOCAL:
pr_warning(GHES_PFX "Generic hardware error source: %d notified 
via local interrupt is not supported!\n",
   generic->header.source_id);
@@ -1154,6 +1205,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_add(ghes);
break;
+   case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+   rc = apei_sdei_register_ghes(ghes);
+   if (rc)
+   goto err_edac_unreg;
+   break;
default:
BUG();
}
@@ -1175,6 +1231,7 @@ err:
 
 static int ghes_remove(struct platform_device *ghes_dev)
 {
+   int rc;
struct ghes *ghes;
struct acpi_hest_generic *generic;
 
@@ -1207,6 +1264,11 @@ static int ghes_remove(struct platform_device *ghes_dev)
case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_remove(ghes);
break;
+   case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+   rc = apei_sdei_unregister_ghes(ghes);
+   if (rc)
+   return rc;
+   break;
default:
BUG();
break;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index 5fdf799be026..f49063ca206d 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -12,7 +12,10 @@ enum sdei_conduit_types {
 };
 
 #include 
+
+#ifdef CONFIG_ARM_SDE_INTERFACE
 #include 
+#endif
 
 /* Arch code should override this to set the entry point from firmware... */
 #ifndef sdei_arch_get_entry_point
-- 
2.15.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 08/11] firmware: arm_sdei: Add ACPI GHES registration helper

2018-02-15 Thread James Morse
APEI's Generic Hardware Error Source structures do not describe
whether the SDEI event is shared or private, as this information is
discoverable via the API.

GHES needs to know whether an event is normal or critical to avoid
sharing locks or fixmap entries.

Add a helper to ask firmware for this information so it can initialise
the struct ghes, then register and enable the event.
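
For context, the intended calling sequence from the GHES driver (the real
caller is apei_sdei_register_ghes() in the GHES patch of this series) is
roughly the following sketch, with error handling trimmed:

    /* Register and enable the SDEI event backing this error source. */
    err = sdei_register_ghes(ghes, ghes_sdei_callback);
    if (err)
        return err;

    /* ... and on teardown: */
    err = sdei_unregister_ghes(ghes);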

Signed-off-by: James Morse 
---
 arch/arm64/include/asm/fixmap.h |  4 +++
 drivers/firmware/arm_sdei.c | 75 +
 include/linux/arm_sdei.h|  5 +++
 3 files changed, 84 insertions(+)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index c3974517c2cb..e2b423a5feaf 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -58,6 +58,10 @@ enum fixed_addresses {
 #ifdef CONFIG_ACPI_APEI_SEA
FIX_APEI_GHES_SEA,
 #endif
+#ifdef CONFIG_ARM_SDE_INTERFACE
+   FIX_APEI_GHES_SDEI_NORMAL,
+   FIX_APEI_GHES_SDEI_CRITICAL,
+#endif
 #endif /* CONFIG_ACPI_APEI_GHES */
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index 1ea71640fdc2..9b6e140cf6cb 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -2,6 +2,7 @@
 // Copyright (C) 2017 Arm Ltd.
 #define pr_fmt(fmt) "sdei: " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -887,6 +888,80 @@ static void sdei_smccc_hvc(unsigned long function_id,
arm_smccc_hvc(function_id, arg0, arg1, arg2, arg3, arg4, 0, 0, res);
 }
 
+#ifdef CONFIG_ACPI
+/* These stop private notifications using the fixmap entries simultaneously */
+static DEFINE_RAW_SPINLOCK(sdei_ghes_fixmap_lock_normal);
+static DEFINE_RAW_SPINLOCK(sdei_ghes_fixmap_lock_critical);
+
+int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *cb)
+{
+   int err;
+   u32 event_num;
+   u64 result;
+
+   if (acpi_disabled)
+   return -EOPNOTSUPP;
+
+   event_num = ghes->generic->notify.vector;
+   if (event_num == 0) {
+   /*
+* Event 0 is reserved by the specification for
+* SDEI_EVENT_SIGNAL.
+*/
+   return -EINVAL;
+   }
+
+   err = sdei_api_event_get_info(event_num, SDEI_EVENT_INFO_EV_PRIORITY,
+ &result);
+   if (err)
+   return err;
+
+   if (result == SDEI_EVENT_PRIORITY_CRITICAL) {
+   ghes->nmi_fixmap_lock = &sdei_ghes_fixmap_lock_critical;
+   ghes->fixmap_idx = FIX_APEI_GHES_SDEI_CRITICAL;
+   } else {
+   ghes->nmi_fixmap_lock = &sdei_ghes_fixmap_lock_normal;
+   ghes->fixmap_idx = FIX_APEI_GHES_SDEI_NORMAL;
+   }
+
+   err = sdei_event_register(event_num, cb, ghes);
+   if (!err)
+   err = sdei_event_enable(event_num);
+
+   return err;
+}
+
+int sdei_unregister_ghes(struct ghes *ghes)
+{
+   int i;
+   int err;
+   u32 event_num = ghes->generic->notify.vector;
+
+   might_sleep();
+
+   if (acpi_disabled)
+   return -EOPNOTSUPP;
+
+   /*
+* The event may be running on another CPU. Disable it
+* to stop new events, then try to unregister a few times.
+*/
+   err = sdei_event_disable(event_num);
+   if (err)
+   return err;
+
+   for (i = 0; i < 3; i++) {
+   err = sdei_event_unregister(event_num);
+   if (err != -EINPROGRESS)
+   break;
+
+   schedule();
+   }
+
+   return err;
+}
+#endif /* CONFIG_ACPI */
+
 static int sdei_get_conduit(struct platform_device *pdev)
 {
const char *method;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index 942afbd544b7..5fdf799be026 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -11,6 +11,7 @@ enum sdei_conduit_types {
CONDUIT_HVC,
 };
 
+#include 
 #include 
 
 /* Arch code should override this to set the entry point from firmware... */
@@ -39,6 +40,10 @@ int sdei_event_unregister(u32 event_num);
 int sdei_event_enable(u32 event_num);
 int sdei_event_disable(u32 event_num);
 
+/* GHES register/unregister helpers */
+int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *cb);
+int sdei_unregister_ghes(struct ghes *ghes);
+
 #ifdef CONFIG_ARM_SDE_INTERFACE
 /* For use by arch code when CPU hotplug notifiers are not appropriate. */
 int sdei_mask_local_cpu(void);
-- 
2.15.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 07/11] ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications

2018-02-15 Thread James Morse
Now that ghes uses the fixmap addresses and locks via some indirection
we can support multiple NMI-like notifications on arm64.

These should be named after their notification method. x86's
NOTIFY_NMI already is; change the SEA fixmap entry to be called
FIX_APEI_GHES_SEA.

Future patches can add support for FIX_APEI_GHES_SEI and
FIX_APEI_GHES_SDEI_{NORMAL,CRITICAL}.

Signed-off-by: James Morse 
CC: Tyler Baicar 
---
 arch/arm64/include/asm/fixmap.h |  4 +++-
 drivers/acpi/apei/ghes.c| 11 ++-
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index ec1e6d6fa14c..c3974517c2cb 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -55,7 +55,9 @@ enum fixed_addresses {
 #ifdef CONFIG_ACPI_APEI_GHES
/* Used for GHES mapping from assorted contexts */
FIX_APEI_GHES_IRQ,
-   FIX_APEI_GHES_NMI,
+#ifdef CONFIG_ACPI_APEI_SEA
+   FIX_APEI_GHES_SEA,
+#endif
 #endif /* CONFIG_ACPI_APEI_GHES */
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 70ccb309a9d8..280c0a58c700 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -119,7 +119,6 @@ static DEFINE_MUTEX(ghes_list_mutex);
  * handler, but general ioremap can not be used in atomic context, so
  * the fixmap is used instead.
  */
-static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
 static DEFINE_SPINLOCK(ghes_fixmap_lock_irq);
 
 static struct gen_pool *ghes_estatus_pool;
@@ -954,6 +953,7 @@ static struct notifier_block ghes_notifier_hed = {
 
 #ifdef CONFIG_ACPI_APEI_SEA
 static LIST_HEAD(ghes_sea);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_sea);
 
 /*
  * Return 0 only if one of the SEA error sources successfully reported an error
@@ -966,8 +966,8 @@ int ghes_notify_sea(void)
 
 static void ghes_sea_add(struct ghes *ghes)
 {
-   ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
-   ghes->fixmap_idx = FIX_APEI_GHES_NMI;
+   ghes->nmi_fixmap_lock = _fixmap_lock_sea;
+   ghes->fixmap_idx = FIX_APEI_GHES_SEA;
ghes_estatus_queue_grow_pool(ghes);
 
mutex_lock(&ghes_list_mutex);
@@ -991,12 +991,13 @@ static inline void ghes_sea_remove(struct ghes *ghes) { }
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
- * NMI may be triggered on any CPU, so ghes_in_nmi is used for
- * having only one concurrent reader.
+ * NOTIFY_NMI may be triggered on any CPU, so ghes_in_nmi is
+ * used for having only one concurrent reader.
  */
 static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
 
 static LIST_HEAD(ghes_nmi);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
 
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
-- 
2.15.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 06/11] ACPI / APEI: Make the fixmap_idx per-ghes to allow multiple in_nmi() users

2018-02-15 Thread James Morse
Arm64 has multiple NMI-like notifications, but GHES only has one
in_nmi() path. The interactions between these multiple NMI-like
notifications are unclear.

Split this single path up by moving the fixmap idx and lock into
the struct ghes. Each notification's init function can consider
which other notifications it masks and can share a fixmap_idx with.
This lets us merge the two ghes_ioremap_pfn_* flavours.

Two lock pointers are provided, but only one will be used by
ghes_copy_tofrom_phys(), depending on in_nmi(). This means any
notification that might arrive as an NMI must always be wrapped in
nmi_enter()/nmi_exit().

The double-underscore version of fix_to_virt() is used because
the index to be mapped can't be tested against the end of the
enum at compile time.

Signed-off-by: James Morse 
---
 drivers/acpi/apei/ghes.c | 79 ++--
 include/acpi/ghes.h  |  5 +++
 2 files changed, 35 insertions(+), 49 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 7b2504aa23b1..70ccb309a9d8 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -118,12 +118,9 @@ static DEFINE_MUTEX(ghes_list_mutex);
  * from BIOS to Linux can be determined only in NMI, IRQ or timer
  * handler, but general ioremap can not be used in atomic context, so
  * the fixmap is used instead.
- *
- * These 2 spinlocks are used to prevent the fixmap entries from being used
- * simultaneously.
  */
-static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
-static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
+static DEFINE_SPINLOCK(ghes_fixmap_lock_irq);
 
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;
@@ -133,38 +130,16 @@ static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
 
-static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
+static void __iomem *ghes_fixmap_pfn(int fixmap_idx, u64 pfn)
 {
phys_addr_t paddr;
pgprot_t prot;
 
paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
-   __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot);
-
-   return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI);
-}
-
-static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
-{
-   phys_addr_t paddr;
-   pgprot_t prot;
-
-   paddr = pfn << PAGE_SHIFT;
-   prot = arch_apei_get_mem_attribute(paddr);
-   __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot);
-
-   return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ);
-}
-
-static void ghes_iounmap_nmi(void)
-{
-   clear_fixmap(FIX_APEI_GHES_NMI);
-}
+   __set_fixmap(fixmap_idx, paddr, prot);
 
-static void ghes_iounmap_irq(void)
-{
-   clear_fixmap(FIX_APEI_GHES_IRQ);
+   return (void __iomem *) __fix_to_virt(fixmap_idx);
 }
 
 static int ghes_estatus_pool_init(void)
@@ -292,8 +267,8 @@ static inline int ghes_severity(int severity)
}
 }
 
-static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
- int from_phys)
+static void ghes_copy_tofrom_phys(struct ghes *ghes, void *buffer, u64 paddr,
+ u32 len, int from_phys)
 {
void __iomem *vaddr;
unsigned long flags = 0;
@@ -303,13 +278,11 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 
paddr, u32 len,
 
while (len > 0) {
offset = paddr - (paddr & PAGE_MASK);
-   if (in_nmi) {
-   raw_spin_lock(&ghes_ioremap_lock_nmi);
-   vaddr = ghes_ioremap_pfn_nmi(paddr >> PAGE_SHIFT);
-   } else {
-   spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
-   vaddr = ghes_ioremap_pfn_irq(paddr >> PAGE_SHIFT);
-   }
+   if (in_nmi)
+   raw_spin_lock(ghes->nmi_fixmap_lock);
+   else
+   spin_lock_irqsave(ghes->fixmap_lock, flags);
+   vaddr = ghes_fixmap_pfn(ghes->fixmap_idx, paddr >> PAGE_SHIFT);
trunk = PAGE_SIZE - offset;
trunk = min(trunk, len);
if (from_phys)
@@ -319,13 +292,11 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 
paddr, u32 len,
len -= trunk;
paddr += trunk;
buffer += trunk;
-   if (in_nmi) {
-   ghes_iounmap_nmi();
-   raw_spin_unlock(&ghes_ioremap_lock_nmi);
-   } else {
-   ghes_iounmap_irq();
-   spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
-   }
+   clear_fixmap(ghes->fixmap_idx);
+   if (in_nmi)
+   raw_spin_unlock(ghes->nmi_fixmap_lock);
+   else
+   spin_unlock_irqrestore(ghes->fixmap_lock, flags);
}
 }
 
@@ -347,7 +318,7 @@ static int 

[PATCH 04/11] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing

2018-02-15 Thread James Morse
To split up APEI's in_nmi() path, we need any nmi-like callers to always
be in_nmi(). KVM shouldn't have to know about this, pull the RAS plumbing
out into a header file.

Currently guest synchronous external aborts are claimed as RAS
notifications by handle_guest_sea(), which is hidden in the arch code's
mm/fault.c. 32bit gets a dummy declaration in system_misc.h.

There is going to be more of this in the future if/when we support
the SError-based firmware-first notification mechanism and/or
kernel-first notifications for both synchronous external abort and
SError. Each of these will come with some Kconfig symbols and a
handful of header files.

Create a header file for all this.

This patch gives handle_guest_sea() a 'kvm_' prefix, and moves the
declarations to kvm_ras.h as preparation for a future patch that moves
the ACPI-specific RAS code out of mm/fault.c.

Signed-off-by: James Morse 
CC: Tyler Baicar 
---
 arch/arm/include/asm/kvm_ras.h   | 14 ++
 arch/arm/include/asm/system_misc.h   |  5 -
 arch/arm64/include/asm/kvm_ras.h | 11 +++
 arch/arm64/include/asm/system_misc.h |  2 --
 arch/arm64/mm/fault.c|  2 +-
 virt/kvm/arm/mmu.c   |  4 ++--
 6 files changed, 28 insertions(+), 10 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_ras.h
 create mode 100644 arch/arm64/include/asm/kvm_ras.h

diff --git a/arch/arm/include/asm/kvm_ras.h b/arch/arm/include/asm/kvm_ras.h
new file mode 100644
index ..aaff56bf338f
--- /dev/null
+++ b/arch/arm/include/asm/kvm_ras.h
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 - Arm Ltd
+
+#ifndef __ARM_KVM_RAS_H__
+#define __ARM_KVM_RAS_H__
+
+#include 
+
+static inline int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+   return -1;
+}
+
+#endif /* __ARM_KVM_RAS_H__ */
diff --git a/arch/arm/include/asm/system_misc.h 
b/arch/arm/include/asm/system_misc.h
index 78f6db114faf..51e5ab50b35f 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -23,11 +23,6 @@ extern void (*arm_pm_idle)(void);
 
 extern unsigned int user_debug;
 
-static inline int handle_guest_sea(phys_addr_t addr, unsigned int esr)
-{
-   return -1;
-}
-
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_ARM_SYSTEM_MISC_H */
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
new file mode 100644
index ..5f72b07b7912
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 - Arm Ltd
+
+#ifndef __ARM64_KVM_RAS_H__
+#define __ARM64_KVM_RAS_H__
+
+#include 
+
+int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr);
+
+#endif /* __ARM64_KVM_RAS_H__ */
diff --git a/arch/arm64/include/asm/system_misc.h 
b/arch/arm64/include/asm/system_misc.h
index 07aa8e3c5630..d0beefeb6d25 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -56,8 +56,6 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, 
const char *cmd);
__show_ratelimited; \
 })
 
-int handle_guest_sea(phys_addr_t addr, unsigned int esr);
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index f76bb2c3c943..adac28ce9be3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -673,7 +673,7 @@ static const struct fault_info fault_info[] = {
{ do_bad,   SIGBUS,  BUS_FIXME, "unknown 63"
},
 };
 
-int handle_guest_sea(phys_addr_t addr, unsigned int esr)
+int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
 {
int ret = -ENOENT;
 
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index ec62d1cccab7..8ae691194170 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -27,10 +27,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-#include 
 
 #include "trace.h"
 
@@ -1535,7 +1535,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 * For RAS the host kernel may handle this abort.
 * There is no need to pass the error into the guest.
 */
-   if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
+   if (!kvm_handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
return 1;
 
if (unlikely(!is_iabt)) {
-- 
2.15.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 05/11] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface

2018-02-15 Thread James Morse
To split up APEI's in_nmi() path, we need the nmi-like callers to always
be in_nmi(). Add a helper to do the work and claim the notification.

When KVM or the arch code takes an exception that might be a RAS
notification, it asks the APEI firmware-first code whether it wants
to claim the exception. We can then go on to see if (a future)
kernel-first mechanism wants to claim the notification, before
falling through to the existing default behaviour.

The NOTIFY_SEA code was merged before we had multiple, possibly
interacting, NMI-like notifications and the need to consider kernel
first in the future. Make the 'claiming' behaviour explicit.

As we're restructuring the APEI code to allow multiple NMI-like
notifications, any notification that might interrupt interrupts-masked
code must always be wrapped in nmi_enter()/nmi_exit(). This allows APEI
to use in_nmi() to choose between the raw/regular spinlock routines.

We mask SError over this window to prevent an asynchronous RAS error
arriving and tripping 'nmi_enter()'s BUG_ON(in_nmi()).

Signed-off-by: James Morse 
CC: Tyler Baicar 
---
Why does apei_claim_sea() take a pt_regs? This gets used later to take
APEI by the hand through NMI->IRQ context, depending on what we
interrupted.
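
As an illustration of the claim order described in the commit message (a
sketch only, not code from this series: apei_claim_sea() is the helper added
by this patch, while the kernel-first hook is hypothetical and named purely
for the example):

#include <linux/errno.h>
#include <linux/ptrace.h>
#include <asm/acpi.h>

static int example_kernel_first_claim(unsigned long addr, unsigned int esr)
{
        return -ENOENT;         /* hypothetical: nothing kernel-first exists yet */
}

static int example_claim_sea(unsigned long addr, unsigned int esr,
                             struct pt_regs *regs)
{
        if (!apei_claim_sea(regs))
                return 0;       /* firmware-first claimed the notification */

        if (!example_kernel_first_claim(addr, esr))
                return 0;       /* a future kernel-first mechanism could claim it */

        return -ENOENT;         /* caller falls back to the default behaviour */
}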

 arch/arm64/include/asm/acpi.h  |  3 +++
 arch/arm64/include/asm/daifflags.h |  1 +
 arch/arm64/include/asm/kvm_ras.h   | 20 +++-
 arch/arm64/kernel/acpi.c   | 30 ++
 arch/arm64/mm/fault.c  | 31 +++
 5 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 32f465a80e4e..256811cd4b8b 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -16,6 +16,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 
@@ -94,6 +95,8 @@ void __init acpi_init_cpus(void);
 static inline void acpi_init_cpus(void) { }
 #endif /* CONFIG_ACPI */
 
+int apei_claim_sea(struct pt_regs *regs);
+
 #ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
 bool acpi_parking_protocol_valid(int cpu);
 void __init
diff --git a/arch/arm64/include/asm/daifflags.h 
b/arch/arm64/include/asm/daifflags.h
index 22e4c83de5a5..cbd753855bf3 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -20,6 +20,7 @@
 
 #define DAIF_PROCCTX   0
 #define DAIF_PROCCTX_NOIRQ PSR_I_BIT
+#define DAIF_ERRCTX(PSR_I_BIT | PSR_A_BIT)
 
 /* mask/save/unmask/restore all exceptions, including interrupts. */
 static inline void local_daif_mask(void)
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
index 5f72b07b7912..9d52bc333110 100644
--- a/arch/arm64/include/asm/kvm_ras.h
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -4,8 +4,26 @@
 #ifndef __ARM64_KVM_RAS_H__
 #define __ARM64_KVM_RAS_H__
 
+#include 
+#include 
 #include 
 
-int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr);
+#include 
+
+/*
+ * Was this synchronous external abort a RAS notification?
+ * Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
+ *
+ * Call with irqs unmasked.
+ */
+static inline int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+   int ret = -ENOENT;
+
+   if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
+   ret = apei_claim_sea(NULL);
+
+   return ret;
+}
 
 #endif /* __ARM64_KVM_RAS_H__ */
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 7b09487ff8fb..6a4823a3eb5e 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -33,6 +33,8 @@
 
 #ifdef CONFIG_ACPI_APEI
 # include 
+# include 
+# include 
 # include 
 #endif
 
@@ -261,4 +263,32 @@ pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
return __pgprot(PROT_NORMAL_NC);
return __pgprot(PROT_DEVICE_nGnRnE);
 }
+
+
+/*
+ * Claim Synchronous External Aborts as a firmware first notification.
+ *
+ * Used by KVM and the arch do_sea handler.
+ * @regs may be NULL when called from process context.
+ */
+int apei_claim_sea(struct pt_regs *regs)
+{
+   int err = -ENOENT;
+   unsigned long current_flags = arch_local_save_flags();
+
+   if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
+   return err;
+
+   /*
+* APEI expects an NMI-like notification to always be called
+* in NMI context.
+*/
+   local_daif_restore(DAIF_ERRCTX);
+   nmi_enter();
+   err = ghes_notify_sea();
+   nmi_exit();
+   local_daif_restore(current_flags);
+
+   return err;
+}
 #endif
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index adac28ce9be3..8cbbd9a5ec7d 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -18,6 +18,7 @@
  * along with this program.  If not, see .
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -33,6 +34,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 

[PATCH 03/11] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue

2018-02-15 Thread James Morse
Now that the estatus queue can be used by more than one notification
method, we can move notifications that have NMI-like behaviour over to
it, and start abstracting GHES's single in_nmi() path.

Switch NOTIFY_SEA over to use the estatus queue. This makes it behave
in the same way as x86's NOTIFY_NMI.

Signed-off-by: James Morse 
CC: Tyler Baicar 
---
 drivers/acpi/apei/ghes.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d3cc5bd5b496..7b2504aa23b1 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -59,6 +59,10 @@
 
 #define GHES_PFX   "GHES: "
 
+#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA)
+#define WANT_NMI_ESTATUS_QUEUE 1
+#endif
+
 #define GHES_ESTATUS_MAX_SIZE  65536
 #define GHES_ESOURCE_PREALLOC_MAX_SIZE 65536
 
@@ -682,7 +686,7 @@ static void ghes_estatus_cache_add(
rcu_read_unlock();
 }
 
-#ifdef CONFIG_HAVE_ACPI_APEI_NMI
+#ifdef WANT_NMI_ESTATUS_QUEUE
 /*
  * While printk() now has an in_nmi() path, the handling for CPER records
  * does not. For example, memory_failure_queue() takes spinlocks and calls
@@ -870,7 +874,7 @@ static void ghes_nmi_init_cxt(void)
 
 #else
 static inline void ghes_nmi_init_cxt(void) { }
-#endif /* CONFIG_HAVE_ACPI_APEI_NMI */
+#endif /* WANT_NMI_ESTATUS_QUEUE */
 
 static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
 {
@@ -986,20 +990,13 @@ static LIST_HEAD(ghes_sea);
  */
 int ghes_notify_sea(void)
 {
-   struct ghes *ghes;
-   int ret = -ENOENT;
-
-   rcu_read_lock();
-   list_for_each_entry_rcu(ghes, &ghes_sea, list) {
-   if (!ghes_proc(ghes))
-   ret = 0;
-   }
-   rcu_read_unlock();
-   return ret;
+   return ghes_estatus_queue_notified(&ghes_sea);
 }
 
 static void ghes_sea_add(struct ghes *ghes)
 {
+   ghes_estatus_queue_grow_pool(ghes);
+
mutex_lock(&ghes_list_mutex);
list_add_rcu(&ghes->list, &ghes_sea);
mutex_unlock(&ghes_list_mutex);
@@ -1011,6 +1008,8 @@ static void ghes_sea_remove(struct ghes *ghes)
list_del_rcu(&ghes->list);
mutex_unlock(&ghes_list_mutex);
synchronize_rcu();
+
+   ghes_estatus_queue_shrink_pool(ghes);
 }
 #else /* CONFIG_ACPI_APEI_SEA */
 static inline void ghes_sea_add(struct ghes *ghes) { }
-- 
2.15.1



[PATCH 02/11] ACPI / APEI: Generalise the estatus queue's add/remove and notify code

2018-02-15 Thread James Morse
To support asynchronous NMI-like notifications on arm64 we need to use
the estatus-queue. These patches refactor it to allow multiple APEI
notification types to use it.

Refactor the estatus queue's pool grow/shrink code and notification
routine from NOTIFY_NMI's handlers. This will allow another notification
method to use the estatus queue without duplicating this code.

This patch adds rcu_read_lock()/rcu_read_unlock() around the list
list_for_each_entry_rcu() walker. These aren't strictly necessary as
the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
critical section.

Keep the oops_begin() call for x86; arm64 doesn't have one of these,
and APEI is the only thing outside arch code calling this.

The existing ghes_estatus_pool_shrink() is folded into the new
ghes_estatus_queue_shrink_pool() as only the queue uses it.

_in_nmi_notify_one() is separate from the rcu-list walker for a later
caller that doesn't need to walk a list.

Signed-off-by: James Morse 
---
 drivers/acpi/apei/ghes.c | 103 +++
 1 file changed, 68 insertions(+), 35 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e42b587c509b..d3cc5bd5b496 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -749,6 +749,54 @@ static void __process_error(struct ghes *ghes)
 #endif
 }
 
+static int _in_nmi_notify_one(struct ghes *ghes)
+{
+   int sev;
+   int ret = -ENOENT;
+
+   if (ghes_read_estatus(ghes, 1)) {
+   ghes_clear_estatus(ghes);
+   return ret;
+   } else {
+   ret = 0;
+   }
+
+   sev = ghes_severity(ghes->estatus->error_severity);
+   if (sev >= GHES_SEV_PANIC) {
+#ifdef CONFIG_X86
+   oops_begin();
+#endif
+   ghes_print_queued_estatus();
+   __ghes_panic(ghes);
+   }
+
+   if (!(ghes->flags & GHES_TO_CLEAR))
+   return ret;
+
+   __process_error(ghes);
+   ghes_clear_estatus(ghes);
+
+   return ret;
+}
+
+static int ghes_estatus_queue_notified(struct list_head *rcu_list)
+{
+   int ret = -ENOENT;
+   struct ghes *ghes;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(ghes, rcu_list, list) {
+   if (!_in_nmi_notify_one(ghes))
+   ret = 0;
+   }
+   rcu_read_unlock();
+
+   if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) && ret == 0)
+   irq_work_queue(&ghes_proc_irq_work);
+
+   return ret;
+}
+
 static unsigned long ghes_esource_prealloc_size(
const struct acpi_hest_generic *generic)
 {
@@ -764,11 +812,24 @@ static unsigned long ghes_esource_prealloc_size(
return prealloc_size;
 }
 
-static void ghes_estatus_pool_shrink(unsigned long len)
+/* After removing a queue user, we can shrink the pool */
+static void ghes_estatus_queue_shrink_pool(struct ghes *ghes)
 {
+   unsigned long len;
+
+   len = ghes_esource_prealloc_size(ghes->generic);
ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
 }
 
+/* Before adding a queue user, grow the pool */
+static void ghes_estatus_queue_grow_pool(struct ghes *ghes)
+{
+   unsigned long len;
+
+   len = ghes_esource_prealloc_size(ghes->generic);
+   ghes_estatus_pool_expand(len);
+}
+
 static void ghes_proc_in_irq(struct irq_work *irq_work)
 {
struct llist_node *llnode, *next;
@@ -967,48 +1028,22 @@ static LIST_HEAD(ghes_nmi);
 
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
-   struct ghes *ghes;
-   int sev, ret = NMI_DONE;
+   int ret = NMI_DONE;
 
if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
return ret;
 
-   list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
-   if (ghes_read_estatus(ghes, 1)) {
-   ghes_clear_estatus(ghes);
-   continue;
-   } else {
-   ret = NMI_HANDLED;
-   }
-
-   sev = ghes_severity(ghes->estatus->error_severity);
-   if (sev >= GHES_SEV_PANIC) {
-   oops_begin();
-   ghes_print_queued_estatus();
-   __ghes_panic(ghes);
-   }
-
-   if (!(ghes->flags & GHES_TO_CLEAR))
-   continue;
-
-   __process_error(ghes);
-   ghes_clear_estatus(ghes);
-   }
+   if (!ghes_estatus_queue_notified(&ghes_nmi))
+   ret = NMI_HANDLED;
 
-#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-   if (ret == NMI_HANDLED)
-   irq_work_queue(&ghes_proc_irq_work);
-#endif
atomic_dec(&ghes_in_nmi);
return ret;
 }
 
 static void ghes_nmi_add(struct ghes *ghes)
 {
-   unsigned long len;
+   ghes_estatus_queue_grow_pool(ghes);
 
-   len = ghes_esource_prealloc_size(ghes->generic);
-   ghes_estatus_pool_expand(len);
mutex_lock(&ghes_list_mutex);
if (list_empty(&ghes_nmi))
   

[PATCH 01/11] ACPI / APEI: Move the estatus queue code up, and under its own ifdef

2018-02-15 Thread James Morse
To support asynchronous NMI-like notifications on arm64 we need to use
the estatus-queue. These patches refactor it to allow multiple APEI
notification types to use it.

First we move the estatus-queue code higher in the file so that any
notify_foo() handler can make use of it.

This patch moves code around ... and makes the following trivial change:
Freshen the dated comment above ghes_estatus_llist. printk() is no
longer the issue, it's the helpers like memory_failure_queue() that
still aren't NMI safe.

Signed-off-by: James Morse 
---
 drivers/acpi/apei/ghes.c | 267 ---
 1 file changed, 139 insertions(+), 128 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1efefe919555..e42b587c509b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -545,6 +545,16 @@ static int ghes_print_estatus(const char *pfx,
return 0;
 }
 
+static void __ghes_panic(struct ghes *ghes)
+{
+   __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
+
+   /* reboot to log the error! */
+   if (!panic_timeout)
+   panic_timeout = ghes_panic_timeout;
+   panic("Fatal hardware error!");
+}
+
 /*
  * GHES error status reporting throttle, to report more kinds of
  * errors, instead of just most frequently occurred errors.
@@ -672,6 +682,135 @@ static void ghes_estatus_cache_add(
rcu_read_unlock();
 }
 
+#ifdef CONFIG_HAVE_ACPI_APEI_NMI
+/*
+ * While printk() now has an in_nmi() path, the handling for CPER records
+ * does not. For example, memory_failure_queue() takes spinlocks and calls
+ * schedule_work_on().
+ *
+ * So in any NMI-like handler, we allocate required memory from lock-less
+ * memory allocator (ghes_estatus_pool), save estatus into it, put them into
+ * lock-less list (ghes_estatus_llist), then delay printk into IRQ context via
+ * irq_work (ghes_proc_irq_work).  ghes_estatus_size_request record
+ * required pool size by all NMI error source.
+ *
+ * Memory from the ghes_estatus_pool is also used with the ghes_estatus_cache
+ * to suppress frequent messages.
+ */
+static struct llist_head ghes_estatus_llist;
+static struct irq_work ghes_proc_irq_work;
+
+static void ghes_print_queued_estatus(void)
+{
+   struct llist_node *llnode;
+   struct ghes_estatus_node *estatus_node;
+   struct acpi_hest_generic *generic;
+   struct acpi_hest_generic_status *estatus;
+
+   llnode = llist_del_all(&ghes_estatus_llist);
+   /*
+* Because the time order of estatus in list is reversed,
+* revert it back to proper order.
+*/
+   llnode = llist_reverse_order(llnode);
+   while (llnode) {
+   estatus_node = llist_entry(llnode, struct ghes_estatus_node,
+  llnode);
+   estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+   generic = estatus_node->generic;
+   ghes_print_estatus(NULL, generic, estatus);
+   llnode = llnode->next;
+   }
+}
+
+/* Save estatus for further processing in IRQ context */
+static void __process_error(struct ghes *ghes)
+{
+#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+   u32 len, node_len;
+   struct ghes_estatus_node *estatus_node;
+   struct acpi_hest_generic_status *estatus;
+
+   if (ghes_estatus_cached(ghes->estatus))
+   return;
+
+   len = cper_estatus_len(ghes->estatus);
+   node_len = GHES_ESTATUS_NODE_LEN(len);
+
+   estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
+   if (!estatus_node)
+   return;
+
+   estatus_node->ghes = ghes;
+   estatus_node->generic = ghes->generic;
+   estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+   memcpy(estatus, ghes->estatus, len);
+   llist_add(&estatus_node->llnode, &ghes_estatus_llist);
+#endif
+}
+
+static unsigned long ghes_esource_prealloc_size(
+   const struct acpi_hest_generic *generic)
+{
+   unsigned long block_length, prealloc_records, prealloc_size;
+
+   block_length = min_t(unsigned long, generic->error_block_length,
+GHES_ESTATUS_MAX_SIZE);
+   prealloc_records = max_t(unsigned long,
+generic->records_to_preallocate, 1);
+   prealloc_size = min_t(unsigned long, block_length * prealloc_records,
+ GHES_ESOURCE_PREALLOC_MAX_SIZE);
+
+   return prealloc_size;
+}
+
+static void ghes_estatus_pool_shrink(unsigned long len)
+{
+   ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
+}
+
+static void ghes_proc_in_irq(struct irq_work *irq_work)
+{
+   struct llist_node *llnode, *next;
+   struct ghes_estatus_node *estatus_node;
+   struct acpi_hest_generic *generic;
+   struct acpi_hest_generic_status *estatus;
+   u32 len, node_len;
+
+   llnode = llist_del_all(&ghes_estatus_llist);
+   /*
+* Because the time order of estatus 

[PATCH 00/11] APEI in_nmi() rework and arm64 SDEI wire-up

2018-02-15 Thread James Morse
Hello!

The aim of this series is to wire arm64's SDEI into APEI.

What's SDEI? It's ARM's "Software Delegated Exception Interface" [0]. It's
used by firmware to tell the OS about firmware-first RAS events.

These software exceptions can interrupt anything, so I describe them as
NMI-like. They aren't the only NMI-like way to notify the OS about
firmware-first RAS events: the ACPI spec also defines 'NOTIFY_SEA' and
'NOTIFY_SEI'.

(Acronyms: SEA, Synchronous External Abort. The CPU requested some memory,
but the owner of that memory said no. These are always synchronous with the
instruction that caused them. SEI, System-Error Interrupt, commonly called
SError. This is an asynchronous external abort; the memory-owner didn't say no
at the right point. Collectively these things are called external-aborts.
How is firmware involved? It traps these and re-injects them into the kernel
once it has written the CPER records.)

APEI's GHES code only expects one source of NMI. If a platform implements
more than one of these mechanisms, APEI needs to handle the interaction.
'SEA' and 'SEI' can interact as 'SEI' is asynchronous. SDEI can interact
with itself: its exceptions can be 'normal' or 'critical', and firmware
could use both types for RAS (errors using normal, 'panic-now' using
critical).

What does this series do?
Patches 1-3 refactor APEI's 'estatus queue' so it can be used for all
NMI-like notifications. This defers the NMI work to irq_work, which will
happen when we next unmask interrupts.
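
For readers unfamiliar with the deferral pattern, here is a minimal sketch
(illustrative names only, not code from these patches) of queuing work from
an NMI-like handler so the heavy lifting runs later in IRQ context:

#include <linux/init.h>
#include <linux/irq_work.h>

static struct irq_work example_work;

/* Runs from the irq_work IPI, i.e. normal IRQ context */
static void example_process_records(struct irq_work *work)
{
        /* spinlocks, memory_failure_queue(), schedule_work_on() are fine here */
}

static void example_nmi_like_handler(void)
{
        /* NMI-like context: only lock-less operations are safe */
        irq_work_queue(&example_work);
}

static int __init example_setup(void)
{
        init_irq_work(&example_work, example_process_records);
        return 0;
}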

Patches 4&5 move the arch and KVM code around so that NMI-like notifications
are always called in_nmi().

Patch 6 splits the 'irq or nmi?' path through ghes_copy_tofrom_phys()
up to be per-ghes. This lets each ghes specify which other error-sources
it can share a fixmap-slot and lock with.

Patch 7 renames NOTIFY_SEA's use of NOTIFY_NMI's infrastructure, as we're
about to have multiple NMI-like users that can't share resources.

Patches 8&9 add the SDEI helper, and notify methods for APEI.

After this, adding the last firmware-first piece for arm64 is simple
(and safe), and all our NMI-like notifications behave the same as x86's
NOTIFY_NMI.


All of this makes the race between memory_failure_queue() and
ret_to_user worse, as there is now always irq_work involved.

Patch 10 makes the reschedule to memory_failure() run as soon as possible.
Patch 11 makes sure the arch code knows whether the irq_work has run by
the time do_sea() returns. We can skip the signalling step if it has as
APEI has done its work.


ghes.c became clearer to me when I worked out that it has three sets of
functions with 'estatus' in the name. One is a pool of memory that can be
allocated-from atomically. This is grown/shrunk when new NMI users are
allocated.
The second is the estatus-cache, which holds recent notifications so it
can suppress notifications we've already handled.
The last is the estatus-queue, which holds data from NMI-like notifications
(in pool memory) to be processed from irq_work.

Testing?
Tested with the SDEI FVP based software model and a mocked up NOTIFY_SEA using
KVM. I've only build tested this on x86.

Trees... The changes outside APEI are tiny, but there will be some changes
to how arch/arm64/mm/fault.c generates signals, affecting do_sea(), which will
cause conflicts with patch 5.


Thanks,

James

[0] 
http://infocenter.arm.com/help/topic/com.arm.doc.den0054a/ARM_DEN0054A_Software_Delegated_Exception_Interface.pdf

James Morse (11):
  ACPI / APEI: Move the estatus queue code up, and under its own ifdef
  ACPI / APEI: Generalise the estatus queue's add/remove and notify code
  ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
  KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
  arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
  ACPI / APEI: Make the fixmap_idx per-ghes to allow multiple in_nmi()
users
  ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
  firmware: arm_sdei: Add ACPI GHES registration helper
  ACPI / APEI: Add support for the SDEI GHES Notification type
  mm/memory-failure: increase queued recovery work's priority
  arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work

 arch/arm/include/asm/kvm_ras.h   |  14 +
 arch/arm/include/asm/system_misc.h   |   5 -
 arch/arm64/include/asm/acpi.h|   3 +
 arch/arm64/include/asm/daifflags.h   |   1 +
 arch/arm64/include/asm/fixmap.h  |   8 +-
 arch/arm64/include/asm/kvm_ras.h |  29 ++
 arch/arm64/include/asm/system_misc.h |   2 -
 arch/arm64/kernel/acpi.c |  49 
 arch/arm64/mm/fault.c|  30 +-
 drivers/acpi/apei/ghes.c | 533 ---
 drivers/firmware/arm_sdei.c  |  75 +
 include/acpi/ghes.h  |   5 +
 include/linux/arm_sdei.h |   8 +
 mm/memory-failure.c  |  11 +-
 virt/kvm/arm/mmu.c   |   4 +-
 15 files changed, 517 insertions(+), 260 deletions(-)

[REPOST PATCH] arm/arm64: KVM: Add PSCI version selection API

2018-02-15 Thread Marc Zyngier
Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
or 1.0 to a guest, defaulting to the latest version of the PSCI
implementation that is compatible with the requested version. This is
no different from doing a firmware upgrade on KVM.

But in order to give a chance to hypothetical badly implemented guests
that would have a fit by discovering something other than PSCI 0.2,
let's provide a new API that allows userspace to pick one particular
version of the API.

This is implemented as a new class of "firmware" registers, where
we expose the PSCI version. This allows the PSCI version to be
save/restored as part of a guest migration, and also set to
any supported version if the guest requires it.

Signed-off-by: Marc Zyngier 
---
This is a repost of this proposal for a firmware selection API, as
there was some discussion during the merging window. Now that the core
variant-2 stuff is merged, we can have a go at more
bikeshedding. Please open fire!
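
For illustration, a userspace-side sketch of driving the proposed
pseudo-register through the existing ONE_REG ioctls (not part of the patch:
error handling is omitted, and KVM_REG_ARM_PSCI_VERSION is assumed to be the
register id this series introduces):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Read the PSCI version pseudo-register from a vcpu fd */
static int get_psci_version(int vcpu_fd, uint64_t *version)
{
        struct kvm_one_reg reg = {
                .id   = KVM_REG_ARM_PSCI_VERSION,
                .addr = (uint64_t)(unsigned long)version,
        };

        return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}

/* Force a specific PSCI version, e.g. on the migration target */
static int set_psci_version(int vcpu_fd, uint64_t version)
{
        struct kvm_one_reg reg = {
                .id   = KVM_REG_ARM_PSCI_VERSION,
                .addr = (uint64_t)(unsigned long)&version,
        };

        return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}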

 Documentation/virtual/kvm/api.txt  |  9 -
 Documentation/virtual/kvm/arm/psci.txt | 30 +
 arch/arm/include/asm/kvm_host.h|  3 ++
 arch/arm/include/uapi/asm/kvm.h|  6 
 arch/arm/kvm/guest.c   | 13 
 arch/arm64/include/asm/kvm_host.h  |  3 ++
 arch/arm64/include/uapi/asm/kvm.h  |  6 
 arch/arm64/kvm/guest.c | 14 +++-
 include/kvm/arm_psci.h | 16 +++--
 virt/kvm/arm/psci.c| 60 ++
 10 files changed, 156 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/virtual/kvm/arm/psci.txt

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 57d3ee9e4bde..d2b41e1608b4 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1943,6 +1943,9 @@ ARM 32-bit VFP control registers have the following id 
bit patterns:
 ARM 64-bit FP registers have the following id bit patterns:
   0x4030  0012 0 
 
+ARM firmware pseudo-registers have the following bit pattern:
+  0x4030  0014 
+
 
 arm64 registers are mapped using the lower 32 bits. The upper 16 of
 that is the register group type, or coprocessor number:
@@ -1959,6 +1962,9 @@ arm64 CCSIDR registers are demultiplexed by CSSELR value:
 arm64 system registers have the following id bit patterns:
   0x6030  0013 
 
+arm64 firmware pseudo-registers have the following bit pattern:
+  0x6030  0014 
+
 
 MIPS registers are mapped using the lower 32 bits.  The upper 16 of that is
 the register group type:
@@ -2493,7 +2499,8 @@ Possible features:
  and execute guest code when KVM_RUN is called.
- KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode.
  Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
-   - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU.
+   - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 (or a future revision
+  backward compatible with v0.2) for the CPU.
  Depends on KVM_CAP_ARM_PSCI_0_2.
- KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
  Depends on KVM_CAP_ARM_PMU_V3.
diff --git a/Documentation/virtual/kvm/arm/psci.txt 
b/Documentation/virtual/kvm/arm/psci.txt
new file mode 100644
index ..aafdab887b04
--- /dev/null
+++ b/Documentation/virtual/kvm/arm/psci.txt
@@ -0,0 +1,30 @@
+KVM implements the PSCI (Power State Coordination Interface)
+specification in order to provide services such as CPU on/off, reset
+and power-off to the guest.
+
+The PSCI specification is regularly updated to provide new features,
+and KVM implements these updates if they make sense from a virtualization
+point of view.
+
+This means that a guest booted on two different versions of KVM can
+observe two different "firmware" revisions. This could cause issues if
+a given guest is tied to a particular PSCI revision (unlikely), or if
+a migration causes a different PSCI version to be exposed out of the
+blue to an unsuspecting guest.
+
+In order to remedy this situation, KVM exposes a set of "firmware
+pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
+interface. These registers can be saved/restored by userspace, and set
+to a convenient value if required.
+
+The following register is defined:
+
+* KVM_REG_ARM_PSCI_VERSION:
+
+  - Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
+(and thus has already been initialized)
+  - Returns the current PSCI version on GET_ONE_REG (defaulting to the
+highest PSCI version implemented by KVM and compatible with v0.2)
+  - Allows any PSCI version implemented by KVM and compatible with
+v0.2 to be set with SET_ONE_REG
+  - Affects the whole VM (even if the register view is per-vcpu)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ef54013b5b9f..6c05e3b13081 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -75,6 +75,9 

Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-02-15 Thread James Morse
Hi gengdongjiu, liu jun

On 05/02/18 11:24, gengdongjiu wrote:
> James Morse wrote:
>> I'd like to pick these patches onto the end of that series, but first I want 
>> to
>> know what NOTIFY_SEI means for any OS. The ACPI spec doesn't say, and
>> because its asynchronous, route-able and mask-able, there are many more
>> corners than NOTFIY_SEA.
>>
>> This thing is a notification using an emulated SError exception. (emulated
>> because physical-SError must be routed to EL3 for firmware-first, and
>> virtual-SError belongs to EL2).
>>
>> Does your firmware emulate SError exactly as the TakeException() pseudo code
>> in the Arm-Arm?
> 
> Yes, it is.
> 
>> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
>> TGE}?
> 
> Yes, it is.

... and yet ...


>> What does your firmware do when it wants to emulate SError but its masked?
>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>> PSTATE.A  set.
>>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>> emulated  SError should go to EL1. This effectively masks SError.)
> 
> Currently we does not consider much about the mask status(SPSR).

.. this is a problem.

If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
interrupted EL2. Even if you set up the EL1 registers correctly, EL1 can't eret to
EL2. This should never happen: SError is effectively masked if you are running
at an EL higher than the one it's routed to.

More obviously: if the exception came from the EL that SError should be routed
to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only
way the OS has to indicate it can't take an exception right now. VBAR_EL1 may be
'wrong' if we're doing some power-management; the FAR/ELR/ESR/SPSR registers may
contain live values that the OS would lose if you deliver another exception over
the top.

If you deliver an emulated-SError as the OS eret's, your new ELR will point at
the eret instruction and the CPU will spin on this instruction forever.

You have to honour the masking and routing rules for SError, otherwise no OS can
run safely with this firmware.
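
As a rough sketch of those rules (illustrative pseudo-code only, not the Arm
ARM pseudocode), the firmware-side check amounts to something like:

#include <linux/types.h>

/*
 * Can an (emulated) SError be delivered right now? target_el is where the
 * routing rules (e.g. HCR_EL2.{AMO,TGE}) send it; current_el/pstate_a
 * describe the interrupted context.
 */
static bool example_can_deliver_serror(unsigned int current_el,
                                       unsigned int target_el,
                                       bool pstate_a)
{
        /* Routed to a lower EL than the one we interrupted: masked */
        if (target_el < current_el)
                return false;

        /* Same EL: PSTATE.A masks it */
        if (target_el == current_el && pstate_a)
                return false;

        return true;
}

/* If this returns false: record the error (e.g. in the BERT) and reset, as above */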


> I remember that you ever suggested firmware should reboot if the mask status
> is set(SPSR), right?

Yes, this is my suggestion of what to do if you can't deliver an SError: store
the RAS error in the BERT and 'reboot'.


> I CC "liu jun"  who is our UEFI firmware Architect,
> if you have firmware requirements, you can raise again.

(UEFI? I didn't think there was any of that at EL3, but I'm not familiar with
all the 'PI' bits).

The requirement is your emulated-SError from EL3 looks exactly like a
physical-SError as if EL3 wasn't implemented.
Your CPU has to handle cases where it can't deliver an SError; your emulation
has to do the same.

This is not something any OS can work around.


>> Answers to these let us determine whether a bug is in the firmware or the
>> kernel. If firmware is expecting the OS to do something special, I'd like to 
>> know
>> about it from the beginning!
> 
> I know your meaning, thanks for raising it again.


Happy new year,

James


Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2018-02-15 Thread James Morse
Hi gengdongjiu,

On 12/02/18 10:19, gengdongjiu wrote:
> On 2018/2/10 1:44, James Morse wrote:
>> The point? We can't know what a CPU without the RAS extensions puts in there.
>>
>> Why Does this matter? When migrating a pending SError we have to know the
>> difference between 'use this 64bit value', and 'the CPU will generate it'.
>> If I make an SError pending with ESR=0 on a CPU with VSESR, I can't migrated 
>> to
>> a system that generates an impdef SError-ESR, because I can't know it will 
>> be 0.

> For the target system, before taking the SError, no one can know whether its 
> syndrome value
> is IMPLEMENTATION DEFINED or architecturally defined.

For a virtual-SError, the hypervisor knows what it generated. (do I have
VSESR_EL2? What did I put in there?).


> when the virtual SError is taken, the ESR_ELx.IDS will be updated, then we 
> can know
> whether the ESR value is impdef or architecturally defined.

True, the guest can't know anything about a pending virtual SError until it
takes it. Why is this a problem?


> It seems migration is only allowed only when target system and source system 
> all support
> RAS extension, because we do not know whether its syndrome is IMPLEMENTATION 
> DEFINED or
> architecturally defined.

I don't think Qemu allows migration between hosts with differing guest-ID
registers. But we shouldn't depend on this, and we may want to hide the v8.2 RAS
features from the guest's ID register, but still use them from the host.

The way I imagined it working was we would pack the following information into
that events struct:
{
bool serror_pending;
bool serror_has_esr;
u64  serror_esr;
}

The problem I was trying to describe is that there is no value of serror_esr
we can use to mean 'Ignore this, I'm a v8.0 CPU'. VSESR_EL2 is a 64bit register,
any bits we abuse may get a meaning we want to use in the future.

When it comes to migration, v8.{0,1} systems can only GET/SET events where
serror_has_esr == false; they can't use the serror_esr. On v8.2 systems we
should require serror_has_esr to be true.

If we need to support migration from v8.{0,1} to v8.2, we can make up an impdef
serror_esr.

We will need to decide what KVM does when SET is called but an SError was
already pending. 2.5.3 "Multiple SError interrupts" of [0] has something to say.


Happy new year,

James


[0]
https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf


Re: [PATCH v4 17/19] arm64: KVM: Dynamically compute the HYP VA mask

2018-02-15 Thread Marc Zyngier
On 18/01/18 20:28, Christoffer Dall wrote:
> On Thu, Jan 04, 2018 at 06:43:32PM +, Marc Zyngier wrote:
>> As we're moving towards a much more dynamic way to compute our
>> HYP VA, let's express the mask in a slightly different way.
>>
>> Instead of comparing the idmap position to the "low" VA mask,
>> we directly compute the mask by taking into account the idmap's
>> (VA_BIT-1) bit.
>>
>> No functionnal change.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/va_layout.c | 17 ++---
>>  1 file changed, 6 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
>> index aee758574e61..75bb1c6772b0 100644
>> --- a/arch/arm64/kvm/va_layout.c
>> +++ b/arch/arm64/kvm/va_layout.c
>> @@ -21,24 +21,19 @@
>>  #include 
>>  #include 
>>  
>> -#define HYP_PAGE_OFFSET_HIGH_MASK   ((UL(1) << VA_BITS) - 1)
>> -#define HYP_PAGE_OFFSET_LOW_MASK((UL(1) << (VA_BITS - 1)) - 1)
>> -
>>  static u64 va_mask;
>>  
>>  static void compute_layout(void)
>>  {
>>  phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
>> -unsigned long mask = HYP_PAGE_OFFSET_HIGH_MASK;
>> +u64 region;
> 
> the naming here really confused me.  Would it make sense to call this
> 'hyp_va_msb' or something like that instead?
> 
>>  
>> -/*
>> - * Activate the lower HYP offset only if the idmap doesn't
>> - * clash with it,
>> - */
>> -if (idmap_addr > HYP_PAGE_OFFSET_LOW_MASK)
>> -mask = HYP_PAGE_OFFSET_HIGH_MASK;
> 
> Ah, the series was tested, it was just that this code only existed for a
> short while.  Amusingly, I think this ephemeral bug goes against the "No
> function change" statement in the commit message.
> 
>> +/* Where is my RAM region? */
>> +region  = idmap_addr & BIT(VA_BITS - 1);
>> +region ^= BIT(VA_BITS - 1);
>>  
>> -va_mask = mask;
>> +va_mask  = BIT(VA_BITS - 1) - 1;
> 
> nit: This could also be written as GENMASK_ULL(VA_BITS - 2, 0) --- and
> now I'm not sure which one I prefer.

Good point. I think GENMASK makes it clearer what the intent is, and
assigning a mask to a mask has a certain degree of consistency (/me fondly
remembers dimensional analysis...).
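
For a concrete value (VA_BITS = 48 is an assumption for the example, not
something fixed by the patch), the two spellings produce the same mask:

#include <linux/bitops.h>       /* BIT(), GENMASK_ULL() */

#define EXAMPLE_VA_BITS 48

/* Both evaluate to 0x00007fffffffffff: bits [VA_BITS-2:0] set */
#define EXAMPLE_MASK_A  (BIT(EXAMPLE_VA_BITS - 1) - 1)
#define EXAMPLE_MASK_B  GENMASK_ULL(EXAMPLE_VA_BITS - 2, 0)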

> 
>> +va_mask |= region;
>>  }
>>  
>>  static u32 compute_instruction(int n, u32 rd, u32 rn)
>> -- 
>> 2.14.2
>>
> Otherwise looks good:
> 
> Reviewed-by: Christoffer Dall 
> 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v4 14/19] KVM: arm/arm64: Move HYP IO VAs to the "idmap" range

2018-02-15 Thread Marc Zyngier
On 18/01/18 14:39, Christoffer Dall wrote:
> On Thu, Jan 04, 2018 at 06:43:29PM +, Marc Zyngier wrote:
>> We so far mapped our HYP IO (which is essencially the GICv2 control
>> registers) using the same method as for memory. It recently appeared
>> that is a bit unsafe:
>>
>> We compute the HYP VA using the kern_hyp_va helper, but that helper
>> is only designed to deal with kernel VAs coming from the linear map,
>> and not from the vmalloc region... This could in turn cause some bad
>> aliasing between the two, amplified by the upcoming VA randomisation.
>>
>> A solution is to come up with our very own basic VA allocator for
>> MMIO. Since half of the HYP address space only contains a single
>> page (the idmap), we have plenty to borrow from. Let's use the idmap
>> as a base, and allocate downwards from it. GICv2 now lives on the
>> other side of the great VA barrier.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  virt/kvm/arm/mmu.c | 56 
>> +-
>>  1 file changed, 43 insertions(+), 13 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 6192d45d1e1a..14c5e5534f2f 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -43,6 +43,9 @@ static unsigned long hyp_idmap_start;
>>  static unsigned long hyp_idmap_end;
>>  static phys_addr_t hyp_idmap_vector;
>>  
>> +static DEFINE_MUTEX(io_map_lock);
>> +static unsigned long io_map_base;
>> +
>>  #define S2_PGD_SIZE (PTRS_PER_S2_PGD * sizeof(pgd_t))
>>  #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
>>  
>> @@ -502,27 +505,31 @@ static void unmap_hyp_range(pgd_t *pgdp, phys_addr_t 
>> start, u64 size)
>>   *
>>   * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
>>   * therefore contains either mappings in the kernel memory area (above
>> - * PAGE_OFFSET), or device mappings in the vmalloc range (from
>> - * VMALLOC_START to VMALLOC_END).
>> + * PAGE_OFFSET), or device mappings in the idmap range.
>>   *
>> - * boot_hyp_pgd should only map two pages for the init code.
>> + * boot_hyp_pgd should only map the idmap range, and is only used in
>> + * the extended idmap case.
>>   */
>>  void free_hyp_pgds(void)
>>  {
>> +pgd_t *id_pgd;
>> +
>>  mutex_lock(&kvm_hyp_pgd_mutex);
>>  
>> +id_pgd = boot_hyp_pgd ? boot_hyp_pgd : hyp_pgd;
>> +
>> +if (id_pgd)
>> +unmap_hyp_range(id_pgd, io_map_base,
>> +hyp_idmap_start + PAGE_SIZE - io_map_base);
>> +
>>  if (boot_hyp_pgd) {
>> -unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>>  free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
>>  boot_hyp_pgd = NULL;
>>  }
>>  
>>  if (hyp_pgd) {
>> -unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>>  unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET),
>>  (uintptr_t)high_memory - PAGE_OFFSET);
>> -unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START),
>> -VMALLOC_END - VMALLOC_START);
>>  
>>  free_pages((unsigned long)hyp_pgd, hyp_pgd_order);
>>  hyp_pgd = NULL;
>> @@ -721,7 +728,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t 
>> size,
>> void __iomem **kaddr,
>> void __iomem **haddr)
>>  {
>> -unsigned long start, end;
>> +pgd_t *pgd = hyp_pgd;
>> +unsigned long base;
>>  int ret;
>>  
>>  *kaddr = ioremap(phys_addr, size);
>> @@ -733,17 +741,38 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, 
>> size_t size,
>>  return 0;
>>  }
>>  
>> +mutex_lock(&io_map_lock);
>> +
>> +base = io_map_base - size;
> 
> are we guaranteed that hyp_idmap_start (and therefore io_map_base) is
> sufficiently greater than 0 ?  I suppose that even if RAM starts at 0,
> and the kernel was loaded at 0, the idmap page for Hyp would be at some
> reasonable offset from the start of the kernel image?

On my kernel image:
0808 t _head
08cc6000 T __hyp_idmap_text_start
09aaa000 B swapper_pg_end

__hyp_idmap_text_start is about 12MB from the beginning of the image, and
about 14MB from the end. Yes, it is a big kernel. But we're only mapping
a few pages there, even with my upcoming crazy vector remapping crap. So
the likelihood of this failing is close to zero.

Now, close to zero is not necessarily close enough. What I could do is
to switch the allocator around on failure, so that if we can't allocate
on one side, we can at least try to allocate on the other side. I'm
pretty sure we'll never trigger that code, but I can implement it if you
think that's worth it.

> 
>> +base &= ~(size - 1);
> 
> I'm not sure I understand this line.  Wouldn't it make more sense to use
> PAGE_SIZE here?

This is trying to align the base of the allocation to its natural size
(8kB aligned on an 8kB boundary, for example), 

Re: [PATCH v4 08/19] arm64: KVM: Dynamically patch the kernel/hyp VA mask

2018-02-15 Thread Marc Zyngier
On 15/01/18 11:47, Christoffer Dall wrote:
> On Thu, Jan 04, 2018 at 06:43:23PM +, Marc Zyngier wrote:
>> So far, we're using a complicated sequence of alternatives to
>> patch the kernel/hyp VA mask on non-VHE, and NOP out the
>> masking altogether when on VHE.
>>
>> THe newly introduced dynamic patching gives us the opportunity
> 
> nit: s/THe/The/
> 
>> to simplify that code by patching a single instruction with
>> the correct mask (instead of the mind bending cummulative masking
>> we have at the moment) or even a single NOP on VHE.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/include/asm/kvm_mmu.h | 45 ++--
>>  arch/arm64/kvm/Makefile  |  2 +-
>>  arch/arm64/kvm/va_layout.c   | 91 
>> 
>>  3 files changed, 104 insertions(+), 34 deletions(-)
>>  create mode 100644 arch/arm64/kvm/va_layout.c
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
>> b/arch/arm64/include/asm/kvm_mmu.h
>> index 672c8684d5c2..b0c3cbe9b513 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -69,9 +69,6 @@
>>   * mappings, and none of this applies in that case.
>>   */
>>  
>> -#define HYP_PAGE_OFFSET_HIGH_MASK   ((UL(1) << VA_BITS) - 1)
>> -#define HYP_PAGE_OFFSET_LOW_MASK((UL(1) << (VA_BITS - 1)) - 1)
>> -
>>  #ifdef __ASSEMBLY__
>>  
>>  #include 
>> @@ -81,28 +78,14 @@
>>   * Convert a kernel VA into a HYP VA.
>>   * reg: VA to be converted.
>>   *
>> - * This generates the following sequences:
>> - * - High mask:
>> - *  and x0, x0, #HYP_PAGE_OFFSET_HIGH_MASK
>> - *  nop
>> - * - Low mask:
>> - *  and x0, x0, #HYP_PAGE_OFFSET_HIGH_MASK
>> - *  and x0, x0, #HYP_PAGE_OFFSET_LOW_MASK
>> - * - VHE:
>> - *  nop
>> - *  nop
>> - *
>> - * The "low mask" version works because the mask is a strict subset of
>> - * the "high mask", hence performing the first mask for nothing.
>> - * Should be completely invisible on any viable CPU.
>> + * The actual code generation takes place in kvm_update_va_mask, and
>> + * the instructions below are only there to reserve the space and
>> + * perform the register allocation.
> 
> Not just register allocation, but also to tell the generating code which
> registers to use for its operands, right?

That's what I meant by register allocation.

> 
>>   */
>>  .macro kern_hyp_va  reg
>> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
>> -and \reg, \reg, #HYP_PAGE_OFFSET_HIGH_MASK
>> -alternative_else_nop_endif
>> -alternative_if ARM64_HYP_OFFSET_LOW
>> -and \reg, \reg, #HYP_PAGE_OFFSET_LOW_MASK
>> -alternative_else_nop_endif
>> +alternative_cb kvm_update_va_mask
>> +and \reg, \reg, #1
>> +alternative_cb_end
>>  .endm
>>  
>>  #else
>> @@ -113,18 +96,14 @@ alternative_else_nop_endif
>>  #include 
>>  #include 
>>  
>> +void kvm_update_va_mask(struct alt_instr *alt,
>> +__le32 *origptr, __le32 *updptr, int nr_inst);
>> +
>>  static inline unsigned long __kern_hyp_va(unsigned long v)
>>  {
>> -asm volatile(ALTERNATIVE("and %0, %0, %1",
>> - "nop",
>> - ARM64_HAS_VIRT_HOST_EXTN)
>> - : "+r" (v)
>> - : "i" (HYP_PAGE_OFFSET_HIGH_MASK));
>> -asm volatile(ALTERNATIVE("nop",
>> - "and %0, %0, %1",
>> - ARM64_HYP_OFFSET_LOW)
>> - : "+r" (v)
>> - : "i" (HYP_PAGE_OFFSET_LOW_MASK));
>> +asm volatile(ALTERNATIVE_CB("and %0, %0, #1\n",
>> +kvm_update_va_mask)
>> + : "+r" (v));
>>  return v;
>>  }
>>  
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index 87c4f7ae24de..93afff91cb7c 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -16,7 +16,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o 
>> $(KVM)/coalesced_mmio.o $(KVM)/e
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o 
>> $(KVM)/arm/mmio.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
>>  
>> -kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o
>> +kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o 
>> sys_regs_generic_v8.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o
>> diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
>> new file mode 100644
>> index ..aee758574e61
>> --- /dev/null
>> +++ b/arch/arm64/kvm/va_layout.c
>> @@ -0,0 +1,91 @@
>> +/*
>> + * Copyright (C) 2017 ARM Ltd.
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the 

Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put

2018-02-15 Thread Dave Martin
On Wed, Feb 14, 2018 at 06:38:11PM +0100, Christoffer Dall wrote:
> On Wed, Feb 14, 2018 at 02:43:42PM +, Dave Martin wrote:
> > [CC Ard, in case he has a view on how much we care about softirq NEON
> > performance regressions ... and whether my suggestions make sense]
> > 
> > On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
> > > On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote:
> > > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > > > > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote:
> 
> [...]
> 
> > > > 
> > > > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
> > > > actually saved today because we explicitly don't care about preserving
> > > > the SVE state, because the syscall ABI throws the SVE regs away as
> > > > a side effect any syscall including ioctl(KVM_RUN); also (currently) KVM
> > > > ensures that the non-SVE FPSIMD bits _are_ restored by itself.
> > > > 
> > > > I think my proposal is that this hook might take on the role of
> > > > actually saving the state too, if we move that out of the KVM host
> > > > context save/restore code.
> > > > 
> > > > Perhaps we could even replace
> > > > 
> > > > preempt_disable();
> > > > kvm_fpsimd_flush_cpu_state();
> > > > /* ... */
> > > > preempt_enable();
> > > > 
> > > > with
> > > > 
> > > > kernel_neon_begin();
> > > > /* ... */
> > > > kernel_neon_end();
> > > 
> > > I'm not entirely sure where the begin and end points would be in the
> > > context of KVM?
> > 
> > Hmmm, actually there's a bug in your VHE changes now I look more
> > closely in this area:
> > 
> > You assume that the only way for the FPSIMD regs to get unexpectedly
> > dirtied is through a context switch, but actually this is not the case:
> > a softirq can use kernel-mode NEON any time that softirqs are enabled.
> > 
> > This means that in between kvm_arch_vcpu_load() and _put() (whether via
> > preempt notification or not), the guest's FPSIMD state in the regs may
> > be trashed by a softirq.
> 
> ouch.
> 
> > 
> > The simplest fix is to disable softirqs and preemption for that whole
> > region, but since we can stay in it indefinitely that's obviously not
> > the right approach.  Putting kernel_neon_begin() in _load() and
> > kernel_neon_end() in _put() achieves the same without disabling
> > softirq, but preemption is still disabled throughout, which is bad.
> > This effectively makes the run ioctl nonpreemptible...
> > 
> > A better fix would be to set the cpu's kernel_neon_busy flag, which
> > makes softirq code use non-NEON fallback code.
> > 
> > We could expose an interface from fpsimd.c to support that.
> > 
> > It still comes at a cost though: due to the switching from NEON to
> > fallback code in softirq handlers, we may get a big performance
> > regression in setups that rely heavily on NEON in softirq for
> > performance.
> > 
> 
> I wasn't aware that softirqs would use fpsimd.
> 
> > 
> > Alternatively we could do something like the following, but it's a
> > rather gross abstraction violation:
> > 
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 2e43f9d..6a1ff3a 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> > struct kvm_run *run)
> >  * the effect of taking the interrupt again, in SVC
> >  * mode this time.
> >  */
> > +   local_bh_disable();
> > local_irq_enable();
> >  
> > /*
> > +* If we exited due to one or mode pending interrupts, they
> > +* have now been handled.  If such an interrupt pended a
> > +* softirq, we shouldn't prevent that softirq from using
> > +* kernel-mode NEON indefinitely: instead, give FPSIMD back to
> > +* the host to manage as it likes.  We'll grab it again on the
> > +* next FPSIMD trap from the guest (if any).
> > +*/
> > +   if (local_softirq_pending() && FPSIMD untrapped for guest) {
> > +   /* save vcpu FPSIMD context */
> > +   /* enable FPSIMD trap for guest */
> > +   }
> > +   local_bh_enable();
> > +
> > +   /*
> >  * We do local_irq_enable() before calling guest_exit() so
> >  * that if a timer interrupt hits while running the guest we
> >  * account that tick as being spent in the guest.  We enable
> > 
> > [...]
> > 
> 
> I can't see this working, what if an IRQ comes in and a softirq gets
> pending immediately after local_bh_enable() above?

Sorry, I missed a crucial bit of information here.

For context: here's the remainder of my argument.  This is not a
recommendation...


--8<--

We can inhibit softirqs from trashing the FPSIMD regs by setting the
per-cpu kernel_neon_busy flag: that forces softirq code to use