[PATCH v2 17/17] KVM: arm64: Add async PF document

2021-02-08 Thread Gavin Shan
This adds a document explaining the interface for asynchronous page
fault and how it works in general.

Signed-off-by: Gavin Shan 
---
 Documentation/virt/kvm/arm/apf.rst   | 143 +++
 Documentation/virt/kvm/arm/index.rst |   1 +
 2 files changed, 144 insertions(+)
 create mode 100644 Documentation/virt/kvm/arm/apf.rst

diff --git a/Documentation/virt/kvm/arm/apf.rst 
b/Documentation/virt/kvm/arm/apf.rst
new file mode 100644
index ..4f5c01b6699f
--- /dev/null
+++ b/Documentation/virt/kvm/arm/apf.rst
@@ -0,0 +1,143 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Asynchronous Page Fault Support for arm64
+=========================================
+
+There are two stages of page faults when KVM is enabled as an accelerator
+for the guest: the guest is responsible for handling stage-1 page faults,
+while the host handles stage-2 page faults. While a stage-2 page fault is
+being handled, the guest is suspended until the requested page is ready.
+That can take several milliseconds, or even hundreds of milliseconds in
+extreme situations, because I/O might be required to move the requested
+page from disk to DRAM. The guest does no useful work while it is
+suspended. The asynchronous page fault feature is introduced to take
+advantage of this suspension period and improve overall performance.
+
+Asynchronous page fault is implemented with two paths: the control path
+and the data path. The control path allows the VMM or the guest to
+configure the functionality, while notifications are delivered on the
+data path. The notifications are classified into page-not-present and
+page-ready notifications.
+
+Data Path
+---------
+
+Two types of notifications are delivered from host to guest on the data
+path: page-not-present and page-ready. They are delivered through an SDEI
+event and a (PPI) interrupt respectively. Besides, a buffer shared between
+host and guest carries the reason and a sequential token, which is used to
+identify the asynchronous page fault. The reason and token residing in the
+shared buffer are written by the host and read and cleared by the guest.
+An asynchronous page fault is delivered and completed as below.
+
+(1) When an asynchronous page fault starts, a (workqueue) worker is created
+    and queued to the vCPU's pending queue. The worker makes the requested
+    page ready and resident in DRAM in the background. The shared buffer is
+    updated with the reason and the sequential token. After that, an SDEI
+    event is sent to the guest as the page-not-present notification.
+
+(2) When the SDEI event is received in the guest, the current process is
+    tagged with TIF_ASYNC_PF and associated with a wait queue. The process
+    is ready to keep rescheduling itself on switching from kernel to user
+    mode. After that, a reschedule IPI is sent to the current CPU and the
+    received SDEI event is acknowledged. Note that the IPI is delivered
+    once the acknowledgment of the SDEI event is received on the host.
+
+(3) On the host, the worker is dequeued from the vCPU's pending queue and
+    enqueued to its completion queue when the requested page becomes ready.
+    Meanwhile, a KVM_REQ_ASYNC_PF request is sent to the vCPU if the worker
+    is the first element enqueued to the completion queue.
+
+(4) With a pending KVM_REQ_ASYNC_PF request, the first worker in the
+    completion queue is dequeued and destroyed. Meanwhile, a (PPI) interrupt
+    is sent to the guest with the updated reason and token in the shared
+    buffer.
+
+(5) When the (PPI) interrupt is received in the guest, the affected process
+    is located using the token and woken up after its TIF_ASYNC_PF flag is
+    cleared. After that, the interrupt is acknowledged through the SMCCC
+    interface. If more workers exist in the completion queue, the next one
+    is dequeued and destroyed, and another (PPI) interrupt is sent to the
+    guest.
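+
+As an illustration (not part of the ABI description above), a guest might
+consume the shared buffer roughly as follows. The per-CPU variable and the
+field names are borrowed from the guest implementation later in this
+series::
+
+    struct kvm_vcpu_pv_apf_data *buf = this_cpu_ptr(&apf_data);
+    u32 reason = READ_ONCE(buf->reason);
+    u32 token  = READ_ONCE(buf->token);
+
+    if (reason == KVM_PV_REASON_PAGE_NOT_PRESENT) {
+        /* Step (2): tag the current task and wait for page-ready */
+    } else if (reason == KVM_PV_REASON_PAGE_READY) {
+        /* Step (5): look up the task by token and wake it up */
+    }
+
+    /* The guest clears the buffer after consuming it */
+    WRITE_ONCE(buf->reason, 0);
+    WRITE_ONCE(buf->token, 0);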
+
+Control Path
+------------
+
+The configuration is passed through the SMCCC or ioctl interface. The SDEI
+event and (PPI) interrupt are owned by the VMM, so the SDEI event and
+interrupt numbers are configured through ioctl commands on a per-vCPU
+basis. Besides, the functionality may be enabled and configured through
+the ioctl interface by the VMM during migration:
+
+   * KVM_ARM_ASYNC_PF_CMD_GET_VERSION
+
+     Returns the current version of the feature supported by the host. It
+     is made up of major, minor and revision fields; each field is one
+     byte in length.
+
+   * KVM_ARM_ASYNC_PF_CMD_GET_SDEI
+
+     Retrieves the SDEI event number used for the page-not-present
+     notification, so that it can be configured on the destination VM in
+     the scenario of migration.
+
+   * KVM_ARM_ASYNC_PF_CMD_GET_IRQ
+
+     Retrieves the IRQ (PPI) number used for the page-ready notification,
+     so that it can be configured on the destination VM in the scenario
+     of migration.
+
+   * KVM_ARM_ASYNC_PF_CMD_GET_CONTROL
+
+     Retrieves the address of the control block, so that it can

[PATCH v2 16/17] arm64: Enable async PF

2021-02-08 Thread Gavin Shan
This enables asynchronous page fault on the guest side. The design
is highlighted as below:

   * The per-vCPU shared memory region, which is represented by
 "struct kvm_vcpu_pv_apf_data", is allocated. The reason and
 token associated with the received notifications of asynchronous
 page fault are delivered through it.

   * A per-vCPU table, which is represented by "struct kvm_apf_table",
     is allocated. The process on which the page-not-present notification
     is received is added to the table so that it can reschedule itself
     on switching from kernel to user mode. Afterwards, the process,
     identified by its token, is removed from the table and put into
     runnable state when the page-ready notification is received.

   * During CPU hotplug, the (private) SDEI event is expected to be
     enabled or disabled on the affected CPU by the SDEI client driver.
     The (PPI) interrupt is enabled or disabled on the affected CPU by
     the code added in this patch. When the system is going to reboot,
     the SDEI event is disabled and unregistered and the (PPI) interrupt
     is disabled.

   * The SDEI event and (PPI) interrupt number are retrieved from host
 through SMCCC interface. Besides, the version of the asynchronous
 page fault is validated when the feature is enabled on the guest.

   * The feature is disabled in the guest when the boot parameter
     "no-kvmapf" is specified.
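
A minimal sketch of how that boot parameter could be wired up (the handler
name is illustrative; async_pf_available is the variable used in this patch):

    static int __init early_no_kvmapf(char *arg)
    {
            /* force-disable async PF before it gets probed and enabled */
            async_pf_available = false;
            return 0;
    }
    early_param("no-kvmapf", early_no_kvmapf);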

Signed-off-by: Gavin Shan 
---
 arch/arm64/kernel/Makefile |   1 +
 arch/arm64/kernel/kvm.c| 452 +
 2 files changed, 453 insertions(+)
 create mode 100644 arch/arm64/kernel/kvm.c

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 86364ab6f13f..c849ef61f043 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_ACPI)+= acpi.o
 obj-$(CONFIG_ACPI_NUMA)+= acpi_numa.o
 obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)  += acpi_parking_protocol.o
 obj-$(CONFIG_PARAVIRT) += paravirt.o
+obj-$(CONFIG_KVM_GUEST)+= kvm.o
 obj-$(CONFIG_RANDOMIZE_BASE)   += kaslr.o
 obj-$(CONFIG_HIBERNATION)  += hibernate.o hibernate-asm.o
 obj-$(CONFIG_KEXEC_CORE)   += machine_kexec.o relocate_kernel.o
\
diff --git a/arch/arm64/kernel/kvm.c b/arch/arm64/kernel/kvm.c
new file mode 100644
index ..effe8dc7e921
--- /dev/null
+++ b/arch/arm64/kernel/kvm.c
@@ -0,0 +1,452 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Asynchronous page fault support.
+ *
+ * Copyright (C) 2021 Red Hat, Inc.
+ *
+ * Author(s): Gavin Shan 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct kvm_apf_task {
+   unsigned inttoken;
+   struct task_struct  *task;
+   struct swait_queue_head wq;
+};
+
+struct kvm_apf_table {
+   raw_spinlock_t  lock;
+   unsigned intcount;
+   struct kvm_apf_task tasks[0];
+};
+
+static bool async_pf_available = true;
+static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_data) 
__aligned(64);
+static struct kvm_apf_table __percpu *apf_tables;
+static unsigned int apf_tasks;
+static unsigned int apf_sdei_num;
+static unsigned int apf_ppi_num;
+static int apf_irq;
+
+static bool kvm_async_pf_add_task(struct task_struct *task,
+ unsigned int token)
+{
+   struct kvm_apf_table *table = this_cpu_ptr(apf_tables);
+   unsigned int i, index = apf_tasks;
+   bool ret = false;
+
+   raw_spin_lock(&table->lock);
+
+   if (WARN_ON(table->count >= apf_tasks))
+   goto unlock;
+
+   for (i = 0; i < apf_tasks; i++) {
+   if (!table->tasks[i].task) {
+   if (index == apf_tasks) {
+   ret = true;
+   index = i;
+   }
+   } else if (table->tasks[i].task == task) {
+   WARN_ON(table->tasks[i].token != token);
+   ret = false;
+   break;
+   }
+   }
+
+   if (!ret)
+   goto unlock;
+
+   task->thread.data = &table->tasks[index].wq;
+   set_tsk_thread_flag(task, TIF_ASYNC_PF);
+
+   table->count++;
+   table->tasks[index].task = task;
+   table->tasks[index].token = token;
+
+unlock:
+   raw_spin_unlock(&table->lock);
+   return ret;
+}
+
+static inline void kvm_async_pf_remove_one_task(struct kvm_apf_table *table,
+   unsigned int index)
+{
+   clear_tsk_thread_flag(table->tasks[index].task, TIF_ASYNC_PF);
+   WRITE_ONCE(table->tasks[index].task->thread.data, NULL);
+
+   table->count--;
+   table->tasks[index].task = NULL;
+   table->tasks[index].token = 0;
+
+   swake_up_one(&table->tasks[in

[PATCH v2 15/17] arm64: Reschedule process on aync PF

2021-02-08 Thread Gavin Shan
The page-not-present notification is delivered by an SDEI event. The
guest reschedules the current process to another one when the SDEI event
is received. It's not safe to do so in the SDEI event handler because
the SDEI event should be acknowledged as soon as possible.

So the rescheduling is postponed until the current process switches
from kernel to user mode. In order to trigger the switch, the SDEI
event handler sends a (reschedule) IPI to the current CPU, and it is
delivered in time after the SDEI event is acknowledged.

A new thread flag (TIF_ASYNC_PF) is introduced in order to track the
state of the process to be rescheduled. When the flag is set, a
wait-queue head is associated with the process. The process keeps
rescheduling itself until the flag is cleared, which happens when the
page-ready notification is received through the (PPI) interrupt.

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/processor.h   |  1 +
 arch/arm64/include/asm/thread_info.h |  4 +++-
 arch/arm64/kernel/signal.c   | 17 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index ca2cd75d3286..2176c88c77a7 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -154,6 +154,7 @@ struct thread_struct {
u64 sctlr_tcf0;
u64 gcr_user_excl;
 #endif
+   void*data;
 };
 
 static inline void arch_thread_struct_whitelist(unsigned long *offset,
diff --git a/arch/arm64/include/asm/thread_info.h 
b/arch/arm64/include/asm/thread_info.h
index 9f4e3b266f21..939beb3c7723 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -65,6 +65,7 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_UPROBE 4   /* uprobe breakpoint or singlestep */
 #define TIF_MTE_ASYNC_FAULT5   /* MTE Asynchronous Tag Check Fault */
 #define TIF_NOTIFY_SIGNAL  6   /* signal notifications exist */
+#define TIF_ASYNC_PF   7   /* Asynchronous page fault */
 #define TIF_SYSCALL_TRACE  8   /* syscall trace active */
 #define TIF_SYSCALL_AUDIT  9   /* syscall auditing */
 #define TIF_SYSCALL_TRACEPOINT 10  /* syscall tracepoint for ftrace */
@@ -95,11 +96,12 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define _TIF_SVE   (1 << TIF_SVE)
 #define _TIF_MTE_ASYNC_FAULT   (1 << TIF_MTE_ASYNC_FAULT)
 #define _TIF_NOTIFY_SIGNAL (1 << TIF_NOTIFY_SIGNAL)
+#define _TIF_ASYNC_PF  (1 << TIF_ASYNC_PF)
 
 #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
 _TIF_UPROBE | _TIF_MTE_ASYNC_FAULT | \
-_TIF_NOTIFY_SIGNAL)
+_TIF_NOTIFY_SIGNAL | _TIF_ASYNC_PF)
 
 #define _TIF_SYSCALL_WORK  (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 6237486ff6bb..2cd2d13aa905 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -915,6 +915,23 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 unsigned long thread_flags)
 {
do {
+   if (thread_flags & _TIF_ASYNC_PF) {
+   struct swait_queue_head *wq =
+   READ_ONCE(current->thread.data);
+   DECLARE_SWAITQUEUE(wait);
+
+   local_daif_restore(DAIF_PROCCTX_NOIRQ);
+
+   do {
+   prepare_to_swait_exclusive(wq,
+   &wait, TASK_UNINTERRUPTIBLE);
+   if (!test_thread_flag(TIF_ASYNC_PF))
+   break;
+
+   schedule();
+   } while (test_thread_flag(TIF_ASYNC_PF));
+   }
+
if (thread_flags & _TIF_NEED_RESCHED) {
/* Unmask Debug and SError for the next task */
local_daif_restore(DAIF_PROCCTX_NOIRQ);
-- 
2.23.0



[PATCH v2 14/17] arm64: Detect async PF para-virtualization feature

2021-02-08 Thread Gavin Shan
This implements kvm_para_available() to check whether para-virtualization
features are available or not. Besides, kvm_para_has_feature() is
enhanced to detect the asynchronous page fault para-virtualization
feature. These two functions are going to be used by the guest kernel
to enable the asynchronous page fault.

This also adds the kernel option (CONFIG_KVM_GUEST), which is the umbrella
for the optimizations related to KVM para-virtualization.
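
For illustration, the guest-side check then boils down to the standard
para-virtualization helpers (kvm_para_has_feature() is the generic wrapper
around kvm_arch_para_features()):

    if (!kvm_para_available() ||
        !kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
            return;         /* async PF isn't offered by this hypervisor */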

Signed-off-by: Gavin Shan 
---
 arch/arm64/Kconfig | 11 +++
 arch/arm64/include/asm/kvm_para.h  | 12 +++-
 arch/arm64/include/uapi/asm/kvm_para.h |  2 ++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f39568b28ec1..792ae09aa690 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1078,6 +1078,17 @@ config PARAVIRT_TIME_ACCOUNTING
 
  If in doubt, say N here.
 
+config KVM_GUEST
+   bool "KVM Guest Support"
+   depends on PARAVIRT
+   default y
+   help
+ This option enables various optimizations for running under the KVM
+ hypervisor. Overhead for the kernel when not running inside KVM should
+ be minimal.
+
+ In case of doubt, say Y
+
 config KEXEC
depends on PM_SLEEP_SMP
select KEXEC_CORE
diff --git a/arch/arm64/include/asm/kvm_para.h 
b/arch/arm64/include/asm/kvm_para.h
index 0ea481dd1c7a..8f39c60a6619 100644
--- a/arch/arm64/include/asm/kvm_para.h
+++ b/arch/arm64/include/asm/kvm_para.h
@@ -3,6 +3,8 @@
 #define _ASM_ARM_KVM_PARA_H
 
 #include 
+#include 
+#include 
 
 static inline bool kvm_check_and_clear_guest_paused(void)
 {
@@ -11,7 +13,12 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 
 static inline unsigned int kvm_arch_para_features(void)
 {
-   return 0;
+   unsigned int features = 0;
+
+   if (kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_ASYNC_PF))
+   features |= (1 << KVM_FEATURE_ASYNC_PF);
+
+   return features;
 }
 
 static inline unsigned int kvm_arch_para_hints(void)
@@ -21,6 +28,9 @@ static inline unsigned int kvm_arch_para_hints(void)
 
 static inline bool kvm_para_available(void)
 {
+   if (IS_ENABLED(CONFIG_KVM_GUEST))
+   return true;
+
return false;
 }
 
diff --git a/arch/arm64/include/uapi/asm/kvm_para.h 
b/arch/arm64/include/uapi/asm/kvm_para.h
index 162325e2638f..70bbc7d1ec75 100644
--- a/arch/arm64/include/uapi/asm/kvm_para.h
+++ b/arch/arm64/include/uapi/asm/kvm_para.h
@@ -4,6 +4,8 @@
 
 #include 
 
+#define KVM_FEATURE_ASYNC_PF   0
+
 /* Async PF */
 #define KVM_ASYNC_PF_ENABLED   (1 << 0)
 #define KVM_ASYNC_PF_SEND_ALWAYS   (1 << 1)
-- 
2.23.0



[PATCH v2 13/17] KVM: arm64: Export async PF capability

2021-02-08 Thread Gavin Shan
This exports the asynchronous page fault capability:

* Advertise the capabilities KVM_CAP_ASYNC_{PF, PF_INT}.

* Standardize SDEI event for asynchronous page fault.

* Enable kernel config CONFIG_KVM_ASYNC_{PF, PF_SLOT}.
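
For illustration, a VMM can probe the exported capability in the usual way
before using the async PF ioctls (vm_fd is the usual KVM VM file descriptor):

    if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ASYNC_PF_INT) > 0)
            /* safe to configure asynchronous page fault for this VM */;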

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/uapi/asm/kvm_sdei.h | 1 +
 arch/arm64/kvm/Kconfig | 2 ++
 arch/arm64/kvm/arm.c   | 4 
 arch/arm64/kvm/sdei.c  | 5 +
 4 files changed, 12 insertions(+)

diff --git a/arch/arm64/include/uapi/asm/kvm_sdei.h 
b/arch/arm64/include/uapi/asm/kvm_sdei.h
index 232092de5e21..47d578abba1a 100644
--- a/arch/arm64/include/uapi/asm/kvm_sdei.h
+++ b/arch/arm64/include/uapi/asm/kvm_sdei.h
@@ -13,6 +13,7 @@
 #define KVM_SDEI_MAX_VCPUS 512
 #define KVM_SDEI_INVALID_NUM   0
 #define KVM_SDEI_DEFAULT_NUM   0x4040
+#define KVM_SDEI_ASYNC_PF_NUM  0x4041
 
 struct kvm_sdei_event_state {
uint64_tnum;
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 3964acf5451e..dfb3ed0de2ca 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -31,6 +31,8 @@ menuconfig KVM
select SRCU
select KVM_VFIO
select HAVE_KVM_EVENTFD
+   select KVM_ASYNC_PF
+   select KVM_ASYNC_PF_SLOT
select HAVE_KVM_IRQFD
select HAVE_KVM_MSI
select HAVE_KVM_IRQCHIP
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index be0e6c2db2a5..0940de3ebcff 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -269,6 +269,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_SDEI:
r = 1;
break;
+   case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_ASYNC_PF_INT:
+   r = IS_ENABLED(CONFIG_KVM_ASYNC_PF) ? 1 : 0;
+   break;
default:
r = 0;
}
diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index 4f5a582daa97..437303bfafba 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -19,6 +19,11 @@ static struct kvm_sdei_event_state defined_kse[] = {
  1,
  SDEI_EVENT_PRIORITY_CRITICAL
},
+   { KVM_SDEI_ASYNC_PF_NUM,
+ SDEI_EVENT_TYPE_PRIVATE,
+ 1,
+ SDEI_EVENT_PRIORITY_CRITICAL
+   },
 };
 
 static struct kvm_sdei_event *kvm_sdei_find_event(struct kvm *kvm,
-- 
2.23.0



[PATCH v2 11/17] KVM: arm64: Support async PF hypercalls

2021-02-08 Thread Gavin Shan
This introduces (SMCCC) KVM vendor specific services to configure
the asynchronous page fault functionality. The following services
are introduced:

   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_VERSION
     Returns the version, which can be used to identify ABI changes
     in the future.
   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_SLOTS
     Returns the maximal number of tokens that the current vCPU can
     have. It's used by the guest to allocate the required resources.
   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_{SDEI, IRQ}
     Returns the associated SDEI event or (PPI) IRQ number, configured
     via the vCPU ioctl command.
   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE
     Enables or disables asynchronous page fault on the current vCPU.

The corresponding SDEI event and (PPI) IRQ are owned by the VMM, so they
are configured through the vCPU ioctl interface, which will be implemented
when the asynchronous page fault capability is exported in the subsequent
patches.

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/async_pf.c | 119 ++
 include/linux/arm-smccc.h |   5 ++
 2 files changed, 124 insertions(+)

diff --git a/arch/arm64/kvm/async_pf.c b/arch/arm64/kvm/async_pf.c
index f73c406456e9..4734c5b26aa8 100644
--- a/arch/arm64/kvm/async_pf.c
+++ b/arch/arm64/kvm/async_pf.c
@@ -313,12 +313,115 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token), 0);
 }
 
+static void kvm_arch_async_sdei_notifier(struct kvm_vcpu *vcpu,
+unsigned long num,
+unsigned int state)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+
+   if (!apf)
+   return;
+
+   if (num != apf->sdei_event_num) {
+   kvm_err("%s: Invalid event number (%d-%d %lx-%llx)\n",
+   __func__, kvm->userspace_pid, vcpu->vcpu_idx,
+   num, apf->sdei_event_num);
+   return;
+   }
+
+   switch (state) {
+   case KVM_SDEI_NOTIFY_DELIVERED:
+   if (!apf->notpresent_pending)
+   break;
+
+   apf->notpresent_token = 0;
+   apf->notpresent_pending = false;
+   break;
+   case KVM_SDEI_NOTIFY_COMPLETED:
+   break;
+   default:
+   kvm_err("%s: Invalid state (%d-%d %lx-%d)\n",
+   __func__, kvm->userspace_pid, vcpu->vcpu_idx,
+   num, state);
+   }
+}
+
+static long kvm_arch_async_enable(struct kvm_vcpu *vcpu, u64 data)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+   gpa_t gpa = (data & ~0x3FUL);
+   bool enabled, enable;
+   int ret;
+
+   if (!apf || !irqchip_in_kernel(kvm))
+   return SMCCC_RET_NOT_SUPPORTED;
+
+   /* Bail if the state transition isn't allowed */
+   enabled = !!(apf->control_block & KVM_ASYNC_PF_ENABLED);
+   enable = !!(data & KVM_ASYNC_PF_ENABLED);
+   if (enable == enabled) {
+   kvm_debug("%s: Async PF has been %s on (%d-%d %llx-%llx)\n",
+ __func__, enabled ? "enabled" : "disabled",
+ kvm->userspace_pid, vcpu->vcpu_idx,
+ apf->control_block, data);
+   return SMCCC_RET_NOT_REQUIRED;
+   }
+
+   /* To disable the functionality */
+   if (!enable) {
+   kvm_clear_async_pf_completion_queue(vcpu);
+   apf->control_block = data;
+   return SMCCC_RET_SUCCESS;
+   }
+
+   /*
+* The SDEI event and IRQ number should have been given
+* prior to enablement.
+*/
+   if (!apf->sdei_event_num || !apf->irq) {
+   kvm_err("%s: Invalid SDEI event or IRQ (%d-%d %llx-%d)\n",
+   __func__, kvm->userspace_pid, vcpu->vcpu_idx,
+   apf->sdei_event_num, apf->irq);
+   return SMCCC_RET_INVALID_PARAMETER;
+   }
+
+   /* Register SDEI event notifier */
+   ret = kvm_sdei_register_notifier(kvm, apf->sdei_event_num,
+kvm_arch_async_sdei_notifier);
+   if (ret) {
+   kvm_err("%s: Error %d registering SDEI notifier (%d-%d %llx)\n",
+   __func__, ret, kvm->userspace_pid, vcpu->vcpu_idx,
+   apf->sdei_event_num);
+   return SMCCC_RET_NOT_SUPPORTED;
+   }
+
+   /* Initialize cache shared by host and guest */
+   ret = kvm_gfn_to_hva_cache_init(kvm, &apf->cache, gpa,
+   offsetofend(struct kvm_vcpu_pv_apf_data, token));
+   if (ret) {
+   kvm_err("%s: Error %d initializing cache (%d-%d)\n",
+   __func__, ret, kvm->userspace_pid, vcpu->vcpu_idx);
+   return SMCCC_RET_NOT_SUPPORTED;
+   }
+
+   /* Flush the token table */
+   kv

[PATCH v2 12/17] KVM: arm64: Support async PF ioctl commands

2021-02-08 Thread Gavin Shan
This supports ioctl commands for configuration and migration:

   KVM_ARM_ASYNC_PF_CMD_GET_VERSION
  Return implementation version
   KVM_ARM_ASYNC_PF_CMD_GET_SDEI
  Return SDEI event number used for page-not-present notification
   KVM_ARM_ASYNC_PF_CMD_GET_IRQ
  Return IRQ number used for page-ready notification
   KVM_ARM_ASYNC_PF_CMD_GET_CONTROL
  Get control block when VM is migrated
   KVM_ARM_ASYNC_PF_CMD_SET_SDEI
  Set SDEI event number when VM is started or migrated
   KVM_ARM_ASYNC_PF_CMD_SET_IRQ
  Set IRQ number when VM is started or migrated
   KVM_ARM_ASYNC_PF_CMD_SET_CONTROL
  Set control block when VM is migrated
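
For illustration, a hedged sketch of how a VMM might save the SDEI event
number on the source side of a migration, using the uapi structure added
below (error handling omitted; vcpu_fd is the usual KVM vCPU file
descriptor):

    struct kvm_arm_async_pf_cmd cmd = {
            .cmd = KVM_ARM_ASYNC_PF_CMD_GET_SDEI,
    };

    ioctl(vcpu_fd, KVM_ARM_ASYNC_PF_COMMAND, &cmd);
    /* cmd.sdei now holds the event number to restore via SET_SDEI */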

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_host.h | 14 +++
 arch/arm64/include/uapi/asm/kvm.h | 19 +
 arch/arm64/kvm/arm.c  |  6 +++
 arch/arm64/kvm/async_pf.c | 64 +++
 include/uapi/linux/kvm.h  |  3 ++
 5 files changed, 106 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 6349920fd9ce..14b3d1505b15 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -778,6 +778,8 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work);
 long kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu,
 long *r1, long *r2, long *r3);
+long kvm_arch_async_pf_vm_ioctl(struct kvm *kvm, unsigned long arg);
+long kvm_arch_async_pf_vcpu_ioctl(struct kvm_vcpu *vcpu, unsigned long arg);
 void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu);
 #else
 static inline void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu) { }
@@ -799,6 +801,18 @@ static inline long kvm_arch_async_pf_hypercall(struct 
kvm_vcpu *vcpu,
 {
return SMCCC_RET_NOT_SUPPORTED;
 }
+
+static inline long kvm_arch_async_pf_vm_ioctl(struct kvm *kvm,
+ unsigned long arg)
+{
+   return -EPERM;
+}
+
+static inline long kvm_arch_async_pf_vcpu_ioctl(struct kvm_vcpu *vcpu,
+   unsigned long arg)
+{
+   return -EPERM;
+}
 #endif
 
 /* Guest/host FPSIMD coordination helpers */
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index 15499751997d..a6124068bee6 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -403,6 +403,25 @@ struct kvm_vcpu_events {
 #define KVM_PSCI_RET_INVAL PSCI_RET_INVALID_PARAMS
 #define KVM_PSCI_RET_DENIEDPSCI_RET_DENIED
 
+/* Asynchronous page fault */
+#define KVM_ARM_ASYNC_PF_CMD_GET_VERSION   0
+#define KVM_ARM_ASYNC_PF_CMD_GET_SDEI  1
+#define KVM_ARM_ASYNC_PF_CMD_GET_IRQ   2
+#define KVM_ARM_ASYNC_PF_CMD_GET_CONTROL   3
+#define KVM_ARM_ASYNC_PF_CMD_SET_SDEI  4
+#define KVM_ARM_ASYNC_PF_CMD_SET_IRQ   5
+#define KVM_ARM_ASYNC_PF_CMD_SET_CONTROL   6
+
+struct kvm_arm_async_pf_cmd {
+   __u32   cmd;
+   union {
+   __u32   version;
+   __u64   sdei;
+   __u32   irq;
+   __u64   control;
+   };
+};
+
 #endif
 
 #endif /* __ARM_KVM_H__ */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e34fca3fa0ff..be0e6c2db2a5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1287,6 +1287,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
case KVM_ARM_SDEI_COMMAND: {
return kvm_sdei_vcpu_ioctl(vcpu, arg);
}
+   case KVM_ARM_ASYNC_PF_COMMAND: {
+   return kvm_arch_async_pf_vcpu_ioctl(vcpu, arg);
+   }
default:
r = -EINVAL;
}
@@ -1364,6 +1367,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
case KVM_ARM_SDEI_COMMAND: {
return kvm_sdei_vm_ioctl(kvm, arg);
}
+   case KVM_ARM_ASYNC_PF_COMMAND: {
+   return kvm_arch_async_pf_vm_ioctl(kvm, arg);
+   }
default:
return -EINVAL;
}
diff --git a/arch/arm64/kvm/async_pf.c b/arch/arm64/kvm/async_pf.c
index 4734c5b26aa8..6f763edbe3a3 100644
--- a/arch/arm64/kvm/async_pf.c
+++ b/arch/arm64/kvm/async_pf.c
@@ -464,6 +464,70 @@ long kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu,
return ret;
 }
 
+long kvm_arch_async_pf_vm_ioctl(struct kvm *kvm, unsigned long arg)
+{
+   struct kvm_arm_async_pf_cmd cmd;
+   unsigned int version = 0x01; /* v1.0.0 */
+   void __user *argp = (void __user *)arg;
+
+   if (copy_from_user(&cmd, argp, sizeof(cmd)))
+   return -EFAULT;
+
+   if (cmd.cmd != KVM_ARM_ASYNC_PF_CMD_GET_VERSION)
+   return -EINVAL;
+
+   cmd.version = version;
+   if (copy_to_user(argp, &cmd, sizeof(cmd)))
+   return -EFAULT;
+
+   return 0;
+}
+
+long kvm_arch_async_pf_vcpu_ioctl(struct kvm_vcpu *vcpu, unsigned long arg)
+{
+ 

[PATCH v2 10/17] KVM: arm64: Support page-ready notification

2021-02-08 Thread Gavin Shan
The asynchronous page fault starts with a worker when the requested
page isn't present. The worker makes the requested page present
in the background and, after that, the worker, together with the
associated information, is queued to the completion queue. The
worker and the completion queue are checked as below.

   * A request (KVM_REQ_ASYNC_PF) is raised if the worker is the
     first one enqueued to the completion queue. With the request,
     the completion queue is checked and the worker is dequeued.
     A PPI is sent to the guest as the page-ready notification and
     the guest should acknowledge the interrupt through the SMCCC
     interface.

   * When the notification (PPI) is acknowledged by the guest, the
     completion queue is checked again and the next worker is dequeued
     if there is one. For this particular worker, another notification
     (PPI) is sent to the guest without raising the request. Once that
     notification (PPI) is acknowledged by the guest, the completion
     queue is checked again to process the next queued worker.

Similar to page-not-present notification, the shared memory region
is used to convey the reason and token associated with the page-ready
notification. The region is represented by "struct kvm_vcpu_pv_apf_data".

The feature isn't enabled by CONFIG_KVM_ASYNC_PF yet. Also, the control
path isn't implemented and will be done in the subsequent patches.

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_host.h  |  17 ++
 arch/arm64/include/uapi/asm/kvm_para.h |   1 +
 arch/arm64/kvm/arm.c   |  24 ++-
 arch/arm64/kvm/async_pf.c  | 207 +
 arch/arm64/kvm/hypercalls.c|   6 +
 include/linux/arm-smccc.h  |  10 ++
 6 files changed, 262 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 49cccefb22cf..6349920fd9ce 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -48,6 +48,7 @@
 #define KVM_REQ_RECORD_STEAL   KVM_ARCH_REQ(3)
 #define KVM_REQ_RELOAD_GICv4   KVM_ARCH_REQ(4)
 #define KVM_REQ_SDEI   KVM_ARCH_REQ(5)
+#define KVM_REQ_ASYNC_PF   KVM_ARCH_REQ(6)
 
 #define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
 KVM_DIRTY_LOG_INITIALLY_SET)
@@ -292,10 +293,12 @@ struct kvm_arch_async_pf_control {
u64 control_block;
boolsend_user_only;
u64 sdei_event_num;
+   u32 irq;
 
u16 id;
boolnotpresent_pending;
u32 notpresent_token;
+   boolpageready_pending;
 };
 
 struct kvm_vcpu_arch {
@@ -767,6 +770,14 @@ bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
 u32 esr, gpa_t gpa, gfn_t gfn);
 bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work);
+void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+  struct kvm_async_pf *work);
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+long kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu,
+long *r1, long *r2, long *r3);
 void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu);
 #else
 static inline void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu) { }
@@ -782,6 +793,12 @@ static inline bool kvm_arch_setup_async_pf(struct kvm_vcpu 
*vcpu,
 {
return false;
 }
+
+static inline long kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu,
+  long *r1, long *r2, long *r3)
+{
+   return SMCCC_RET_NOT_SUPPORTED;
+}
 #endif
 
 /* Guest/host FPSIMD coordination helpers */
diff --git a/arch/arm64/include/uapi/asm/kvm_para.h 
b/arch/arm64/include/uapi/asm/kvm_para.h
index 3fa04006714e..162325e2638f 100644
--- a/arch/arm64/include/uapi/asm/kvm_para.h
+++ b/arch/arm64/include/uapi/asm/kvm_para.h
@@ -9,6 +9,7 @@
 #define KVM_ASYNC_PF_SEND_ALWAYS   (1 << 1)
 
 #define KVM_PV_REASON_PAGE_NOT_PRESENT 1
+#define KVM_PV_REASON_PAGE_READY   2
 
 struct kvm_vcpu_pv_apf_data {
__u32   reason;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c98fbb4e914b..e34fca3fa0ff 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -484,9 +484,23 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
  */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
+   struct kvm_arch_async_pf_control *apf = v->arch.apf;
bool irq_lines = *vcpu_hcr(v) & (HCR_VI | HCR_VF);
-  

[PATCH v2 09/17] KVM: arm64: Support page-not-present notification

2021-02-08 Thread Gavin Shan
The requested page might not be resident in memory during the stage-2
page fault. For example, the requested page could be resident in a swap
device (file). In this case, disk I/O is issued in order to fetch the
requested page and it could take tens of milliseconds, even hundreds
of milliseconds in extreme situations. During that period, the guest's
vCPU is suspended until the requested page becomes ready. In fact,
something else could be scheduled on the guest's vCPU during that
period, so that the time slice isn't wasted from the guest vCPU's
point of view. This is the primary goal of the asynchronous page
fault feature.

This supports delivery of the page-not-present notification through an
SDEI event when the requested page isn't present. When the notification
is received on the guest's vCPU, something else (another process) can
be scheduled. The design is highlighted as below:

   * There is a dedicated memory region shared by host and guest. It's
     represented by "struct kvm_vcpu_pv_apf_data". The field @reason
     indicates the reason why the SDEI event is triggered, while the
     unique @token is used by the guest to associate the event with the
     suspended process.

   * One control block is associated with each guest vCPU and it's
     represented by "struct kvm_arch_async_pf_control". It allows the
     guest to configure the functionality, in particular the situations
     in which the host can deliver the page-not-present notification to
     kick off an asynchronous page fault. Besides, runtime state is
     also maintained in this struct.

   * Before the page-not-present notification is sent to the guest's
     vCPU, a worker is started and executed asynchronously on the host
     to fetch the requested page. "struct kvm_async_pf" and
     "struct kvm_arch_async_pf" are associated with the worker, to
     track the work.

The feature isn't enabled by CONFIG_KVM_ASYNC_PF yet. Also, the
page-ready notification delivery and the control path aren't implemented
yet; they will be done in the subsequent patches.
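
For illustration, the stage-2 fault path in mmu.c then gets a hook of
roughly this shape (a hedged sketch, using the helper declared below):

    /* user_mem_abort(): try to start an async PF instead of blocking */
    if (kvm_arch_setup_async_pf(vcpu, esr, gpa, gfn))
            return 0;       /* notification sent; the vCPU can run other work */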

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_host.h  |  50 +
 arch/arm64/include/uapi/asm/kvm_para.h |  15 +++
 arch/arm64/kvm/Makefile|   1 +
 arch/arm64/kvm/arm.c   |   3 +
 arch/arm64/kvm/async_pf.c  | 145 +
 arch/arm64/kvm/mmu.c   |  32 +-
 6 files changed, 245 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kvm/async_pf.c

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 00b30b7554e5..49cccefb22cf 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -273,6 +273,31 @@ struct vcpu_reset_state {
boolreset;
 };
 
+/* Should be a power of two number */
+#define ASYNC_PF_PER_VCPU  64
+
+/*
+ * The association of gfn and token. The token will be sent to guest as
+ * page fault address. Also, the guest could be in aarch32 mode. So its
+ * length should be 32-bits.
+ */
+struct kvm_arch_async_pf {
+   u32 token;
+   gfn_t   gfn;
+   u32 esr;
+};
+
+struct kvm_arch_async_pf_control {
+   struct gfn_to_hva_cache cache;
+   u64 control_block;
+   boolsend_user_only;
+   u64 sdei_event_num;
+
+   u16 id;
+   boolnotpresent_pending;
+   u32 notpresent_token;
+};
+
 struct kvm_vcpu_arch {
struct kvm_cpu_context ctxt;
void *sve_state;
@@ -375,6 +400,7 @@ struct kvm_vcpu_arch {
} steal;
 
struct kvm_sdei_vcpu *sdei;
+   struct kvm_arch_async_pf_control *apf;
 };
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
@@ -734,6 +760,30 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
   struct kvm_device_attr *attr);
 
+#ifdef CONFIG_KVM_ASYNC_PF
+void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu);
+bool kvm_arch_async_not_present_allowed(struct kvm_vcpu *vcpu);
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+u32 esr, gpa_t gpa, gfn_t gfn);
+bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu);
+#else
+static inline void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu) { }
+static inline void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu) { }
+
+static inline bool kvm_arch_async_not_present_allowed(struct kvm_vcpu *vcpu)
+{
+   return false;
+}
+
+static inline bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+  u32 esr, gpa_t gpa, gfn_t gfn)
+{
+   return false;
+}
+#endif
+
 /* Guest/host FPSIMD coordination helpers */
 int kvm_arch_vcpu_run_map_

[PATCH v2 08/17] KVM: arm64: Add paravirtualization header files

2021-02-08 Thread Gavin Shan
We need to put more stuff into the paravirtualization header files when
the asynchronous page fault is supported, and the generic header files
can't meet that goal. This duplicates the generic header files as our
platform-specific header files. It's preparatory work to support the
asynchronous page fault in subsequent patches:

   include/uapi/asm-generic/kvm_para.h
   include/asm-generic/kvm_para.h

   arch/arm64/include/uapi/asm/kvm_para.h
   arch/arm64/include/asm/kvm_para.h

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_para.h  | 27 ++
 arch/arm64/include/uapi/asm/Kbuild |  2 --
 arch/arm64/include/uapi/asm/kvm_para.h |  5 +
 3 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_para.h
 create mode 100644 arch/arm64/include/uapi/asm/kvm_para.h

diff --git a/arch/arm64/include/asm/kvm_para.h 
b/arch/arm64/include/asm/kvm_para.h
new file mode 100644
index ..0ea481dd1c7a
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_para.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM_KVM_PARA_H
+#define _ASM_ARM_KVM_PARA_H
+
+#include 
+
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+   return false;
+}
+
+static inline unsigned int kvm_arch_para_features(void)
+{
+   return 0;
+}
+
+static inline unsigned int kvm_arch_para_hints(void)
+{
+   return 0;
+}
+
+static inline bool kvm_para_available(void)
+{
+   return false;
+}
+
+#endif /* _ASM_ARM_KVM_PARA_H */
diff --git a/arch/arm64/include/uapi/asm/Kbuild 
b/arch/arm64/include/uapi/asm/Kbuild
index 602d137932dc..f66554cd5c45 100644
--- a/arch/arm64/include/uapi/asm/Kbuild
+++ b/arch/arm64/include/uapi/asm/Kbuild
@@ -1,3 +1 @@
 # SPDX-License-Identifier: GPL-2.0
-
-generic-y += kvm_para.h
diff --git a/arch/arm64/include/uapi/asm/kvm_para.h 
b/arch/arm64/include/uapi/asm/kvm_para.h
new file mode 100644
index ..cd212282b90c
--- /dev/null
+++ b/arch/arm64/include/uapi/asm/kvm_para.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_ASM_ARM_KVM_PARA_H
+#define _UAPI_ASM_ARM_KVM_PARA_H
+
+#endif /* _UAPI_ASM_ARM_KVM_PARA_H */
-- 
2.23.0



[PATCH v2 07/17] KVM: arm64: Export kvm_handle_user_mem_abort()

2021-02-08 Thread Gavin Shan
The main work is handled by user_mem_abort(). After asynchronous
page fault is supported, one page fault may need to be handled with
two calls to this function, meaning the page fault needs to be
replayed asynchronously in that case. This renames the function
to kvm_handle_user_mem_abort() and exports it. Besides, there are
more changes introduced in order to accommodate asynchronous page
fault:

   * Add arguments @esr and @prefault to user_mem_abort(). @esr
     is the cached value of ESR_EL2 instead of fetching it from the
     current vCPU when the page fault is replayed in the scenario of
     asynchronous page fault. @prefault is used to indicate whether
     the page fault is a replayed one or not.

   * Define helper functions esr_dabt_*() in asm/esr.h to extract
     or check various fields of the passed ESR_EL2 value, because
     the helper functions defined in asm/kvm_emulate.h assume the
     ESR_EL2 value has been cached in the vCPU struct. That won't
     be true when handling the replayed page fault in the scenario
     of asynchronous page fault.

   * Some helper functions defined in asm/kvm_emulate.h are used
     by mmu.c only and don't seem to be needed by any other source
     file in the near future. They are moved to mmu.c and renamed
     accordingly.

     kvm_vcpu_trap_is_exec_fault()   -> is_exec_fault()
     kvm_is_write_fault()            -> is_write_fault()
     kvm_vcpu_trap_get_fault_level() -> replaced by esr_dabt_get_fault_level()
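
For illustration, a hedged reconstruction of the renamed write-fault helper,
driven by a cached ESR value rather than the vCPU state:

    static bool is_write_fault(u32 esr)
    {
            if (esr_dabt_is_s1ptw(esr))
                    return true;
            if (ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW)
                    return false;
            return esr_dabt_is_wnr(esr);
    }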

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/esr.h |  6 
 arch/arm64/include/asm/kvm_emulate.h | 27 ++---
 arch/arm64/include/asm/kvm_host.h|  4 +++
 arch/arm64/kvm/mmu.c | 43 ++--
 4 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 29f97eb3dad4..db46eb58c633 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -321,8 +321,14 @@
 ESR_ELx_CP15_32_ISS_DIR_READ)
 
 #ifndef __ASSEMBLY__
+#include 
 #include 
 
+#define esr_dabt_get_fault_type(esr)   (esr & ESR_ELx_FSC_TYPE)
+#define esr_dabt_get_fault_level(esr)  (FIELD_GET(ESR_ELx_FSC_LEVEL, esr))
+#define esr_dabt_is_wnr(esr)   (!!(FIELD_GET(ESR_ELx_WNR, esr)))
+#define esr_dabt_is_s1ptw(esr) (!!(FIELD_GET(ESR_ELx_S1PTW, esr)))
+
 static inline bool esr_is_data_abort(u32 esr)
 {
const u32 ec = ESR_ELx_EC(esr);
diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 0ef213b715a5..119b953828a2 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -282,13 +282,13 @@ static __always_inline int kvm_vcpu_dabt_get_rd(const 
struct kvm_vcpu *vcpu)
 
 static __always_inline bool kvm_vcpu_abt_iss1tw(const struct kvm_vcpu *vcpu)
 {
-   return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_S1PTW);
+   return esr_dabt_is_s1ptw(kvm_vcpu_get_esr(vcpu));
 }
 
 /* Always check for S1PTW *before* using this. */
 static __always_inline bool kvm_vcpu_dabt_iswrite(const struct kvm_vcpu *vcpu)
 {
-   return kvm_vcpu_get_esr(vcpu) & ESR_ELx_WNR;
+   return esr_dabt_is_wnr(kvm_vcpu_get_esr(vcpu));
 }
 
 static inline bool kvm_vcpu_dabt_is_cm(const struct kvm_vcpu *vcpu)
@@ -317,11 +317,6 @@ static inline bool kvm_vcpu_trap_is_iabt(const struct 
kvm_vcpu *vcpu)
return kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW;
 }
 
-static inline bool kvm_vcpu_trap_is_exec_fault(const struct kvm_vcpu *vcpu)
-{
-   return kvm_vcpu_trap_is_iabt(vcpu) && !kvm_vcpu_abt_iss1tw(vcpu);
-}
-
 static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
 {
return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC;
@@ -329,12 +324,7 @@ static __always_inline u8 kvm_vcpu_trap_get_fault(const 
struct kvm_vcpu *vcpu)
 
 static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu 
*vcpu)
 {
-   return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
-}
-
-static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu 
*vcpu)
-{
-   return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
+   return esr_dabt_get_fault_type(kvm_vcpu_get_esr(vcpu));
 }
 
 static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
@@ -362,17 +352,6 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct 
kvm_vcpu *vcpu)
return ESR_ELx_SYS64_ISS_RT(esr);
 }
 
-static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
-{
-   if (kvm_vcpu_abt_iss1tw(vcpu))
-   return true;
-
-   if (kvm_vcpu_trap_is_iabt(vcpu))
-   return false;
-
-   return kvm_vcpu_dabt_iswrite(vcpu);
-}
-
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 01eda5c84600..00b30b7554e5 100644
--- a/arch/arm64

[PATCH v2 06/17] KVM: arm64: Advertise KVM UID to guests via SMCCC

2021-02-08 Thread Gavin Shan
From: Will Deacon 

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Signed-off-by: Will Deacon 
Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/hypercalls.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index a54c4805f2a6..e02e29a12bbf 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
-   long val = SMCCC_RET_NOT_SUPPORTED;
+   long val[4] = { SMCCC_RET_NOT_SUPPORTED };
u32 feature;
gpa_t gpa;
 
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
-   val = ARM_SMCCC_VERSION_1_1;
+   val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case SPECTRE_VULNERABLE:
break;
case SPECTRE_MITIGATED:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
+   val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
break;
}
break;
@@ -54,22 +54,31 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
fallthrough;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = kvm_hypercall_pv_features(vcpu);
+   val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
-   val = gpa;
+   val[0] = gpa;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+   val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+   val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+   val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+   val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
case SDEI_1_0_FN_SDEI_VERSION:
case SDEI_1_0_FN_SDEI_EVENT_REGISTER:
@@ -93,6 +102,6 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
return kvm_psci_call(vcpu);
}
 
-   smccc_set_retval(vcpu, val, 0, 0, 0);
+   smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
 }
-- 
2.23.0



[PATCH v2 05/17] arm64: Probe for the presence of KVM hypervisor services during boot

2021-02-08 Thread Gavin Shan
From: Will Deacon 

Although the SMCCC specification provides some limited functionality for
describing the presence of hypervisor and firmware services, this is
generally applicable only to functions designated as "Arm Architecture
Service Functions" and no portable discovery mechanism is provided for
standard hypervisor services, despite having a designated range of
function identifiers reserved by the specification.

In an attempt to avoid the need for additional firmware changes every
time a new function is added, introduce a UID to identify the service
provider as being compatible with KVM. Once this has been established,
additional services can be discovered via a feature bitmap.
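
For illustration, once the bitmap has been populated at boot, guest code
can simply test for an individual service with the helper added below:

    if (kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_FEATURES))
            pr_info("KVM vendor hypervisor services are available\n");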

Signed-off-by: Will Deacon 
Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/hypervisor.h | 11 ++
 arch/arm64/kernel/setup.c   | 32 +
 include/linux/arm-smccc.h   | 25 ++
 3 files changed, 68 insertions(+)

diff --git a/arch/arm64/include/asm/hypervisor.h 
b/arch/arm64/include/asm/hypervisor.h
index f9cc1d021791..91e4bd890819 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -2,6 +2,17 @@
 #ifndef _ASM_ARM64_HYPERVISOR_H
 #define _ASM_ARM64_HYPERVISOR_H
 
+#include 
 #include 
 
+static inline bool kvm_arm_hyp_service_available(u32 func_id)
+{
+   extern DECLARE_BITMAP(__kvm_arm_hyp_services, ARM_SMCCC_KVM_NUM_FUNCS);
+
+   if (func_id >= ARM_SMCCC_KVM_NUM_FUNCS)
+   return false;
+
+   return test_bit(func_id, __kvm_arm_hyp_services);
+}
+
 #endif
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index c18aacde8bb0..8cbb99d80869 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -7,6 +7,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -275,12 +276,42 @@ static int __init reserve_memblock_reserved_regions(void)
 arch_initcall(reserve_memblock_reserved_regions);
 
 u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
+DECLARE_BITMAP(__kvm_arm_hyp_services, ARM_SMCCC_KVM_NUM_FUNCS) = { };
 
 u64 cpu_logical_map(unsigned int cpu)
 {
return __cpu_logical_map[cpu];
 }
 
+static void __init kvm_init_hyp_services(void)
+{
+   struct arm_smccc_res res;
+   int i;
+
+   arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, &res);
+   if (res.a0 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0 ||
+   res.a1 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1 ||
+   res.a2 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2 ||
+   res.a3 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3)
+   return;
+
+   memset(&res, 0, sizeof(res));
+   arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID, &res);
+   for (i = 0; i < 32; ++i) {
+   if (res.a0 & BIT(i))
+   set_bit(i + (32 * 0), __kvm_arm_hyp_services);
+   if (res.a1 & BIT(i))
+   set_bit(i + (32 * 1), __kvm_arm_hyp_services);
+   if (res.a2 & BIT(i))
+   set_bit(i + (32 * 2), __kvm_arm_hyp_services);
+   if (res.a3 & BIT(i))
+   set_bit(i + (32 * 3), __kvm_arm_hyp_services);
+   }
+
+   pr_info("KVM hypervisor services detected (0x%08lx 0x%08lx 0x%08lx 
0x%08lx)\n",
+   res.a3, res.a2, res.a1, res.a0);
+}
+
 void __init __no_sanitize_address setup_arch(char **cmdline_p)
 {
init_mm.start_code = (unsigned long) _stext;
@@ -353,6 +384,7 @@ void __init __no_sanitize_address setup_arch(char 
**cmdline_p)
else
psci_acpi_init();
 
+   kvm_init_hyp_services();
init_bootcpu_ops();
smp_init_cpus();
smp_build_mpidr_hash();
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index f860645f6512..7eb816241697 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -55,6 +55,8 @@
 #define ARM_SMCCC_OWNER_TRUSTED_OS 50
 #define ARM_SMCCC_OWNER_TRUSTED_OS_END 63
 
+#define ARM_SMCCC_FUNC_QUERY_CALL_UID  0xff01
+
 #define ARM_SMCCC_QUIRK_NONE   0
 #define ARM_SMCCC_QUIRK_QCOM_A61 /* Save/restore register a6 */
 
@@ -102,6 +104,29 @@
   ARM_SMCCC_OWNER_STANDARD_HYP,\
   0x21)
 
+#define ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID  \
+   ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+  ARM_SMCCC_SMC_32,\
+  ARM_SMCCC_OWNER_VENDOR_HYP,  \
+  ARM_SMCCC_FUNC_QUERY_CALL_UID)
+
+/* KVM UID value: 28b46fb6-2ec5-11e9-a9ca-4b564d003a74 */
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0 0xb66fb428U
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1 0xe911c52eU
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2 0x564bcaa9U
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3 0x743a004dU
+
+/* KVM "vendor specific" services */
+#defin

[PATCH v2 04/17] KVM: x86: Use generic async PF slot management

2021-02-08 Thread Gavin Shan
This uses the generic slot management mechanism for asynchronous
page fault by enabling CONFIG_KVM_ASYNC_PF_SLOT, because the private
implementation is a complete duplicate of the generic one.

The changes introduced by this are pretty mechanical and shouldn't
cause any logical changes.

Signed-off-by: Gavin Shan 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/Kconfig|  1 +
 arch/x86/kvm/mmu/mmu.c  |  2 +-
 arch/x86/kvm/x86.c  | 86 +++--
 4 files changed, 8 insertions(+), 82 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3d6616f6f6ef..3488eeb79c79 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1714,7 +1714,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
   struct kvm_async_pf *work);
 void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
 bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
-extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 7ac592664c52..b0ad75087ab5 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -34,6 +34,7 @@ config KVM
select HAVE_KVM_IRQ_ROUTING
select HAVE_KVM_EVENTFD
select KVM_ASYNC_PF
+   select KVM_ASYNC_PF_SLOT
select USER_RETURN_NOTIFIER
select KVM_MMIO
select TASKSTATS
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6d16481aa29d..ca2e84d6743c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3678,7 +3678,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool 
prefault, gfn_t gfn,
 
if (!prefault && kvm_can_do_async_pf(vcpu)) {
trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
-   if (kvm_find_async_pf_gfn(vcpu, gfn)) {
+   if (kvm_async_pf_find_slot(vcpu, gfn)) {
trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
kvm_make_request(KVM_REQ_APF_HALT, vcpu);
return true;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f3c9fe5c424e..b04d78a87abe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -290,13 +290,6 @@ static struct kmem_cache *kvm_alloc_emulator_cache(void)
 
 static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
 
-static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
-{
-   int i;
-   for (i = 0; i < ASYNC_PF_PER_VCPU; i++)
-   vcpu->arch.apf.gfns[i] = ~0;
-}
-
 static void kvm_on_user_return(struct user_return_notifier *urn)
 {
unsigned slot;
@@ -812,7 +805,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long 
old_cr0, unsigned lon
 
if ((cr0 ^ old_cr0) & X86_CR0_PG) {
kvm_clear_async_pf_completion_queue(vcpu);
-   kvm_async_pf_hash_reset(vcpu);
+   kvm_async_pf_reset_slot(vcpu);
}
 
if ((cr0 ^ old_cr0) & update_bits)
@@ -2905,7 +2898,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, 
u64 data)
 
if (!kvm_pv_async_pf_enabled(vcpu)) {
kvm_clear_async_pf_completion_queue(vcpu);
-   kvm_async_pf_hash_reset(vcpu);
+   kvm_async_pf_reset_slot(vcpu);
return 0;
}
 
@@ -9996,7 +9989,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
 
-   kvm_async_pf_hash_reset(vcpu);
+   kvm_async_pf_reset_slot(vcpu);
kvm_pmu_init(vcpu);
 
vcpu->arch.pending_external_vector = -1;
@@ -10117,7 +10110,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
kvmclock_reset(vcpu);
 
kvm_clear_async_pf_completion_queue(vcpu);
-   kvm_async_pf_hash_reset(vcpu);
+   kvm_async_pf_reset_slot(vcpu);
vcpu->arch.apf.halted = false;
 
if (vcpu->arch.guest_fpu && kvm_mpx_supported()) {
@@ -10932,73 +10925,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, 
struct kvm_async_pf *work)
kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
 }
 
-static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
-{
-   BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));
-
-   return hash_32(gfn & 0x, order_base_2(ASYNC_PF_PER_VCPU));
-}
-
-static inline u32 kvm_async_pf_next_probe(u32 key)
-{
-   return (key + 1) & (ASYNC_PF_PER_VCPU - 1);
-}
-
-static void kvm_add_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
-   u32 key = kvm_async_pf_hash_fn(gfn);
-
-   while (vcpu->arch.apf.gfns[key] != ~0)
-   key = kvm_async_pf_next_probe(key);
-
-   vcpu->arch.apf.gfns[key] = gfn;
-}
-
-static u32 kvm_async_pf_gfn_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
-   int i;
-   u32 key 

[PATCH v2 03/17] KVM: async_pf: Make GFN slot management generic

2021-02-08 Thread Gavin Shan
It's not allowed to fire duplicate notifications for the same GFN on
the x86 platform, which is enforced with the help of a hash table.
This mechanism is going to be used by arm64 as well, so this makes
the code generic and shareable by multiple platforms.

   * As this mechanism isn't needed by all platforms, a new kernel
     config option (CONFIG_KVM_ASYNC_PF_SLOT) is introduced so that it
     can be disabled at compile time.

   * The code is basically copied from the x86 platform and the
     functions are renamed to reflect the fact that: (a) the input
     parameters are vCPU and GFN; (b) the operations are resetting,
     searching, adding and removing.

   * Helper stubs are also added for !CONFIG_KVM_ASYNC_PF because
     we're going to use IS_ENABLED() instead of #ifdef on arm64 when
     the asynchronous page fault is supported.

This is preparatory work to use the newly introduced functions on the
x86 platform and on arm64 in subsequent patches.
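
For illustration, the intended calling pattern mirrors the existing x86
usage (a hedged sketch; error handling omitted):

    /* not-present path: avoid firing a second async PF for the same gfn */
    if (kvm_async_pf_find_slot(vcpu, gfn))
            return true;    /* already pending, just wait for page-ready */

    if (kvm_setup_async_pf(vcpu, cr2_or_gpa, hva, &arch))
            kvm_async_pf_add_slot(vcpu, gfn);

    /* page-ready path: release the slot once the fault completes */
    kvm_async_pf_remove_slot(vcpu, work->arch.gfn);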

Signed-off-by: Gavin Shan 
---
 include/linux/kvm_host.h | 18 +
 virt/kvm/Kconfig |  3 ++
 virt/kvm/async_pf.c  | 79 
 3 files changed, 100 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 041d93f8f4b0..b52d71030f25 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -277,6 +277,9 @@ struct kvm_vcpu {
 
 #ifdef CONFIG_KVM_ASYNC_PF
struct {
+#ifdef CONFIG_KVM_ASYNC_PF_SLOT
+   gfn_t gfns[ASYNC_PF_PER_VCPU];
+#endif
u32 queued;
struct list_head queue;
struct list_head done;
@@ -321,12 +324,27 @@ static inline bool 
kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
return !list_empty_careful(&vcpu->async_pf.done);
 }
 
+#ifdef CONFIG_KVM_ASYNC_PF_SLOT
+void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu);
+void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
+void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
+bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
+#endif
+
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
 void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
 bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
unsigned long hva, struct kvm_arch_async_pf *arch);
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #else
+static inline void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu) { }
+static inline void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn) { }
+static inline void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn) 
{ }
+static inline bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+   return false;
+}
+
 static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 {
return false;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 1c37ccd5d402..69a282aaa4df 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -23,6 +23,9 @@ config KVM_MMIO
 config KVM_ASYNC_PF
bool
 
+config KVM_ASYNC_PF_SLOT
+   bool
+
 # Toggle to switch between direct notification and batch job
 config KVM_ASYNC_PF_SYNC
bool
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 2cf864aafd0e..7bf22b20af45 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -19,6 +19,85 @@
 
 static struct kmem_cache *async_pf_cache;
 
+#ifdef CONFIG_KVM_ASYNC_PF_SLOT
+static inline u32 kvm_async_pf_hash(gfn_t gfn)
+{
+   BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));
+
+   return hash_32(gfn & 0xffffffff, order_base_2(ASYNC_PF_PER_VCPU));
+}
+
+static inline u32 kvm_async_pf_next_slot(u32 key)
+{
+   return (key + 1) & (ASYNC_PF_PER_VCPU - 1);
+}
+
+static u32 kvm_async_pf_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+   int i;
+   u32 key = kvm_async_pf_hash(gfn);
+
+   for (i = 0; i < ASYNC_PF_PER_VCPU &&
+   (vcpu->async_pf.gfns[key] != gfn &&
+   vcpu->async_pf.gfns[key] != ~0); i++)
+   key = kvm_async_pf_next_slot(key);
+
+   return key;
+}
+
+void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu)
+{
+   int i;
+
+   for (i = 0; i < ASYNC_PF_PER_VCPU; i++)
+   vcpu->async_pf.gfns[i] = ~0;
+}
+
+bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+   return vcpu->async_pf.gfns[kvm_async_pf_slot(vcpu, gfn)] == gfn;
+}
+
+void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+   u32 key = kvm_async_pf_hash(gfn);
+
+   while (vcpu->async_pf.gfns[key] != ~0)
+   key = kvm_async_pf_next_slot(key);
+
+   vcpu->async_pf.gfns[key] = gfn;
+}
+
+void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+   u32 i, j, k;
+
+   i = j = kvm_async_pf_slot(vcpu, gfn);
+
+   if (WARN_ON_ONCE(vcpu->async_pf.gfns[i] != gfn))
+   return;
+
+   while (true) {
+   vcpu->async_pf.gfns[i] = ~0;
+   do {
+   j = kvm_async_pf_next_slot(j);
+ 

[PATCH v2 01/17] KVM: async_pf: Move struct kvm_async_pf around

2021-02-08 Thread Gavin Shan
This moves the definitions of "struct kvm_async_pf" and the related
functions after "struct kvm_vcpu" so that the newly added inline function
can dereference "struct kvm_vcpu" properly. Otherwise, the following
build error is raised:

   error: dereferencing pointer to incomplete type ‘struct kvm_vcpu’
   return !list_empty_careful(&vcpu->async_pf.done);
   ^~

The separator between type and field is replaced by tabs for "struct
kvm_async_pf" while we're at it. This is preparatory work for adding
a new inline function in the next patch. No functional change
intended.
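
As a minimal standalone illustration of the ordering issue (not code
from this patch), an inline helper can only dereference a struct that
is already complete at that point in the header:

struct foo;                          /* forward declaration only */

static inline int broken(struct foo *f)
{
    return f->x;                     /* error: incomplete type 'struct foo' */
}

struct foo {
    int x;
};

static inline int works(struct foo *f)
{
    return f->x;                     /* fine: the type is complete here */
}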

Signed-off-by: Gavin Shan 
---
 include/linux/kvm_host.h | 43 
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f3b1013fb22c..b6697ee1182e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -196,27 +196,6 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum 
kvm_bus bus_idx,
 struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 gpa_t addr);
 
-#ifdef CONFIG_KVM_ASYNC_PF
-struct kvm_async_pf {
-   struct work_struct work;
-   struct list_head link;
-   struct list_head queue;
-   struct kvm_vcpu *vcpu;
-   struct mm_struct *mm;
-   gpa_t cr2_or_gpa;
-   unsigned long addr;
-   struct kvm_arch_async_pf arch;
-   bool   wakeup_all;
-   bool notpresent_injected;
-};
-
-void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
-void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
-bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-   unsigned long hva, struct kvm_arch_async_pf *arch);
-int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
-#endif
-
 enum {
OUTSIDE_GUEST_MODE,
IN_GUEST_MODE,
@@ -323,6 +302,28 @@ struct kvm_vcpu {
struct kvm_dirty_ring dirty_ring;
 };
 
+#ifdef CONFIG_KVM_ASYNC_PF
+struct kvm_async_pf {
+   struct work_struct          work;
+   struct list_head            link;
+   struct list_head            queue;
+   struct kvm_vcpu             *vcpu;
+   struct mm_struct            *mm;
+   gpa_t                       cr2_or_gpa;
+   unsigned long               addr;
+   struct kvm_arch_async_pf    arch;
+   bool                        wakeup_all;
+   bool                        notpresent_injected;
+};
+
+void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
+void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
+bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+   unsigned long hva, struct kvm_arch_async_pf *arch);
+int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
+#endif
+
+
 static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
 {
/*
-- 
2.23.0



[PATCH v2 02/17] KVM: async_pf: Add helper function to check completion queue

2021-02-08 Thread Gavin Shan
This adds the inline function kvm_check_async_pf_completion_queue()
and a stub for !CONFIG_KVM_ASYNC_PF so that callers won't have to
care about CONFIG_KVM_ASYNC_PF. The kernel option is referenced only
once in kvm_main.c and that #ifdef can then be removed. Besides,
the checks on the completion queue are all replaced by the newly
introduced helper, as list_empty() and list_empty_careful() are
interchangeable here.

A stub for kvm_check_async_pf_completion() on !CONFIG_KVM_ASYNC_PF
is also introduced. It will be used by a subsequent patch.

Signed-off-by: Gavin Shan 
---
 arch/x86/kvm/x86.c   |  2 +-
 include/linux/kvm_host.h | 12 
 virt/kvm/async_pf.c  | 12 ++--
 virt/kvm/kvm_main.c  |  4 +---
 4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76bce832cade..f3c9fe5c424e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10794,7 +10794,7 @@ static inline bool kvm_guest_apic_has_interrupt(struct 
kvm_vcpu *vcpu)
 
 static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 {
-   if (!list_empty_careful(&vcpu->async_pf.done))
+   if (kvm_check_async_pf_completion_queue(vcpu))
return true;
 
if (kvm_apic_has_events(vcpu))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b6697ee1182e..041d93f8f4b0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -316,11 +316,23 @@ struct kvm_async_pf {
bool                        notpresent_injected;
 };
 
+static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
+{
+   return !list_empty_careful(&vcpu->async_pf.done);
+}
+
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
 void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
 bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
unsigned long hva, struct kvm_arch_async_pf *arch);
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
+#else
+static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
+{
+   return false;
+}
+
+static inline void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) { }
 #endif
 
 
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index dd777688d14a..2cf864aafd0e 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -70,7 +70,7 @@ static void async_pf_execute(struct work_struct *work)
kvm_arch_async_page_present(vcpu, apf);
 
spin_lock(&vcpu->async_pf.lock);
-   first = list_empty(&vcpu->async_pf.done);
+   first = !kvm_check_async_pf_completion_queue(vcpu);
list_add_tail(&apf->link, &vcpu->async_pf.done);
apf->vcpu = NULL;
spin_unlock(&vcpu->async_pf.lock);
@@ -122,7 +122,7 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu 
*vcpu)
spin_lock(&vcpu->async_pf.lock);
}
 
-   while (!list_empty(&vcpu->async_pf.done)) {
+   while (kvm_check_async_pf_completion_queue(vcpu)) {
struct kvm_async_pf *work =
list_first_entry(&vcpu->async_pf.done,
 typeof(*work), link);
@@ -138,8 +138,8 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 {
struct kvm_async_pf *work;
 
-   while (!list_empty_careful(&vcpu->async_pf.done) &&
- kvm_arch_can_dequeue_async_page_present(vcpu)) {
+   while (kvm_check_async_pf_completion_queue(vcpu) &&
+  kvm_arch_can_dequeue_async_page_present(vcpu)) {
spin_lock(&vcpu->async_pf.lock);
work = list_first_entry(&vcpu->async_pf.done, typeof(*work),
  link);
@@ -205,7 +205,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
struct kvm_async_pf *work;
bool first;
 
-   if (!list_empty_careful(&vcpu->async_pf.done))
+   if (kvm_check_async_pf_completion_queue(vcpu))
return 0;
 
work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
@@ -216,7 +216,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
INIT_LIST_HEAD(&work->queue); /* for list_del to work */
 
spin_lock(&vcpu->async_pf.lock);
-   first = list_empty(&vcpu->async_pf.done);
+   first = !kvm_check_async_pf_completion_queue(vcpu);
list_add_tail(&work->link, &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8367d88ce39b..632b80b6e485 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2961,10 +2961,8 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu)
if (kvm_arch_dy_runnable(vcpu))
return true;
 
-#ifdef CONFIG_KVM_ASYNC_PF
-   if (!list_empty_careful(&vcpu->async_pf.done))
+   if (kvm_check_async_pf_completion_queue(vcpu))
return true;
-#endif
 
return false;
 }
-- 
2.23.0


[PATCH v2 00/17] Support Asynchronous Page Fault

2021-02-08 Thread Gavin Shan
There are two stages of page faults. The guest kernel is responsible for
handling stage-1 page faults, while the host kernel takes care of stage-2
page faults. When the guest traps to the host because of a stage-2 page
fault, the guest is suspended until the requested memory (page) is
populated. Sometimes the cost to populate the requested page isn't cheap
and can take hundreds of milliseconds in extreme cases. Similarly, the
guest has to wait until the requested memory is ready in the post-copy
live migration scenario.

This series introduces the feature (Asynchronous Page Fault) to improve
the situation, so that the guest doesn't have to wait in these scenarios.
With it, the overall performance of the guest is improved. This series
depends on the feature "SDEI virtualization" and QEMU changes. All code
changes can be found on github:

 https://github.com/gwshan/linux ("sdei") # SDEI virtualization
 https://github.com/gwshan/linux ("apf")  # This series + "sdei"
 https://github.com/gwshan/qemu  ("apf")  # QEMU code changes

About the design, the details can be found in the last patch. Generally,
it's driven by two notifications: page-not-present and page-ready. They
are delivered from the host to the guest via an SDEI event and a PPI
respectively. Each notification is always associated with a token, which
is used to identify the notification. The token is passed through memory
shared between the host and guest. Besides, the SMCCC and ioctl
interfaces are used by the guest and VMM to configure, enable, disable,
and even migrate the functionality.

When the guest traps to the host because of a stage-2 page fault, a
page-not-present notification is raised by the host and sent to the
guest through the dedicated SDEI event (0x4041) if the requested page
can't be populated immediately. In the meanwhile, a (background) worker
is also started to populate the requested page. On receiving the SDEI
event, the guest marks the currently running process with a special flag
(TIF_ASYNC_PF) and associates it with a pre-allocated waitqueue. At the
same time, a (reschedule) IPI is sent to the current CPU. After the SDEI
event is acknowledged by the guest, the (reschedule) IPI is delivered
and causes a context switch from the process tagged with TIF_ASYNC_PF
to another process.

Later on, a page-ready notification is sent to the guest after the
requested page has been populated by the (background) worker. On
receiving the interrupt, the guest uses the associated token to locate
the process which was previously suspended because of page-not-present.
The flag (TIF_ASYNC_PF) is cleared for the suspended process and it's
woken up.
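
For illustration only, the guest-side handling described above could be
sketched as below. The helpers example_bind_token(), read_shared_token()
and token_to_task() are invented placeholders for the real implementation
in PATCH[14-16]:

/* page-not-present: runs in the SDEI event handler */
static void example_apf_not_present(unsigned int token)
{
    set_tsk_thread_flag(current, TIF_ASYNC_PF);
    example_bind_token(current, token);      /* invented: token -> waitqueue */
    smp_send_reschedule(smp_processor_id()); /* delivered after SDEI COMPLETE */
}

/* page-ready: runs in the PPI interrupt handler */
static irqreturn_t example_apf_ready(int irq, void *dev_id)
{
    struct task_struct *tsk = token_to_task(read_shared_token()); /* invented */

    clear_tsk_thread_flag(tsk, TIF_ASYNC_PF);
    wake_up_process(tsk);
    return IRQ_HANDLED;
}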

The series is organized as below:

   PATCH[01-04] makes the GFN hash table management generic so that it
                can be shared by x86/arm64.
   PATCH[05-06] supports KVM hypervisor SMCCC services, from Will Deacon.
   PATCH[07-08] preparatory work to support asynchronous page fault.
   PATCH[09-10] supports asynchronous page fault.
   PATCH[11-13] supports ioctl and SMCCC interfaces for the functionality.
   PATCH[14-16] supports asynchronous page fault for the guest.
   PATCH[17]    adds a document to explain the design and internals.

Testing
=======

The tests are carried out using the program "testsuite mem", which I
wrote myself. The program basically does two things: (a) starts a thread
to allocate the specified percentage of available memory, write random
values to it the specified number of times, and then release it;
(b) starts a thread to do calculation if needed.

In all test cases the guest is assigned only one vCPU and 4096MB of
memory. The memory cgroup that the qemu process is associated with can
have different memory limitation settings.
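
The test program itself is not part of this series; a rough userspace
sketch of the allocation thread described in (a), assuming the
percentage and pass count come from the command line:

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct mem_args {
    unsigned long pct;      /* percentage of free memory to allocate */
    unsigned long passes;   /* how many times to rewrite it */
};

/* pthread start routine for the allocation thread */
static void *alloc_thread(void *arg)
{
    struct mem_args *a = arg;
    long page = sysconf(_SC_PAGESIZE);
    size_t size = (size_t)sysconf(_SC_AVPHYS_PAGES) * page * a->pct / 100;
    char *buf = malloc(size);
    unsigned long pass;
    size_t off;

    if (!buf)
        return NULL;

    for (pass = 0; pass < a->passes; pass++)
        for (off = 0; off < size; off += page)
            buf[off] = (char)rand();    /* touch every page with a random value */

    free(buf);
    return NULL;
}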

[1] Allocate/free memory without calculation thread

Index   -APF       +APF       Output
-------------------------------------
1       39477ms    38367ms    +0.28%
2       51272ms    49760ms    +0.29%

The consumed time is decreased a bit, which should be attributed to the
(background) workers running in parallel.

[2] Allocate/free memory with calculation thread

Index   -APF                    +APF                     Output
----------------------------------------------------------------
1       81442ms   7155198892    171335ms   22391255613    -110%  +213%
2       122002ms  11438214429   191126ms   24984499197    -57%   +119%

The increase in calculation amount is almost 2 times the increase in
consumed time.

[3] Allocate/free memory 5 times with calculation thread and post-copy
    live migration

Index   -APF                    +APF                     Output
----------------------------------------------------------------
1       240635ms  19722999876   658955ms   89242030748    +174%  +352%


Results retrieved from "info migrate":

 Param  -APF  

[PATCH v2 21/21] KVM: selftests: Add SDEI test case

2021-02-08 Thread Gavin Shan
This adds an SDEI test case to selftests, where various hypercalls are
issued against the KVM private event (0x4020) to ensure they complete
without error. Note that two vCPUs are started by default to run the
same sequence. It basically simulates what the SDEI client driver does,
and the following hypercalls are issued in sequence:

   SDEI_1_0_FN_SDEI_VERSION            (probing SDEI capability)
   SDEI_1_0_FN_SDEI_PE_UNMASK          (CPU online)
   SDEI_1_0_FN_SDEI_PRIVATE_RESET      (restart SDEI)
   SDEI_1_0_FN_SDEI_SHARED_RESET
   SDEI_1_0_FN_SDEI_EVENT_GET_INFO     (event properties)
   SDEI_1_0_FN_SDEI_EVENT_GET_INFO
   SDEI_1_0_FN_SDEI_EVENT_GET_INFO
   SDEI_1_0_FN_SDEI_EVENT_REGISTER     (register event)
   SDEI_1_0_FN_SDEI_EVENT_ENABLE       (enable event)
   SDEI_1_0_FN_SDEI_EVENT_DISABLE      (disable event)
   SDEI_1_0_FN_SDEI_EVENT_UNREGISTER   (unregister event)
   SDEI_1_0_FN_SDEI_PE_MASK            (CPU offline)

Signed-off-by: Gavin Shan 
---
 tools/testing/selftests/kvm/Makefile   |   1 +
 tools/testing/selftests/kvm/aarch64/sdei.c | 172 +
 2 files changed, 173 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/aarch64/sdei.c

diff --git a/tools/testing/selftests/kvm/Makefile 
b/tools/testing/selftests/kvm/Makefile
index fe41c6a0fa67..482faa88520b 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -74,6 +74,7 @@ TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_aarch64 += set_memory_region_test
 TEST_GEN_PROGS_aarch64 += steal_time
+TEST_GEN_PROGS_aarch64 += aarch64/sdei
 
 TEST_GEN_PROGS_s390x = s390x/memop
 TEST_GEN_PROGS_s390x += s390x/resets
diff --git a/tools/testing/selftests/kvm/aarch64/sdei.c 
b/tools/testing/selftests/kvm/aarch64/sdei.c
new file mode 100644
index ..1a4cdae84ad5
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/sdei.c
@@ -0,0 +1,172 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM SDEI test
+ *
+ * Copyright (C) 2021 Red Hat, Inc.
+ *
+ * Author(s): Gavin Shan 
+ */
+#define _GNU_SOURCE
+#include 
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "asm/kvm_sdei.h"
+#include "linux/arm_sdei.h"
+
+#define NR_VCPUS   2
+#define SDEI_GPA_BASE  (1 << 30)
+
+struct sdei_event {
+   uint32_t    cpu;
+   uint64_t    version;
+   uint64_t    num;
+   uint64_t    type;
+   uint64_t    priority;
+   uint64_t    signaled;
+};
+
+static struct sdei_event sdei_events[NR_VCPUS];
+
+static int64_t smccc(uint32_t func, uint64_t arg0, uint64_t arg1,
+uint64_t arg2, uint64_t arg3, uint64_t arg4)
+{
+   int64_t ret;
+
+   asm volatile(
+   "mov    x0, %1\n"
+   "mov    x1, %2\n"
+   "mov    x2, %3\n"
+   "mov    x3, %4\n"
+   "mov    x4, %5\n"
+   "mov    x5, %6\n"
+   "hvc    #0\n"
+   "mov    %0, x0\n"
+   : "=r" (ret) : "r" (func), "r" (arg0), "r" (arg1),
+   "r" (arg2), "r" (arg3), "r" (arg4) :
+   "x0", "x1", "x2", "x3", "x4", "x5");
+
+   return ret;
+}
+
+static inline bool is_error(int64_t ret)
+{
+   if (ret == SDEI_NOT_SUPPORTED  ||
+   ret == SDEI_INVALID_PARAMETERS ||
+   ret == SDEI_DENIED ||
+   ret == SDEI_PENDING||
+   ret == SDEI_OUT_OF_RESOURCE)
+   return true;
+
+   return false;
+}
+
+static void guest_code(int cpu)
+{
+   struct sdei_event *event = &sdei_events[cpu];
+   int64_t ret;
+
+   /* CPU */
+   event->cpu = cpu;
+   event->num = KVM_SDEI_DEFAULT_NUM;
+   GUEST_ASSERT(cpu < NR_VCPUS);
+
+   /* Version */
+   ret = smccc(SDEI_1_0_FN_SDEI_VERSION, 0, 0, 0, 0, 0);
+   GUEST_ASSERT(!is_error(ret));
+   GUEST_ASSERT(SDEI_VERSION_MAJOR(ret) == 1);
+   GUEST_ASSERT(SDEI_VERSION_MINOR(ret) == 0);
+   event->version = ret;
+
+   /* CPU unmasking */
+   ret = smccc(SDEI_1_0_FN_SDEI_PE_UNMASK, 0, 0, 0, 0, 0);
+   GUEST_ASSERT(!is_error(ret));
+
+   /* Reset */
+   ret = smccc(SDEI_1_0_FN_SDEI_PRIVATE_RESET, 0, 0, 0, 0, 0);
+   GUEST_ASSERT(!is_error(ret));
+   ret = smccc(SDEI_1_0_FN_SDEI_SHARED_RESET, 0, 0, 0, 0, 0);
+   GUEST_ASSERT(!is_error(ret));
+
+   /* Event properties */
+   ret = smccc(SDEI_1_0_FN_SDEI_EVENT_GET_INFO,
+event->num, SDEI_EVENT_INFO_EV_TYPE, 0, 0, 0);
+   GUEST_ASSERT(!is_error(ret));
+   event->type = ret;
+
+   ret = smccc(SDEI_1_0_FN_SDEI_EVENT_GET_INFO,
+   event->num, SDEI_EVENT_INFO_EV_PRIORITY, 0, 0, 0);
+   GUEST_ASSERT(!is_error(ret));
+   event->priority = ret;
+
+   ret = smccc(SDEI_1_0_FN_SDEI_EVENT_GET_INFO,
+   event->num, SDEI_EVENT_INFO_EV_SIGNALED, 0, 0, 0);
+   GUEST_ASSERT(!is_error(ret));
+   event->signaled = r

[PATCH v2 20/21] KVM: arm64: Export SDEI capability

2021-02-08 Thread Gavin Shan
The SDEI functionality is now ready to be exported. This adds a new
capability (KVM_CAP_ARM_SDEI) and exports it to userspace.
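
With the capability in place, the VMM can probe it through the standard
KVM_CHECK_EXTENSION ioctl; a minimal hypothetical sketch (error handling
trimmed):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    int vm, ret;

    if (kvm < 0)
        return 1;

    vm = ioctl(kvm, KVM_CREATE_VM, 0);
    if (vm < 0)
        return 1;

    ret = ioctl(vm, KVM_CHECK_EXTENSION, KVM_CAP_ARM_SDEI);
    printf("KVM_CAP_ARM_SDEI: %s\n", ret == 1 ? "supported" : "not supported");
    return 0;
}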

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/arm.c | 3 +++
 include/uapi/linux/kvm.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 55ccd234b0ec..f8b44a29e164 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -266,6 +266,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_PTRAUTH_GENERIC:
r = system_has_full_ptr_auth();
break;
+   case KVM_CAP_ARM_SDEI:
+   r = 1;
+   break;
default:
r = 0;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b056b4ac884b..133128d45fcb 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1058,6 +1058,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
 #define KVM_CAP_SYS_HYPERV_CPUID 191
 #define KVM_CAP_DIRTY_LOG_RING 192
+#define KVM_CAP_ARM_SDEI 193
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.23.0



[PATCH v2 19/21] KVM: arm64: Support SDEI event cancellation

2021-02-08 Thread Gavin Shan
An SDEI event is injected to send a notification to the guest, but it
might no longer be needed after it has been injected. This introduces
an API to cancel an injected SDEI event, provided it hasn't been fired
to the guest yet.

This mechanism will be needed when we add support for asynchronous
page fault.
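
A hypothetical in-kernel usage sketch; example_try_cancel() is an
invented name and the error codes follow the implementation below:

static void example_try_cancel(struct kvm_vcpu *vcpu, unsigned long num)
{
    int ret = kvm_sdei_cancel(vcpu, num);

    if (ret == -EINPROGRESS) {
        /* Already delivered to the guest; let the handler complete it. */
    } else if (ret) {
        /* -EPERM or -EINVAL: SDEI isn't set up or the event is unknown. */
    }
}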

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_sdei.h |  1 +
 arch/arm64/kvm/sdei.c | 49 +++
 2 files changed, 50 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_sdei.h 
b/arch/arm64/include/asm/kvm_sdei.h
index 51087fe971ba..353744c7bad9 100644
--- a/arch/arm64/include/asm/kvm_sdei.h
+++ b/arch/arm64/include/asm/kvm_sdei.h
@@ -126,6 +126,7 @@ int kvm_sdei_register_notifier(struct kvm *kvm, unsigned 
long num,
   kvm_sdei_notifier notifier);
 int kvm_sdei_inject(struct kvm_vcpu *vcpu,
unsigned long num, bool immediate);
+int kvm_sdei_cancel(struct kvm_vcpu *vcpu, unsigned long num);
 void kvm_sdei_deliver(struct kvm_vcpu *vcpu);
 long kvm_sdei_vm_ioctl(struct kvm *kvm, unsigned long arg);
 long kvm_sdei_vcpu_ioctl(struct kvm_vcpu *vcpu, unsigned long arg);
diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index 7c2789cd1421..4f5a582daa97 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -907,6 +907,55 @@ int kvm_sdei_inject(struct kvm_vcpu *vcpu,
return ret;
 }
 
+int kvm_sdei_cancel(struct kvm_vcpu *vcpu, unsigned long num)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_kvm_event *kske = NULL;
+   struct kvm_sdei_vcpu_event *ksve = NULL;
+   int ret = 0;
+
+   if (!(ksdei && vsdei)) {
+   ret = -EPERM;
+   goto out;
+   }
+
+   /* Find the vCPU event */
+   spin_lock(&vsdei->lock);
+   ksve = kvm_sdei_find_vcpu_event(vcpu, num);
+   if (!ksve) {
+   ret = -EINVAL;
+   goto unlock;
+   }
+
+   /* Event can't be cancelled if it has been delivered */
+   if (ksve->state.refcount <= 1 &&
+   (vsdei->critical_event == ksve ||
+vsdei->normal_event == ksve)) {
+   ret = -EINPROGRESS;
+   goto unlock;
+   }
+
+   /* Free the vCPU event if necessary */
+   kske = ksve->kske;
+   ksve->state.refcount--;
+   if (!ksve->state.refcount) {
+   list_del(&ksve->link);
+   kfree(ksve);
+   }
+
+unlock:
+   spin_unlock(&vsdei->lock);
+   if (kske) {
+   spin_lock(&ksdei->lock);
+   kske->state.refcount--;
+   spin_unlock(&ksdei->lock);
+   }
+out:
+   return ret;
+}
+
 void kvm_sdei_deliver(struct kvm_vcpu *vcpu)
 {
struct kvm *kvm = vcpu->kvm;
-- 
2.23.0



[PATCH v2 18/21] KVM: arm64: Support SDEI event injection

2021-02-08 Thread Gavin Shan
This supports SDEI event injection by implementing kvm_sdei_inject().
It's called directly by the kernel, or by the VMM through an ioctl
command, to inject an SDEI event into a specific vCPU.
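
A hypothetical sketch of an in-kernel caller (for example the async
page fault code later in the series); the retry-on--ENOSPC policy is
purely illustrative:

static int example_notify_guest(struct kvm_vcpu *vcpu, unsigned long num)
{
    /* Ask for immediate delivery first ... */
    int ret = kvm_sdei_inject(vcpu, num, true);

    /*
     * ... and fall back to queuing the event if the vCPU is still
     * busy handling another event of equal or higher priority.
     */
    if (ret == -ENOSPC)
        ret = kvm_sdei_inject(vcpu, num, false);

    return ret;
}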

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_sdei.h  |   2 +
 arch/arm64/include/uapi/asm/kvm_sdei.h |   1 +
 arch/arm64/kvm/sdei.c  | 108 +
 3 files changed, 111 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_sdei.h 
b/arch/arm64/include/asm/kvm_sdei.h
index a997989bab77..51087fe971ba 100644
--- a/arch/arm64/include/asm/kvm_sdei.h
+++ b/arch/arm64/include/asm/kvm_sdei.h
@@ -124,6 +124,8 @@ void kvm_sdei_create_vcpu(struct kvm_vcpu *vcpu);
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu);
 int kvm_sdei_register_notifier(struct kvm *kvm, unsigned long num,
   kvm_sdei_notifier notifier);
+int kvm_sdei_inject(struct kvm_vcpu *vcpu,
+   unsigned long num, bool immediate);
 void kvm_sdei_deliver(struct kvm_vcpu *vcpu);
 long kvm_sdei_vm_ioctl(struct kvm *kvm, unsigned long arg);
 long kvm_sdei_vcpu_ioctl(struct kvm_vcpu *vcpu, unsigned long arg);
diff --git a/arch/arm64/include/uapi/asm/kvm_sdei.h 
b/arch/arm64/include/uapi/asm/kvm_sdei.h
index 3485843dd6df..232092de5e21 100644
--- a/arch/arm64/include/uapi/asm/kvm_sdei.h
+++ b/arch/arm64/include/uapi/asm/kvm_sdei.h
@@ -64,6 +64,7 @@ struct kvm_sdei_vcpu_state {
 #define KVM_SDEI_CMD_SET_VEVENT        7
 #define KVM_SDEI_CMD_GET_VCPU_STATE    8
 #define KVM_SDEI_CMD_SET_VCPU_STATE    9
+#define KVM_SDEI_CMD_INJECT_EVENT  10
 
 struct kvm_sdei_cmd {
uint32_t    cmd;
diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index 79315b77f24b..7c2789cd1421 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -802,6 +802,111 @@ int kvm_sdei_register_notifier(struct kvm *kvm,
return ret;
 }
 
+int kvm_sdei_inject(struct kvm_vcpu *vcpu,
+   unsigned long num,
+   bool immediate)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_event *kse = NULL;
+   struct kvm_sdei_kvm_event *kske = NULL;
+   struct kvm_sdei_vcpu_event *ksve = NULL;
+   int index, ret = 0;
+
+   /* Sanity check */
+   if (!(ksdei && vsdei)) {
+   ret = -EPERM;
+   goto out;
+   }
+
+   if (!kvm_sdei_is_valid_event_num(num)) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   /* Check the kvm event */
+   spin_lock(&ksdei->lock);
+   kske = kvm_sdei_find_kvm_event(kvm, num);
+   if (!kske) {
+   ret = -ENOENT;
+   goto unlock_kvm;
+   }
+
+   kse = kske->kse;
+   index = (kse->state.type == SDEI_EVENT_TYPE_PRIVATE) ?
+   vcpu->vcpu_idx : 0;
+   if (!(kvm_sdei_is_registered(kske, index) &&
+ kvm_sdei_is_enabled(kske, index))) {
+   ret = -EPERM;
+   goto unlock_kvm;
+   }
+
+   /* Check the vcpu state */
+   spin_lock(&vsdei->lock);
+   if (vsdei->state.masked) {
+   ret = -EPERM;
+   goto unlock_vcpu;
+   }
+
+   /* Check if the event can be delivered immediately */
+   if (immediate) {
+   if (kse->state.priority == SDEI_EVENT_PRIORITY_CRITICAL &&
+   !list_empty(&vsdei->critical_events)) {
+   ret = -ENOSPC;
+   goto unlock_vcpu;
+   }
+
+   if (kse->state.priority == SDEI_EVENT_PRIORITY_NORMAL &&
+   (!list_empty(&vsdei->critical_events) ||
+!list_empty(&vsdei->normal_events))) {
+   ret = -ENOSPC;
+   goto unlock_vcpu;
+   }
+   }
+
+   /* Check if the vcpu event exists */
+   ksve = kvm_sdei_find_vcpu_event(vcpu, num);
+   if (ksve) {
+   kske->state.refcount++;
+   ksve->state.refcount++;
+   kvm_make_request(KVM_REQ_SDEI, vcpu);
+   goto unlock_vcpu;
+   }
+
+   /* Allocate vcpu event */
+   ksve = kzalloc(sizeof(*ksve), GFP_KERNEL);
+   if (!ksve) {
+   ret = -ENOMEM;
+   goto unlock_vcpu;
+   }
+
+   /*
+* We should take lock to update KVM event state because its
+* reference count might be zero. In that case, the KVM event
+* could be destroyed.
+*/
+   kske->state.refcount++;
+   ksve->state.num  = num;
+   ksve->state.refcount = 1;
+   ksve->kske   = kske;
+   ksve->vcpu   = vcpu;
+
+   if (kse->state.priority == SDEI_EVENT_PRIORITY_CRITICAL)
+   list_add_tail(&ksve->link, &vsdei->critical_events);
+   else
+   list_add_tail(&ksve->link

[PATCH v2 17/21] KVM: arm64: Support SDEI ioctl commands on vCPU

2021-02-08 Thread Gavin Shan
This supports ioctl commands on the vCPU to manage the various objects.
They are primarily used by the VMM to accomplish live migration. The
ioctl commands introduced by this patch are listed below; a VMM-side
usage sketch follows the list:

   * KVM_SDEI_CMD_GET_VEVENT_COUNT
     Retrieve the number of SDEI events pending for handling on the
     vCPU
   * KVM_SDEI_CMD_GET_VEVENT
 Retrieve the state of SDEI event, which has been delivered to
 the vCPU for handling
   * KVM_SDEI_CMD_SET_VEVENT
 Populate the SDEI event, which has been delivered to the vCPU
 for handling
   * KVM_SDEI_CMD_GET_VCPU_STATE
 Retrieve vCPU state related to SDEI handling
   * KVM_SDEI_CMD_SET_VCPU_STATE
 Populate vCPU state related to SDEI handling
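
From the VMM side these commands go through the KVM_ARM_SDEI_COMMAND
vCPU ioctl wired up in this patch. A hypothetical sketch, assuming the
count is written back into struct kvm_sdei_cmd by the kernel:

#include <sys/ioctl.h>
#include <linux/kvm.h>       /* KVM_ARM_SDEI_COMMAND */
#include <asm/kvm_sdei.h>    /* struct kvm_sdei_cmd, KVM_SDEI_CMD_* */

/* Illustrative only: vcpu_fd is an open KVM vCPU file descriptor. */
static int example_pending_vcpu_events(int vcpu_fd)
{
    struct kvm_sdei_cmd cmd = { .cmd = KVM_SDEI_CMD_GET_VEVENT_COUNT };

    if (ioctl(vcpu_fd, KVM_ARM_SDEI_COMMAND, &cmd) < 0)
        return -1;

    return cmd.count;
}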

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_sdei.h  |   1 +
 arch/arm64/include/uapi/asm/kvm_sdei.h |   7 +
 arch/arm64/kvm/arm.c   |   3 +
 arch/arm64/kvm/sdei.c  | 228 +
 4 files changed, 239 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_sdei.h 
b/arch/arm64/include/asm/kvm_sdei.h
index 8f5ea947ed0e..a997989bab77 100644
--- a/arch/arm64/include/asm/kvm_sdei.h
+++ b/arch/arm64/include/asm/kvm_sdei.h
@@ -126,6 +126,7 @@ int kvm_sdei_register_notifier(struct kvm *kvm, unsigned 
long num,
   kvm_sdei_notifier notifier);
 void kvm_sdei_deliver(struct kvm_vcpu *vcpu);
 long kvm_sdei_vm_ioctl(struct kvm *kvm, unsigned long arg);
+long kvm_sdei_vcpu_ioctl(struct kvm_vcpu *vcpu, unsigned long arg);
 void kvm_sdei_destroy_vcpu(struct kvm_vcpu *vcpu);
 void kvm_sdei_destroy_vm(struct kvm *kvm);
 
diff --git a/arch/arm64/include/uapi/asm/kvm_sdei.h 
b/arch/arm64/include/uapi/asm/kvm_sdei.h
index 55de8baff841..3485843dd6df 100644
--- a/arch/arm64/include/uapi/asm/kvm_sdei.h
+++ b/arch/arm64/include/uapi/asm/kvm_sdei.h
@@ -59,6 +59,11 @@ struct kvm_sdei_vcpu_state {
 #define KVM_SDEI_CMD_GET_KEVENT_COUNT  2
 #define KVM_SDEI_CMD_GET_KEVENT        3
 #define KVM_SDEI_CMD_SET_KEVENT        4
+#define KVM_SDEI_CMD_GET_VEVENT_COUNT  5
+#define KVM_SDEI_CMD_GET_VEVENT        6
+#define KVM_SDEI_CMD_SET_VEVENT        7
+#define KVM_SDEI_CMD_GET_VCPU_STATE    8
+#define KVM_SDEI_CMD_SET_VCPU_STATE    9
 
 struct kvm_sdei_cmd {
uint32_t    cmd;
@@ -68,6 +73,8 @@ struct kvm_sdei_cmd {
uint64_t    num;
struct kvm_sdei_event_state kse_state;
struct kvm_sdei_kvm_event_state kske_state;
+   struct kvm_sdei_vcpu_event_state    ksve_state;
+   struct kvm_sdei_vcpu_state  ksv_state;
};
 };
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 96b41bf1d094..55ccd234b0ec 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1260,6 +1260,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
return kvm_arm_vcpu_finalize(vcpu, what);
}
+   case KVM_ARM_SDEI_COMMAND: {
+   return kvm_sdei_vcpu_ioctl(vcpu, arg);
+   }
default:
r = -EINVAL;
}
diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index bdd76c3e5153..79315b77f24b 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -35,6 +35,25 @@ static struct kvm_sdei_event *kvm_sdei_find_event(struct kvm 
*kvm,
return NULL;
 }
 
+static struct kvm_sdei_vcpu_event *kvm_sdei_find_vcpu_event(struct kvm_vcpu 
*vcpu,
+   unsigned long num)
+{
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_vcpu_event *ksve;
+
+   list_for_each_entry(ksve, &vsdei->critical_events, link) {
+   if (ksve->state.num == num)
+   return ksve;
+   }
+
+   list_for_each_entry(ksve, &vsdei->normal_events, link) {
+   if (ksve->state.num == num)
+   return ksve;
+   }
+
+   return NULL;
+}
+
 static void kvm_sdei_remove_events(struct kvm *kvm)
 {
struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
@@ -1102,6 +1121,215 @@ long kvm_sdei_vm_ioctl(struct kvm *kvm, unsigned long 
arg)
return ret;
 }
 
+static long kvm_sdei_get_vevent_count(struct kvm_vcpu *vcpu, int *count)
+{
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_vcpu_event *ksve = NULL;
+   int total = 0;
+
+   list_for_each_entry(ksve, &vsdei->critical_events, link) {
+   total++;
+   }
+
+   list_for_each_entry(ksve, &vsdei->normal_events, link) {
+   total++;
+   }
+
+   *count = total;
+   return 0;
+}
+
+static struct kvm_sdei_vcpu_event *next_vcpu_event(struct kvm_vcpu *vcpu,
+  unsigned long num)
+{
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+  

[PATCH v2 16/21] KVM: arm64: Support SDEI ioctl commands on VM

2021-02-08 Thread Gavin Shan
This supports ioctl commands on the VM to manage the various objects.
They are primarily used by the VMM to accomplish live migration. The
ioctl commands introduced by this patch are as follows (a VMM-side
usage sketch comes after the list):

   * KVM_SDEI_CMD_GET_VERSION
 Retrieve the version of current implementation
   * KVM_SDEI_CMD_SET_EVENT
 Add event to be exported from KVM so that guest can register
 against it afterwards
   * KVM_SDEI_CMD_GET_KEVENT_COUNT
 Retrieve number of registered SDEI events
   * KVM_SDEI_CMD_GET_KEVENT
 Retrieve the state of the registered SDEI event
   * KVM_SDEI_CMD_SET_KEVENT
 Populate the registered SDEI event
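
A hypothetical VMM-side sketch of exporting one event with
KVM_SDEI_CMD_SET_EVENT through the KVM_ARM_SDEI_COMMAND VM ioctl added
here; the field values are invented and the event-state layout follows
the uapi header from earlier patches:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>       /* KVM_ARM_SDEI_COMMAND */
#include <linux/arm_sdei.h>  /* SDEI_EVENT_TYPE_*, SDEI_EVENT_PRIORITY_* */
#include <asm/kvm_sdei.h>    /* struct kvm_sdei_cmd, KVM_SDEI_CMD_SET_EVENT */

/* Illustrative only: vm_fd is an open KVM VM file descriptor. */
static int example_expose_event(int vm_fd, uint64_t num)
{
    struct kvm_sdei_cmd cmd = {
        .cmd = KVM_SDEI_CMD_SET_EVENT,
        .kse_state = {
            .num      = num,
            .type     = SDEI_EVENT_TYPE_PRIVATE,
            .priority = SDEI_EVENT_PRIORITY_NORMAL,
            .signaled = 1,
        },
    };

    return ioctl(vm_fd, KVM_ARM_SDEI_COMMAND, &cmd);
}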

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_sdei.h  |   1 +
 arch/arm64/include/uapi/asm/kvm_sdei.h |  17 +++
 arch/arm64/kvm/arm.c   |   3 +
 arch/arm64/kvm/sdei.c  | 171 +
 include/uapi/linux/kvm.h   |   3 +
 5 files changed, 195 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_sdei.h 
b/arch/arm64/include/asm/kvm_sdei.h
index 19f2d9b91f85..8f5ea947ed0e 100644
--- a/arch/arm64/include/asm/kvm_sdei.h
+++ b/arch/arm64/include/asm/kvm_sdei.h
@@ -125,6 +125,7 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu);
 int kvm_sdei_register_notifier(struct kvm *kvm, unsigned long num,
   kvm_sdei_notifier notifier);
 void kvm_sdei_deliver(struct kvm_vcpu *vcpu);
+long kvm_sdei_vm_ioctl(struct kvm *kvm, unsigned long arg);
 void kvm_sdei_destroy_vcpu(struct kvm_vcpu *vcpu);
 void kvm_sdei_destroy_vm(struct kvm *kvm);
 
diff --git a/arch/arm64/include/uapi/asm/kvm_sdei.h 
b/arch/arm64/include/uapi/asm/kvm_sdei.h
index 20ad724f63c8..55de8baff841 100644
--- a/arch/arm64/include/uapi/asm/kvm_sdei.h
+++ b/arch/arm64/include/uapi/asm/kvm_sdei.h
@@ -54,4 +54,21 @@ struct kvm_sdei_vcpu_state {
struct kvm_sdei_vcpu_regs   normal_regs;
 };
 
+#define KVM_SDEI_CMD_GET_VERSION   0
+#define KVM_SDEI_CMD_SET_EVENT 1
+#define KVM_SDEI_CMD_GET_KEVENT_COUNT  2
+#define KVM_SDEI_CMD_GET_KEVENT        3
+#define KVM_SDEI_CMD_SET_KEVENT        4
+
+struct kvm_sdei_cmd {
+   uint32_t    cmd;
+   union {
+   uint32_t    version;
+   uint32_t    count;
+   uint64_t    num;
+   struct kvm_sdei_event_state kse_state;
+   struct kvm_sdei_kvm_event_state kske_state;
+   };
+};
+
 #endif /* _UAPI__ASM_KVM_SDEI_H */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e243bd5ad730..96b41bf1d094 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1334,6 +1334,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
return 0;
}
+   case KVM_ARM_SDEI_COMMAND: {
+   return kvm_sdei_vm_ioctl(kvm, arg);
+   }
default:
return -EINVAL;
}
diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index 5f7a37dcaa77..bdd76c3e5153 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -931,6 +931,177 @@ void kvm_sdei_create_vcpu(struct kvm_vcpu *vcpu)
vcpu->arch.sdei = vsdei;
 }
 
+static long kvm_sdei_set_event(struct kvm *kvm,
+  struct kvm_sdei_event_state *kse_state)
+{
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_event *kse = NULL;
+
+   if (!kvm_sdei_is_valid_event_num(kse_state->num))
+   return -EINVAL;
+
+   if (!(kse_state->type == SDEI_EVENT_TYPE_SHARED ||
+ kse_state->type == SDEI_EVENT_TYPE_PRIVATE))
+   return -EINVAL;
+
+   if (!(kse_state->priority == SDEI_EVENT_PRIORITY_NORMAL ||
+ kse_state->priority == SDEI_EVENT_PRIORITY_CRITICAL))
+   return -EINVAL;
+
+   kse = kvm_sdei_find_event(kvm, kse_state->num);
+   if (kse)
+   return -EEXIST;
+
+   kse = kzalloc(sizeof(*kse), GFP_KERNEL);
+   if (!kse)
+   return -ENOMEM;
+
+   kse->state = *kse_state;
+   kse->kvm = kvm;
+   list_add_tail(&kse->link, &ksdei->events);
+
+   return 0;
+}
+
+static long kvm_sdei_get_kevent_count(struct kvm *kvm, int *count)
+{
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_kvm_event *kske = NULL;
+   int total = 0;
+
+   list_for_each_entry(kske, &ksdei->kvm_events, link) {
+   total++;
+   }
+
+   *count = total;
+   return 0;
+}
+
+static long kvm_sdei_get_kevent(struct kvm *kvm,
+   struct kvm_sdei_kvm_event_state *kske_state)
+{
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_kvm_event *kske = NULL;
+
+   /*
+* The first entry is fetched if the event number is invalid.
+* Otherwise, the next entry is fetched.
+   

[PATCH v2 15/21] KVM: arm64: Support SDEI event notifier

2021-02-08 Thread Gavin Shan
The owner of an SDEI event, such as the asynchronous page fault code,
needs to know the state of the injected SDEI event. This supports SDEI
event state updates by introducing a notifier mechanism. Note that the
notifier (handler) needs to be able to survive migration.
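
A hypothetical sketch of how an event owner could register and consume
the notifier; the example_*() names are invented:

static void example_apf_notifier(struct kvm_vcpu *vcpu, unsigned long num,
                                 unsigned int state)
{
    switch (state) {
    case KVM_SDEI_NOTIFY_DELIVERED:
        /* The guest has entered the handler for event 'num'. */
        break;
    case KVM_SDEI_NOTIFY_COMPLETED:
        /* The guest issued COMPLETE or COMPLETE_AND_RESUME. */
        break;
    }
}

static int example_register(struct kvm *kvm, unsigned long num)
{
    return kvm_sdei_register_notifier(kvm, num, example_apf_notifier);
}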

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_sdei.h  | 12 +++
 arch/arm64/include/uapi/asm/kvm_sdei.h |  1 +
 arch/arm64/kvm/sdei.c  | 45 +-
 3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_sdei.h 
b/arch/arm64/include/asm/kvm_sdei.h
index 7f5f5ad689e6..19f2d9b91f85 100644
--- a/arch/arm64/include/asm/kvm_sdei.h
+++ b/arch/arm64/include/asm/kvm_sdei.h
@@ -16,6 +16,16 @@
 #include 
 #include 
 
+struct kvm_vcpu;
+
+typedef void (*kvm_sdei_notifier)(struct kvm_vcpu *vcpu,
+ unsigned long num,
+ unsigned int state);
+enum {
+   KVM_SDEI_NOTIFY_DELIVERED,
+   KVM_SDEI_NOTIFY_COMPLETED,
+};
+
 struct kvm_sdei_event {
struct kvm_sdei_event_state state;
struct kvm  *kvm;
@@ -112,6 +122,8 @@ KVM_SDEI_FLAG_FUNC(enabled)
 void kvm_sdei_init_vm(struct kvm *kvm);
 void kvm_sdei_create_vcpu(struct kvm_vcpu *vcpu);
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu);
+int kvm_sdei_register_notifier(struct kvm *kvm, unsigned long num,
+  kvm_sdei_notifier notifier);
 void kvm_sdei_deliver(struct kvm_vcpu *vcpu);
 void kvm_sdei_destroy_vcpu(struct kvm_vcpu *vcpu);
 void kvm_sdei_destroy_vm(struct kvm *kvm);
diff --git a/arch/arm64/include/uapi/asm/kvm_sdei.h 
b/arch/arm64/include/uapi/asm/kvm_sdei.h
index 9dbda2fb457e..20ad724f63c8 100644
--- a/arch/arm64/include/uapi/asm/kvm_sdei.h
+++ b/arch/arm64/include/uapi/asm/kvm_sdei.h
@@ -20,6 +20,7 @@ struct kvm_sdei_event_state {
uint8_t type;
uint8_t signaled;
uint8_t priority;
+   uint64_t    notifier;
 };
 
 struct kvm_sdei_kvm_event_state {
diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index 1e8e213c9d70..5f7a37dcaa77 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -314,9 +314,11 @@ static unsigned long kvm_sdei_hypercall_complete(struct 
kvm_vcpu *vcpu,
struct kvm *kvm = vcpu->kvm;
struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_event *kse = NULL;
struct kvm_sdei_kvm_event *kske = NULL;
struct kvm_sdei_vcpu_event *ksve = NULL;
struct kvm_sdei_vcpu_regs *regs;
+   kvm_sdei_notifier notifier;
unsigned long ret = SDEI_SUCCESS;
int index;
 
@@ -349,6 +351,13 @@ static unsigned long kvm_sdei_hypercall_complete(struct 
kvm_vcpu *vcpu,
*vcpu_cpsr(vcpu) = regs->pstate;
*vcpu_pc(vcpu) = regs->pc;
 
+   /* Notifier */
+   kske = ksve->kske;
+   kse = kske->kse;
+   notifier = (kvm_sdei_notifier)(kse->state.notifier);
+   if (notifier)
+   notifier(vcpu, kse->state.num, KVM_SDEI_NOTIFY_COMPLETED);
+
/* Inject interrupt if needed */
if (resume)
kvm_inject_irq(vcpu);
@@ -358,7 +367,6 @@ static unsigned long kvm_sdei_hypercall_complete(struct 
kvm_vcpu *vcpu,
 * event state as it's not destroyed because of the reference
 * count.
 */
-   kske = ksve->kske;
ksve->state.refcount--;
kske->state.refcount--;
if (!ksve->state.refcount) {
@@ -746,6 +754,35 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
return 1;
 }
 
+int kvm_sdei_register_notifier(struct kvm *kvm,
+  unsigned long num,
+  kvm_sdei_notifier notifier)
+{
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_event *kse = NULL;
+   int ret = 0;
+
+   if (!ksdei) {
+   ret = -EPERM;
+   goto out;
+   }
+
+   spin_lock(&ksdei->lock);
+
+   kse = kvm_sdei_find_event(kvm, num);
+   if (!kse) {
+   ret = -EINVAL;
+   goto unlock;
+   }
+
+   kse->state.notifier = (unsigned long)notifier;
+
+unlock:
+   spin_unlock(&ksdei->lock);
+out:
+   return ret;
+}
+
 void kvm_sdei_deliver(struct kvm_vcpu *vcpu)
 {
struct kvm *kvm = vcpu->kvm;
@@ -755,6 +792,7 @@ void kvm_sdei_deliver(struct kvm_vcpu *vcpu)
struct kvm_sdei_kvm_event *kske = NULL;
struct kvm_sdei_vcpu_event *ksve = NULL;
struct kvm_sdei_vcpu_regs *regs = NULL;
+   kvm_sdei_notifier notifier;
unsigned long pstate;
int index = 0;
 
@@ -826,6 +864,11 @@ void kvm_sdei_deliver(struct kvm_vcpu *vcpu)
*vcpu_cpsr(vcpu) = pstate;
*vcpu_pc(vcpu) = kske->state.entries[index];
 
+   /* Notifier */
+   notifier = (kvm_sdei_notifier)(kse->state.notifier);
+   if (notifier)
+   notifier(vcpu

[PATCH v2 14/21] KVM: arm64: Support SDEI_EVENT_{COMPLETE, COMPLETE_AND_RESUME} hypercall

2021-02-08 Thread Gavin Shan
This supports the SDEI_EVENT_{COMPLETE, COMPLETE_AND_RESUME} hypercalls.
They are used by the guest to notify, from the handler, that handling
of the SDEI event has completed. The registers are changed according to
the SDEI specification as below (a guest-side sketch follows the list):

   * x0 - x17, PC and PState are restored to the values they had in
     the interrupted context.

   * If it's the SDEI_EVENT_COMPLETE_AND_RESUME hypercall, an IRQ
     exception is injected.
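
For reference, a hypothetical guest-side sketch of the hypercall a
handler finishes with; the real guest uses the arm_sdei driver, and
this only shows the underlying SMCCC call:

#include <linux/arm-smccc.h>
#include <linux/arm_sdei.h>

static void example_complete_event(void)
{
    struct arm_smccc_res res;

    /* Report the event as handled; COMPLETE_AND_RESUME works similarly. */
    arm_smccc_hvc(SDEI_1_0_FN_SDEI_EVENT_COMPLETE, SDEI_EV_HANDLED,
                  0, 0, 0, 0, 0, 0, &res);
}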

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_emulate.h |  1 +
 arch/arm64/include/asm/kvm_host.h|  1 +
 arch/arm64/kvm/hyp/exception.c   |  7 +++
 arch/arm64/kvm/inject_fault.c| 27 ++
 arch/arm64/kvm/sdei.c| 75 
 5 files changed, 111 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index f612c090f2e4..0ef213b715a5 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -37,6 +37,7 @@ bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
 void kvm_skip_instr32(struct kvm_vcpu *vcpu);
 
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
+void kvm_inject_irq(struct kvm_vcpu *vcpu);
 void kvm_inject_vabt(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 30e850257ef4..01eda5c84600 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -416,6 +416,7 @@ struct kvm_vcpu_arch {
 #define KVM_ARM64_EXCEPT_AA32_UND  (0 << 9)
 #define KVM_ARM64_EXCEPT_AA32_IABT (1 << 9)
 #define KVM_ARM64_EXCEPT_AA32_DABT (2 << 9)
+#define KVM_ARM64_EXCEPT_AA32_IRQ  (3 << 9)
 /* For AArch64: */
 #define KVM_ARM64_EXCEPT_AA64_ELx_SYNC (0 << 9)
 #define KVM_ARM64_EXCEPT_AA64_ELx_IRQ  (1 << 9)
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 73629094f903..c1e9bdb67b37 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -309,6 +309,9 @@ void kvm_inject_exception(struct kvm_vcpu *vcpu)
case KVM_ARM64_EXCEPT_AA32_DABT:
enter_exception32(vcpu, PSR_AA32_MODE_ABT, 16);
break;
+   case KVM_ARM64_EXCEPT_AA32_IRQ:
+   enter_exception32(vcpu, PSR_AA32_MODE_IRQ, 4);
+   break;
default:
/* Err... */
break;
@@ -319,6 +322,10 @@ void kvm_inject_exception(struct kvm_vcpu *vcpu)
  KVM_ARM64_EXCEPT_AA64_EL1):
enter_exception64(vcpu, PSR_MODE_EL1h, 
except_type_sync);
break;
+   case (KVM_ARM64_EXCEPT_AA64_ELx_IRQ |
+ KVM_ARM64_EXCEPT_AA64_EL1):
+   enter_exception64(vcpu, PSR_MODE_EL1h, except_type_irq);
+   break;
default:
/*
 * Only EL1_SYNC makes sense so far, EL2_{SYNC,IRQ}
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index b47df73e98d7..3a8c55867d2f 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -66,6 +66,13 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
 }
 
+static void inject_irq64(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1 |
+KVM_ARM64_EXCEPT_AA64_ELx_IRQ |
+KVM_ARM64_PENDING_EXCEPTION);
+}
+
 #define DFSR_FSC_EXTABT_LPAE   0x10
 #define DFSR_FSC_EXTABT_nLPAE  0x08
 #define DFSR_LPAE  BIT(9)
@@ -77,6 +84,12 @@ static void inject_undef32(struct kvm_vcpu *vcpu)
 KVM_ARM64_PENDING_EXCEPTION);
 }
 
+static void inject_irq32(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_IRQ |
+KVM_ARM64_PENDING_EXCEPTION);
+}
+
 /*
  * Modelled after TakeDataAbortException() and TakePrefetchAbortException
  * pseudocode.
@@ -160,6 +173,20 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
inject_undef64(vcpu);
 }
 
+/**
+ * kvm_inject_irq - inject an IRQ into the guest
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ */
+void kvm_inject_irq(struct kvm_vcpu *vcpu)
+{
+   if (vcpu_el1_is_32bit(vcpu))
+   inject_irq32(vcpu);
+   else
+   inject_irq64(vcpu);
+}
+
 void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 esr)
 {
vcpu_set_vsesr(vcpu, esr & ESR_ELx_ISS_MASK);
diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index b5d6d1ed3858..1e8e213c9d70 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -308,6 +308,75 @@ static unsign

[PATCH v2 13/21] KVM: arm64: Implement SDEI event delivery

2021-02-08 Thread Gavin Shan
This implements kvm_sdei_deliver() to support SDEI event delivery.
The function is called when the request (KVM_REQ_SDEI) is raised.
The following rules are applied according to the SDEI specification:

   * x0 - x17 are saved. All of them are cleared except the following
     registers:
     x0: number of the SDEI event to be delivered
     x1: parameter associated with the SDEI event
     x2: PC of the interrupted context
     x3: PState of the interrupted context

   * PC is set to the handler of the SDEI event, which was provided
     during its registration. PState is modified accordingly.

   * SDEI events with critical priority can preempt those with normal
     priority.

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/include/asm/kvm_sdei.h |  1 +
 arch/arm64/kvm/arm.c  |  3 ++
 arch/arm64/kvm/sdei.c | 84 +++
 4 files changed, 89 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index b2d51c6d055c..30e850257ef4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -47,6 +47,7 @@
 #define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
 #define KVM_REQ_RECORD_STEAL   KVM_ARCH_REQ(3)
 #define KVM_REQ_RELOAD_GICv4   KVM_ARCH_REQ(4)
+#define KVM_REQ_SDEI   KVM_ARCH_REQ(5)
 
 #define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
 KVM_DIRTY_LOG_INITIALLY_SET)
diff --git a/arch/arm64/include/asm/kvm_sdei.h 
b/arch/arm64/include/asm/kvm_sdei.h
index b0abc13a0256..7f5f5ad689e6 100644
--- a/arch/arm64/include/asm/kvm_sdei.h
+++ b/arch/arm64/include/asm/kvm_sdei.h
@@ -112,6 +112,7 @@ KVM_SDEI_FLAG_FUNC(enabled)
 void kvm_sdei_init_vm(struct kvm *kvm);
 void kvm_sdei_create_vcpu(struct kvm_vcpu *vcpu);
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu);
+void kvm_sdei_deliver(struct kvm_vcpu *vcpu);
 void kvm_sdei_destroy_vcpu(struct kvm_vcpu *vcpu);
 void kvm_sdei_destroy_vm(struct kvm *kvm);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a7ae16df3df7..e243bd5ad730 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -668,6 +668,9 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
kvm_reset_vcpu(vcpu);
 
+   if (kvm_check_request(KVM_REQ_SDEI, vcpu))
+   kvm_sdei_deliver(vcpu);
+
/*
 * Clear IRQ_PENDING requests that were made to guarantee
 * that a VCPU sees new virtual interrupts.
diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index 62efee2b67b8..b5d6d1ed3858 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -671,6 +671,90 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
return 1;
 }
 
+void kvm_sdei_deliver(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_event *kse = NULL;
+   struct kvm_sdei_kvm_event *kske = NULL;
+   struct kvm_sdei_vcpu_event *ksve = NULL;
+   struct kvm_sdei_vcpu_regs *regs = NULL;
+   unsigned long pstate;
+   int index = 0;
+
+   /* Sanity check */
+   if (!(ksdei && vsdei))
+   return;
+
+   /* The critical event can't be preempted */
+   spin_lock(&vsdei->lock);
+   if (vsdei->critical_event)
+   goto unlock;
+
+   /*
+* The normal event can be preempted by the critical event.
+* However, the normal event can't be preempted by another
+* normal event.
+*/
+   ksve = list_first_entry_or_null(&vsdei->critical_events,
+   struct kvm_sdei_vcpu_event, link);
+   if (!ksve && !vsdei->normal_event) {
+   ksve = list_first_entry_or_null(&vsdei->normal_events,
+   struct kvm_sdei_vcpu_event, link);
+   }
+
+   if (!ksve)
+   goto unlock;
+
+   kske = ksve->kske;
+   kse = kske->kse;
+   if (kse->state.priority == SDEI_EVENT_PRIORITY_CRITICAL) {
+   vsdei->critical_event = ksve;
+   vsdei->state.critical_num = ksve->state.num;
+   regs = &vsdei->state.critical_regs;
+   } else {
+   vsdei->normal_event = ksve;
+   vsdei->state.normal_num = ksve->state.num;
+   regs = &vsdei->state.normal_regs;
+   }
+
+   /* Save registers: x0 -> x17, PC, PState */
+   for (index = 0; index < ARRAY_SIZE(regs->regs); index++)
+   regs->regs[index] = vcpu_get_reg(vcpu, index);
+
+   regs->pc = *vcpu_pc(vcpu);
+   regs->pstate = *vcpu_cpsr(vcpu);
+
+   /*
+* Inject SDEI event: x0 -> x3, PC, PState. We needn't take lock
+* for the KVM event as it can't be destroyed because of its
+* ref

[PATCH v2 12/21] KVM: arm64: Support SDEI_{PRIVATE, SHARED}_RESET hypercall

2021-02-08 Thread Gavin Shan
This supports SDEI_{PRIVATE, SHARED}_RESET. They are used by the guest
to purge the previously registered private or shared SDEI events.

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/sdei.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index 3fb33258b494..62efee2b67b8 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -582,6 +582,29 @@ static unsigned long kvm_sdei_hypercall_mask(struct 
kvm_vcpu *vcpu,
return ret;
 }
 
+static unsigned long kvm_sdei_hypercall_reset(struct kvm_vcpu *vcpu,
+ bool private)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   unsigned int mask = private ? (1 << SDEI_EVENT_TYPE_PRIVATE) :
+ (1 << SDEI_EVENT_TYPE_SHARED);
+   unsigned long ret = SDEI_SUCCESS;
+
+   /* Sanity check */
+   if (!(ksdei && vsdei)) {
+   ret = SDEI_NOT_SUPPORTED;
+   goto out;
+   }
+
+   spin_lock(&ksdei->lock);
+   kvm_sdei_remove_kvm_events(kvm, mask, false);
+   spin_unlock(&ksdei->lock);
+out:
+   return ret;
+}
+
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
 {
u32 func = smccc_get_function(vcpu);
@@ -626,8 +649,14 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
break;
case SDEI_1_0_FN_SDEI_INTERRUPT_BIND:
case SDEI_1_0_FN_SDEI_INTERRUPT_RELEASE:
+   ret = SDEI_NOT_SUPPORTED;
+   break;
case SDEI_1_0_FN_SDEI_PRIVATE_RESET:
+   ret = kvm_sdei_hypercall_reset(vcpu, true);
+   break;
case SDEI_1_0_FN_SDEI_SHARED_RESET:
+   ret = kvm_sdei_hypercall_reset(vcpu, false);
+   break;
default:
ret = SDEI_NOT_SUPPORTED;
}
-- 
2.23.0



[PATCH v2 11/21] KVM: arm64: Support SDEI_PE_{MASK, UNMASK} hypercall

2021-02-08 Thread Gavin Shan
This supports the SDEI_PE_{MASK, UNMASK} hypercalls. They are used by
the guest to prevent or allow, respectively, a specific vCPU receiving
SDEI events.

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/sdei.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index 458695c2394f..3fb33258b494 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -551,6 +551,37 @@ static unsigned long kvm_sdei_hypercall_route(struct 
kvm_vcpu *vcpu)
return ret;
 }
 
+static unsigned long kvm_sdei_hypercall_mask(struct kvm_vcpu *vcpu,
+bool mask)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   unsigned long ret = SDEI_SUCCESS;
+
+   /* Sanity check */
+   if (!(ksdei && vsdei)) {
+   ret = SDEI_NOT_SUPPORTED;
+   goto out;
+   }
+
+   spin_lock(&vsdei->lock);
+
+   /* Check the state */
+   if (mask == vsdei->state.masked) {
+   ret = SDEI_DENIED;
+   goto unlock;
+   }
+
+   /* Update the state */
+   vsdei->state.masked = mask ? 1 : 0;
+
+unlock:
+   spin_unlock(&vsdei->lock);
+out:
+   return ret;
+}
+
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
 {
u32 func = smccc_get_function(vcpu);
@@ -588,7 +619,11 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
ret = kvm_sdei_hypercall_route(vcpu);
break;
case SDEI_1_0_FN_SDEI_PE_MASK:
+   ret = kvm_sdei_hypercall_mask(vcpu, true);
+   break;
case SDEI_1_0_FN_SDEI_PE_UNMASK:
+   ret = kvm_sdei_hypercall_mask(vcpu, false);
+   break;
case SDEI_1_0_FN_SDEI_INTERRUPT_BIND:
case SDEI_1_0_FN_SDEI_INTERRUPT_RELEASE:
case SDEI_1_0_FN_SDEI_PRIVATE_RESET:
-- 
2.23.0



[PATCH v2 10/21] KVM: arm64: Support SDEI_EVENT_ROUTING_SET hypercall

2021-02-08 Thread Gavin Shan
This supports the SDEI_EVENT_ROUTING_SET hypercall. It's used by the
guest to set the routing mode and affinity for a registered KVM event.
It's only valid for shared events, and it's not allowed once the
corresponding event has been raised to the guest.

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/sdei.c | 64 +++
 1 file changed, 64 insertions(+)

diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index 5dfa74b093f1..458695c2394f 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -489,6 +489,68 @@ static unsigned long kvm_sdei_hypercall_info(struct 
kvm_vcpu *vcpu)
return ret;
 }
 
+static unsigned long kvm_sdei_hypercall_route(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_event *kse = NULL;
+   struct kvm_sdei_kvm_event *kske = NULL;
+   unsigned long event_num = smccc_get_arg1(vcpu);
+   unsigned long route_mode = smccc_get_arg2(vcpu);
+   unsigned long route_affinity = smccc_get_arg3(vcpu);
+   int index = 0;
+   unsigned long ret = SDEI_SUCCESS;
+
+   /* Sanity check */
+   if (!(ksdei && vsdei)) {
+   ret = SDEI_NOT_SUPPORTED;
+   goto out;
+   }
+
+   if (!kvm_sdei_is_valid_event_num(event_num)) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto out;
+   }
+
+   if (!(route_mode == SDEI_EVENT_REGISTER_RM_ANY ||
+ route_mode == SDEI_EVENT_REGISTER_RM_PE)) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto out;
+   }
+
+   /* Check if the KVM event has been registered */
+   spin_lock(&ksdei->lock);
+   kske = kvm_sdei_find_kvm_event(kvm, event_num);
+   if (!kske) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto unlock;
+   }
+
+   /* Validate KVM event state */
+   kse = kske->kse;
+   if (kse->state.type != SDEI_EVENT_TYPE_SHARED) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto unlock;
+   }
+
+   if (!kvm_sdei_is_registered(kske, index) ||
+   kvm_sdei_is_enabled(kske, index) ||
+   kske->state.refcount) {
+   ret = SDEI_DENIED;
+   goto unlock;
+   }
+
+   /* Update state */
+   kske->state.route_mode = route_mode;
+   kske->state.route_affinity = route_affinity;
+
+unlock:
+   spin_unlock(&ksdei->lock);
+out:
+   return ret;
+}
+
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
 {
u32 func = smccc_get_function(vcpu);
@@ -523,6 +585,8 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
ret = kvm_sdei_hypercall_info(vcpu);
break;
case SDEI_1_0_FN_SDEI_EVENT_ROUTING_SET:
+   ret = kvm_sdei_hypercall_route(vcpu);
+   break;
case SDEI_1_0_FN_SDEI_PE_MASK:
case SDEI_1_0_FN_SDEI_PE_UNMASK:
case SDEI_1_0_FN_SDEI_INTERRUPT_BIND:
-- 
2.23.0



[PATCH v2 09/21] KVM: arm64: Support SDEI_EVENT_GET_INFO hypercall

2021-02-08 Thread Gavin Shan
This supports the SDEI_EVENT_GET_INFO hypercall. It's used by the guest
to retrieve various information about the supported (exported) events,
including the type, signaled state, priority, and, for shared events,
the routing mode and affinity.

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/sdei.c | 76 +++
 1 file changed, 76 insertions(+)

diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index b95b8c4455e1..5dfa74b093f1 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -415,6 +415,80 @@ static unsigned long kvm_sdei_hypercall_status(struct 
kvm_vcpu *vcpu)
return ret;
 }
 
+static unsigned long kvm_sdei_hypercall_info(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_event *kse = NULL;
+   struct kvm_sdei_kvm_event *kske = NULL;
+   unsigned long event_num = smccc_get_arg1(vcpu);
+   unsigned long event_info = smccc_get_arg2(vcpu);
+   unsigned long ret = SDEI_SUCCESS;
+
+   /* Sanity check */
+   if (!(ksdei && vsdei)) {
+   ret = SDEI_NOT_SUPPORTED;
+   goto out;
+   }
+
+   if (!kvm_sdei_is_valid_event_num(event_num)) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto out;
+   }
+
+   /*
+* Check if the KVM event exists. The event might have been
+* registered, we need fetch the information from the registered
+* event in that case.
+*/
+   spin_lock(&ksdei->lock);
+   kske = kvm_sdei_find_kvm_event(kvm, event_num);
+   kse = kske ? kske->kse : NULL;
+   if (!kse) {
+   kse = kvm_sdei_find_event(kvm, event_num);
+   if (!kse) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto unlock;
+   }
+   }
+
+   /* Retrieve the requested information */
+   switch (event_info) {
+   case SDEI_EVENT_INFO_EV_TYPE:
+   ret = kse->state.type;
+   break;
+   case SDEI_EVENT_INFO_EV_SIGNALED:
+   ret = kse->state.signaled;
+   break;
+   case SDEI_EVENT_INFO_EV_PRIORITY:
+   ret = kse->state.priority;
+   break;
+   case SDEI_EVENT_INFO_EV_ROUTING_MODE:
+   case SDEI_EVENT_INFO_EV_ROUTING_AFF:
+   if (kse->state.type != SDEI_EVENT_TYPE_SHARED) {
+   ret = SDEI_INVALID_PARAMETERS;
+   break;
+   }
+
+   if (event_info == SDEI_EVENT_INFO_EV_ROUTING_MODE) {
+   ret = kske ? kske->state.route_mode :
+SDEI_EVENT_REGISTER_RM_ANY;
+   } else {
+   ret = kske ? kske->state.route_affinity : 0;
+   }
+
+   break;
+   default:
+   ret = SDEI_INVALID_PARAMETERS;
+   }
+
+unlock:
+   spin_unlock(&ksdei->lock);
+out:
+   return ret;
+}
+
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
 {
u32 func = smccc_get_function(vcpu);
@@ -446,6 +520,8 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
ret = kvm_sdei_hypercall_status(vcpu);
break;
case SDEI_1_0_FN_SDEI_EVENT_GET_INFO:
+   ret = kvm_sdei_hypercall_info(vcpu);
+   break;
case SDEI_1_0_FN_SDEI_EVENT_ROUTING_SET:
case SDEI_1_0_FN_SDEI_PE_MASK:
case SDEI_1_0_FN_SDEI_PE_UNMASK:
-- 
2.23.0



[PATCH v2 08/21] KVM: arm64: Support SDEI_EVENT_STATUS hypercall

2021-02-08 Thread Gavin Shan
This supports the SDEI_EVENT_STATUS hypercall. It's used by the guest
to retrieve a bitmap indicating the state of an SDEI event, including
its registration, enablement and running (delivery) state.
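
To show how the returned bitmap is meant to be consumed, a caller could
decode it as below. This is a sketch only; the helper name is
hypothetical and the bit positions come from <uapi/linux/arm_sdei.h>.

#include <linux/arm_sdei.h>
#include <linux/printk.h>
#include <linux/types.h>

/* Sketch only: decode the SDEI_EVENT_STATUS result. */
static void example_sdei_decode_status(unsigned long status)
{
        bool registered = status & (1UL << SDEI_EVENT_STATUS_REGISTERED);
        bool enabled    = status & (1UL << SDEI_EVENT_STATUS_ENABLED);
        bool running    = status & (1UL << SDEI_EVENT_STATUS_RUNNING);

        pr_info("SDEI event: registered=%d enabled=%d running=%d\n",
                registered, enabled, running);
}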

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/sdei.c | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index a3ba69dc91cb..b95b8c4455e1 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -367,6 +367,54 @@ static unsigned long kvm_sdei_hypercall_unregister(struct 
kvm_vcpu *vcpu)
return ret;
 }
 
+static unsigned long kvm_sdei_hypercall_status(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_event *kse = NULL;
+   struct kvm_sdei_kvm_event *kske = NULL;
+   unsigned long event_num = smccc_get_arg1(vcpu);
+   int index = 0;
+   unsigned long ret = SDEI_SUCCESS;
+
+   /* Sanity check */
+   if (!(ksdei && vsdei)) {
+   ret = SDEI_NOT_SUPPORTED;
+   goto out;
+   }
+
+   if (!kvm_sdei_is_valid_event_num(event_num)) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto out;
+   }
+
+   /*
+* Check if the KVM event exists. None of the flags
+* will be set if it doesn't exist.
+*/
+   spin_lock(&ksdei->lock);
+   kske = kvm_sdei_find_kvm_event(kvm, event_num);
+   if (!kske) {
+   ret = 0;
+   goto unlock;
+   }
+
+   kse = kske->kse;
+   index = (kse->state.type == SDEI_EVENT_TYPE_PRIVATE) ?
+   vcpu->vcpu_idx : 0;
+   if (kvm_sdei_is_registered(kske, index))
+   ret |= (1UL << SDEI_EVENT_STATUS_REGISTERED);
+   if (kvm_sdei_is_enabled(kske, index))
+   ret |= (1UL << SDEI_EVENT_STATUS_ENABLED);
+   if (kske->state.refcount)
+   ret |= (1UL << SDEI_EVENT_STATUS_RUNNING);
+
+unlock:
+   spin_unlock(&ksdei->lock);
+out:
+   return ret;
+}
+
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
 {
u32 func = smccc_get_function(vcpu);
@@ -395,6 +443,8 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
ret = kvm_sdei_hypercall_unregister(vcpu);
break;
case SDEI_1_0_FN_SDEI_EVENT_STATUS:
+   ret = kvm_sdei_hypercall_status(vcpu);
+   break;
case SDEI_1_0_FN_SDEI_EVENT_GET_INFO:
case SDEI_1_0_FN_SDEI_EVENT_ROUTING_SET:
case SDEI_1_0_FN_SDEI_PE_MASK:
-- 
2.23.0



[PATCH v2 07/21] KVM: arm64: Support SDEI_EVENT_UNREGISTER hypercall

2021-02-08 Thread Gavin Shan
This supports the SDEI_EVENT_UNREGISTER hypercall. It's used by the
guest to unregister an SDEI event. The SDEI event won't be raised to
the guest or the specific vCPU after it's unregistered successfully.
Note that the SDEI event is also disabled automatically on the guest
or the specific vCPU once it's unregistered successfully.
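
For reference, the guest-side call amounts to the sketch below (the
helper is hypothetical and the HVC conduit is assumed). A SDEI_PENDING
return means the event still has queued deliveries and the
unregistration was refused.

#include <linux/arm-smccc.h>
#include <linux/arm_sdei.h>

/* Sketch only: a single unregister attempt, no retry policy implied. */
static long example_sdei_event_unregister(unsigned long event_num)
{
        struct arm_smccc_res res;

        arm_smccc_1_1_hvc(SDEI_1_0_FN_SDEI_EVENT_UNREGISTER, event_num, &res);

        /* 0 on success, otherwise SDEI_PENDING, SDEI_DENIED, ... */
        return (long)res.a0;
}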

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/sdei.c | 61 +++
 1 file changed, 61 insertions(+)

diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index b4162efda470..a3ba69dc91cb 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -308,6 +308,65 @@ static unsigned long kvm_sdei_hypercall_context(struct 
kvm_vcpu *vcpu)
return ret;
 }
 
+static unsigned long kvm_sdei_hypercall_unregister(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_event *kse = NULL;
+   struct kvm_sdei_kvm_event *kske = NULL;
+   unsigned long event_num = smccc_get_arg1(vcpu);
+   int index = 0;
+   unsigned long ret = SDEI_SUCCESS;
+
+   /* Sanity check */
+   if (!(ksdei && vsdei)) {
+   ret = SDEI_NOT_SUPPORTED;
+   goto out;
+   }
+
+   if (!kvm_sdei_is_valid_event_num(event_num)) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto out;
+   }
+
+   /* Check if the KVM event exists */
+   spin_lock(&ksdei->lock);
+   kske = kvm_sdei_find_kvm_event(kvm, event_num);
+   if (!kske) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto unlock;
+   }
+
+   /* Check if there are pending events */
+   if (kske->state.refcount) {
+   ret = SDEI_PENDING;
+   goto unlock;
+   }
+
+   /* Check if it has been registered */
+   kse = kske->kse;
+   index = (kse->state.type == SDEI_EVENT_TYPE_PRIVATE) ?
+   vcpu->vcpu_idx : 0;
+   if (!kvm_sdei_is_registered(kske, index)) {
+   ret = SDEI_DENIED;
+   goto unlock;
+   }
+
+   /* The event is disabled when it's unregistered */
+   kvm_sdei_clear_enabled(kske, index);
+   kvm_sdei_clear_registered(kske, index);
+   if (kvm_sdei_empty_registered(kske)) {
+   list_del(&kske->link);
+   kfree(kske);
+   }
+
+unlock:
+   spin_unlock(&ksdei->lock);
+out:
+   return ret;
+}
+
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
 {
u32 func = smccc_get_function(vcpu);
@@ -333,6 +392,8 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
case SDEI_1_0_FN_SDEI_EVENT_COMPLETE:
case SDEI_1_0_FN_SDEI_EVENT_COMPLETE_AND_RESUME:
case SDEI_1_0_FN_SDEI_EVENT_UNREGISTER:
+   ret = kvm_sdei_hypercall_unregister(vcpu);
+   break;
case SDEI_1_0_FN_SDEI_EVENT_STATUS:
case SDEI_1_0_FN_SDEI_EVENT_GET_INFO:
case SDEI_1_0_FN_SDEI_EVENT_ROUTING_SET:
-- 
2.23.0



[PATCH v2 06/21] KVM: arm64: Support SDEI_EVENT_CONTEXT hypercall

2021-02-08 Thread Gavin Shan
This supports the SDEI_EVENT_CONTEXT hypercall. It's used by the guest
to retrieve the original registers (R0 - R17) in its SDEI event
handler. Those registers can be corrupted during the SDEI event
delivery.
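
A hedged sketch of the intended use, from inside a guest SDEI event
handler (the helper name is made up; only parameter numbers 0 - 17 are
valid):

#include <linux/arm-smccc.h>
#include <linux/arm_sdei.h>

/* Sketch only: fetch one register of the preempted context. */
static unsigned long example_sdei_read_interrupted_reg(unsigned int n)
{
        struct arm_smccc_res res;

        /* n selects one of the registers interrupted by the event */
        arm_smccc_1_1_hvc(SDEI_1_0_FN_SDEI_EVENT_CONTEXT, n, &res);

        return res.a0;
}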

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/sdei.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index b022ce0a202b..b4162efda470 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -270,6 +270,44 @@ static unsigned long kvm_sdei_hypercall_enable(struct 
kvm_vcpu *vcpu,
return ret;
 }
 
+static unsigned long kvm_sdei_hypercall_context(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_vcpu_regs *regs;
+   unsigned long index = smccc_get_arg1(vcpu);
+   unsigned long ret = SDEI_SUCCESS;
+
+   /* Sanity check */
+   if (!(ksdei && vsdei)) {
+   ret = SDEI_NOT_SUPPORTED;
+   goto out;
+   }
+
+   if (index >= ARRAY_SIZE(vsdei->state.critical_regs.regs)) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto out;
+   }
+
+   /* Check if the pending event exists */
+   spin_lock(&vsdei->lock);
+   if (!(vsdei->critical_event || vsdei->normal_event)) {
+   ret = SDEI_DENIED;
+   goto unlock;
+   }
+
+   /* Fetch the requested register */
+   regs = vsdei->critical_event ? &vsdei->state.critical_regs :
+  &vsdei->state.normal_regs;
+   ret = regs->regs[index];
+
+unlock:
+   spin_unlock(&vsdei->lock);
+out:
+   return ret;
+}
+
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
 {
u32 func = smccc_get_function(vcpu);
@@ -290,6 +328,8 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
ret = kvm_sdei_hypercall_enable(vcpu, false);
break;
case SDEI_1_0_FN_SDEI_EVENT_CONTEXT:
+   ret = kvm_sdei_hypercall_context(vcpu);
+   break;
case SDEI_1_0_FN_SDEI_EVENT_COMPLETE:
case SDEI_1_0_FN_SDEI_EVENT_COMPLETE_AND_RESUME:
case SDEI_1_0_FN_SDEI_EVENT_UNREGISTER:
-- 
2.23.0



[PATCH v2 05/21] KVM: arm64: Support SDEI_EVENT_{ENABLE, DISABLE} hypercall

2021-02-08 Thread Gavin Shan
This supports the SDEI_EVENT_{ENABLE, DISABLE} hypercalls. After an
SDEI event is registered by the guest, it won't be delivered to the
guest until it's enabled. On the other hand, the SDEI event won't be
raised to the guest or the specific vCPU once it has been disabled
on the guest or the specific vCPU.
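
The expected guest-side call order is register first, then enable. A
hedged sketch of the enable step (the helper name is hypothetical and
the HVC conduit is assumed):

#include <linux/arm-smccc.h>
#include <linux/arm_sdei.h>

/* Sketch only: enable an already-registered event. */
static long example_sdei_event_enable(unsigned long event_num)
{
        struct arm_smccc_res res;

        arm_smccc_1_1_hvc(SDEI_1_0_FN_SDEI_EVENT_ENABLE, event_num, &res);

        /* SDEI_DENIED: not registered, or already in the requested state */
        return (long)res.a0;
}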

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/sdei.c | 68 +++
 1 file changed, 68 insertions(+)

diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index d3ea3eee154b..b022ce0a202b 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -206,6 +206,70 @@ static unsigned long kvm_sdei_hypercall_register(struct 
kvm_vcpu *vcpu)
return ret;
 }
 
+static unsigned long kvm_sdei_hypercall_enable(struct kvm_vcpu *vcpu,
+  bool enable)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_event *kse = NULL;
+   struct kvm_sdei_kvm_event *kske = NULL;
+   unsigned long event_num = smccc_get_arg1(vcpu);
+   int index = 0;
+   unsigned long ret = SDEI_SUCCESS;
+
+   /* Sanity check */
+   if (!(ksdei && vsdei)) {
+   ret = SDEI_NOT_SUPPORTED;
+   goto out;
+   }
+
+   if (!kvm_sdei_is_valid_event_num(event_num)) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto out;
+   }
+
+   /* Check if the KVM event exists */
+   spin_lock(&ksdei->lock);
+   kske = kvm_sdei_find_kvm_event(kvm, event_num);
+   if (!kske) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto unlock;
+   }
+
+   /* Check if there are pending events */
+   if (kske->state.refcount) {
+   ret = SDEI_PENDING;
+   goto unlock;
+   }
+
+   /* Check if it has been registered */
+   kse = kske->kse;
+   index = (kse->state.type == SDEI_EVENT_TYPE_PRIVATE) ?
+   vcpu->vcpu_idx : 0;
+   if (!kvm_sdei_is_registered(kske, index)) {
+   ret = SDEI_DENIED;
+   goto unlock;
+   }
+
+   /* Verify its enablement state */
+   if (enable == kvm_sdei_is_enabled(kske, index)) {
+   ret = SDEI_DENIED;
+   goto unlock;
+   }
+
+   /* Update enablement state */
+   if (enable)
+   kvm_sdei_set_enabled(kske, index);
+   else
+   kvm_sdei_clear_enabled(kske, index);
+
+unlock:
+   spin_unlock(&ksdei->lock);
+out:
+   return ret;
+}
+
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
 {
u32 func = smccc_get_function(vcpu);
@@ -220,7 +284,11 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
ret = kvm_sdei_hypercall_register(vcpu);
break;
case SDEI_1_0_FN_SDEI_EVENT_ENABLE:
+   ret = kvm_sdei_hypercall_enable(vcpu, true);
+   break;
case SDEI_1_0_FN_SDEI_EVENT_DISABLE:
+   ret = kvm_sdei_hypercall_enable(vcpu, false);
+   break;
case SDEI_1_0_FN_SDEI_EVENT_CONTEXT:
case SDEI_1_0_FN_SDEI_EVENT_COMPLETE:
case SDEI_1_0_FN_SDEI_EVENT_COMPLETE_AND_RESUME:
-- 
2.23.0



[PATCH v2 04/21] KVM: arm64: Support SDEI_EVENT_REGISTER hypercall

2021-02-08 Thread Gavin Shan
This supports the SDEI_EVENT_REGISTER hypercall, which is used by the
guest to register SDEI events. The SDEI event won't be raised to the
guest or the specific vCPU until it's registered and enabled explicitly.

Only those events that have been exported by KVM can be registered.
After the event is registered successfully, the KVM SDEI event (object)
is created or updated, because the same KVM SDEI event object is shared
by multiple vCPUs if it's a private event.
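
For illustration, the guest-side registration boils down to the sketch
below. The helper, entry point and argument are hypothetical;
SDEI_EVENT_REGISTER_RM_ANY lets the host pick the routing for a shared
event.

#include <linux/arm-smccc.h>
#include <linux/arm_sdei.h>

/* Sketch only: register an event with handler entry point 'ep'. */
static long example_sdei_event_register(unsigned long event_num,
                                        unsigned long ep, unsigned long arg)
{
        struct arm_smccc_res res;

        /* x1=event, x2=entry, x3=argument, x4=routing mode, x5=affinity */
        arm_smccc_1_1_hvc(SDEI_1_0_FN_SDEI_EVENT_REGISTER, event_num, ep,
                          arg, SDEI_EVENT_REGISTER_RM_ANY, 0, &res);

        return (long)res.a0;
}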

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/sdei.c | 122 ++
 1 file changed, 122 insertions(+)

diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index aa9485f076a9..d3ea3eee154b 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -21,6 +21,20 @@ static struct kvm_sdei_event_state defined_kse[] = {
},
 };
 
+static struct kvm_sdei_event *kvm_sdei_find_event(struct kvm *kvm,
+ unsigned long num)
+{
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_event *kse;
+
+   list_for_each_entry(kse, &ksdei->events, link) {
+   if (kse->state.num == num)
+   return kse;
+   }
+
+   return NULL;
+}
+
 static void kvm_sdei_remove_events(struct kvm *kvm)
 {
struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
@@ -32,6 +46,20 @@ static void kvm_sdei_remove_events(struct kvm *kvm)
}
 }
 
+static struct kvm_sdei_kvm_event *kvm_sdei_find_kvm_event(struct kvm *kvm,
+ unsigned long num)
+{
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_kvm_event *kske;
+
+   list_for_each_entry(kske, &ksdei->kvm_events, link) {
+   if (kske->state.num == num)
+   return kske;
+   }
+
+   return NULL;
+}
+
 static void kvm_sdei_remove_kvm_events(struct kvm *kvm,
   unsigned int mask,
   bool force)
@@ -86,6 +114,98 @@ static unsigned long kvm_sdei_hypercall_version(struct 
kvm_vcpu *vcpu)
return ret;
 }
 
+static unsigned long kvm_sdei_hypercall_register(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   struct kvm_sdei_event *kse = NULL;
+   struct kvm_sdei_kvm_event *kske = NULL;
+   unsigned long event_num = smccc_get_arg1(vcpu);
+   unsigned long event_entry = smccc_get_arg2(vcpu);
+   unsigned long event_param = smccc_get_arg3(vcpu);
+   unsigned long route_mode = smccc_get_arg4(vcpu);
+   unsigned long route_affinity = smccc_get_arg5(vcpu);
+   int index = vcpu->vcpu_idx;
+   unsigned long ret = SDEI_SUCCESS;
+
+   /* Sanity check */
+   if (!(ksdei && vsdei)) {
+   ret = SDEI_NOT_SUPPORTED;
+   goto out;
+   }
+
+   if (!kvm_sdei_is_valid_event_num(event_num)) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto out;
+   }
+
+   if (!(route_mode == SDEI_EVENT_REGISTER_RM_ANY ||
+ route_mode == SDEI_EVENT_REGISTER_RM_PE)) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto out;
+   }
+
+   /*
+* The KVM event could have been created if it's a private event.
+* We needn't create a KVM event in this case.
+*/
+   spin_lock(&ksdei->lock);
+   kske = kvm_sdei_find_kvm_event(kvm, event_num);
+   if (kske) {
+   kse = kske->kse;
+   index = (kse->state.type == SDEI_EVENT_TYPE_PRIVATE) ?
+   vcpu->vcpu_idx : 0;
+
+   if (kvm_sdei_is_registered(kske, index)) {
+   ret = SDEI_DENIED;
+   goto unlock;
+   }
+
+   kske->state.route_mode = route_mode;
+   kske->state.route_affinity = route_affinity;
+   kske->state.entries[index] = event_entry;
+   kske->state.params[index]  = event_param;
+   kvm_sdei_set_registered(kske, index);
+   goto unlock;
+   }
+
+   /* Check if the event has been exported (defined) by KVM */
+   kse = kvm_sdei_find_event(kvm, event_num);
+   if (!kse) {
+   ret = SDEI_INVALID_PARAMETERS;
+   goto unlock;
+   }
+
+   /* Create KVM event */
+   kske = kzalloc(sizeof(*kske), GFP_KERNEL);
+   if (!kske) {
+   ret = SDEI_OUT_OF_RESOURCE;
+   goto unlock;
+   }
+
+   /* Initialize KVM event state */
+   index = (kse->state.type == SDEI_EVENT_TYPE_PRIVATE) ?
+   vcpu->vcpu_idx : 0;
+   kske->state.num= event_num;
+   kske->state.refcount   = 0;
+   kske->state.route_mode = route_mode;
+   kske->state.route_affinity = route_affinity;
+   kske->state.entries[index] = event_entry

[PATCH v2 03/21] KVM: arm64: Support SDEI_VERSION hypercall

2021-02-08 Thread Gavin Shan
This supports the SDEI_VERSION hypercall by simply returning v1.0.0
when the functionality is supported on the VM and vCPU.
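
A hedged sketch of how a guest would probe and decode the returned
version word (the helper name is made up; the MAJOR/MINOR accessors
come from <uapi/linux/arm_sdei.h>):

#include <linux/arm-smccc.h>
#include <linux/arm_sdei.h>
#include <linux/types.h>

/* Sketch only: true if the virtual SDEI implementation reports v1.0. */
static bool example_sdei_is_v1_0(void)
{
        struct arm_smccc_res res;

        arm_smccc_1_1_hvc(SDEI_1_0_FN_SDEI_VERSION, &res);

        return SDEI_VERSION_MAJOR(res.a0) == 1 &&
               SDEI_VERSION_MINOR(res.a0) == 0;
}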

Signed-off-by: Gavin Shan 
---
 arch/arm64/kvm/sdei.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index ab330b74a965..aa9485f076a9 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -70,6 +70,22 @@ static void kvm_sdei_remove_vcpu_events(struct kvm_vcpu 
*vcpu)
}
 }
 
+static unsigned long kvm_sdei_hypercall_version(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_sdei_kvm *ksdei = kvm->arch.sdei;
+   struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+   unsigned long ret = SDEI_NOT_SUPPORTED;
+
+   if (!(ksdei && vsdei))
+   return ret;
+
+   /* v1.0.0 */
+   ret = (1UL << SDEI_VERSION_MAJOR_SHIFT);
+
+   return ret;
+}
+
 int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
 {
u32 func = smccc_get_function(vcpu);
@@ -78,6 +94,8 @@ int kvm_sdei_hypercall(struct kvm_vcpu *vcpu)
 
switch (func) {
case SDEI_1_0_FN_SDEI_VERSION:
+   ret = kvm_sdei_hypercall_version(vcpu);
+   break;
case SDEI_1_0_FN_SDEI_EVENT_REGISTER:
case SDEI_1_0_FN_SDEI_EVENT_ENABLE:
case SDEI_1_0_FN_SDEI_EVENT_DISABLE:
-- 
2.23.0



[PATCH v2 02/21] KVM: arm64: Add SDEI virtualization infrastructure

2021-02-08 Thread Gavin Shan
Software Delegated Exception Interface (SDEI) provides a mechanism for
registering and servicing system events. Those system events are high
priority events, which must be serviced immediately. It's going to be
used by Asynchronous Page Fault (APF) to deliver notifications from KVM
to the guest. It's noted that SDEI is defined by the ARM DEN0054A
specification.

This introduces the SDEI virtualization infrastructure where the SDEI
events are registered and manipulated by the guest through hypercalls.
The SDEI event is delivered to one specific vCPU by KVM once it's
raised. This introduces data structures to represent the needed objects
to implement the feature, which are highlighted below. As those objects
could be migrated between VMs, these data structures are partially
exported to user space.

   * kvm_sdei_event
 SDEI events are exported from KVM so that the guest is able to
 register and manipulate them.
   * kvm_sdei_kvm_event
 SDEI event that has been registered by the guest.
   * kvm_sdei_vcpu_event
 SDEI event that has been delivered to the target vCPU.
   * kvm_sdei_kvm
 Placeholder for the exported and registered SDEI events.
   * kvm_sdei_vcpu
 Auxiliary object to save the preempted context during SDEI event
 delivery.

An error is returned for all SDEI hypercalls for now. They will be
supported by subsequent patches.

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/kvm_host.h  |   4 +
 arch/arm64/include/asm/kvm_sdei.h  | 118 +++
 arch/arm64/include/uapi/asm/kvm.h  |   1 +
 arch/arm64/include/uapi/asm/kvm_sdei.h |  56 +++
 arch/arm64/kvm/Makefile|   2 +-
 arch/arm64/kvm/arm.c   |   7 +
 arch/arm64/kvm/hypercalls.c|  18 +++
 arch/arm64/kvm/sdei.c  | 198 +
 8 files changed, 403 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/kvm_sdei.h
 create mode 100644 arch/arm64/include/uapi/asm/kvm_sdei.h
 create mode 100644 arch/arm64/kvm/sdei.c

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 8fcfab0c2567..b2d51c6d055c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -133,6 +133,8 @@ struct kvm_arch {
 
u8 pfr0_csv2;
u8 pfr0_csv3;
+
+   struct kvm_sdei_kvm *sdei;
 };
 
 struct kvm_vcpu_fault_info {
@@ -370,6 +372,8 @@ struct kvm_vcpu_arch {
u64 last_steal;
gpa_t base;
} steal;
+
+   struct kvm_sdei_vcpu *sdei;
 };
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
diff --git a/arch/arm64/include/asm/kvm_sdei.h 
b/arch/arm64/include/asm/kvm_sdei.h
new file mode 100644
index ..b0abc13a0256
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_sdei.h
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Definitions of various KVM SDEI events.
+ *
+ * Copyright (C) 2021 Red Hat, Inc.
+ *
+ * Author(s): Gavin Shan 
+ */
+
+#ifndef __ARM64_KVM_SDEI_H__
+#define __ARM64_KVM_SDEI_H__
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct kvm_sdei_event {
+   struct kvm_sdei_event_state state;
+   struct kvm  *kvm;
+   struct list_headlink;
+};
+
+struct kvm_sdei_kvm_event {
+   struct kvm_sdei_kvm_event_state state;
+   struct kvm_sdei_event   *kse;
+   struct kvm  *kvm;
+   struct list_headlink;
+};
+
+struct kvm_sdei_vcpu_event {
+   struct kvm_sdei_vcpu_event_statestate;
+   struct kvm_sdei_kvm_event   *kske;
+   struct kvm_vcpu *vcpu;
+   struct list_headlink;
+};
+
+struct kvm_sdei_kvm {
+   spinlock_t  lock;
+   struct list_headevents; /* kvm_sdei_event */
+   struct list_headkvm_events; /* kvm_sdei_kvm_event */
+};
+
+struct kvm_sdei_vcpu {
+   spinlock_t  lock;
+   struct kvm_sdei_vcpu_state  state;
+   struct kvm_sdei_vcpu_event  *critical_event;
+   struct kvm_sdei_vcpu_event  *normal_event;
+   struct list_headcritical_events;
+   struct list_headnormal_events;
+};
+
+/*
+ * According to the SDEI specification (v1.0), the event number spans
+ * 32 bits and the lower 24 bits are used as the (real) event number.
+ * We are unlikely to need that many SDEI numbers in one system, so we
+ * reserve two bits from the 24-bit real event number to indicate the
+ * event type: physical event or virtual event. One reserved bit is
+ * enough for now, but two bits are reserved for possible extension in
+ * the future.
+ *
+ * The physical events are owned by the underlying firmware while the
+ * virtual events are used by the VMM and KVM.
+ */
+#define KVM_SDEI_EV_NUM_TYPE_SHIFT 22
+#define KVM_SDEI_EV_NUM_TYPE_MASK  

[PATCH v2 01/21] KVM: arm64: Introduce template for inline functions

2021-02-08 Thread Gavin Shan
The inline functions used to get the SMCCC parameters share the same
layout, which means the common logic can be expressed by a template
to simplify the code. Besides, this adds similar inline functions,
smccc_get_arg{4,5,6,7,8}(), to access more SMCCC arguments, which
are required by the SDEI virtualization support.
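
For clarity, each SMCCC_DECLARE_GET_FUNCTION() instance in the hunk
below expands to a helper of exactly the shape being removed, for
example:

/* Expansion of SMCCC_DECLARE_GET_FUNCTION(unsigned long, arg4, 4) */
static inline unsigned long smccc_get_arg4(struct kvm_vcpu *vcpu)
{
        return vcpu_get_reg(vcpu, 4);
}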

Signed-off-by: Gavin Shan 
---
 include/kvm/arm_hypercalls.h | 34 +++---
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 0e2509d27910..1120eff7aa28 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -6,27 +6,21 @@
 
 #include 
 
-int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
-
-static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
-{
-   return vcpu_get_reg(vcpu, 0);
+#define SMCCC_DECLARE_GET_FUNCTION(type, name, reg)\
+static inline type smccc_get_##name(struct kvm_vcpu *vcpu) \
+{  \
+   return vcpu_get_reg(vcpu, (reg));   \
 }
 
-static inline unsigned long smccc_get_arg1(struct kvm_vcpu *vcpu)
-{
-   return vcpu_get_reg(vcpu, 1);
-}
-
-static inline unsigned long smccc_get_arg2(struct kvm_vcpu *vcpu)
-{
-   return vcpu_get_reg(vcpu, 2);
-}
-
-static inline unsigned long smccc_get_arg3(struct kvm_vcpu *vcpu)
-{
-   return vcpu_get_reg(vcpu, 3);
-}
+SMCCC_DECLARE_GET_FUNCTION(u32,   function, 0)
+SMCCC_DECLARE_GET_FUNCTION(unsigned long, arg1, 1)
+SMCCC_DECLARE_GET_FUNCTION(unsigned long, arg2, 2)
+SMCCC_DECLARE_GET_FUNCTION(unsigned long, arg3, 3)
+SMCCC_DECLARE_GET_FUNCTION(unsigned long, arg4, 4)
+SMCCC_DECLARE_GET_FUNCTION(unsigned long, arg5, 5)
+SMCCC_DECLARE_GET_FUNCTION(unsigned long, arg6, 6)
+SMCCC_DECLARE_GET_FUNCTION(unsigned long, arg7, 7)
+SMCCC_DECLARE_GET_FUNCTION(unsigned long, arg8, 8)
 
 static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
unsigned long a0,
@@ -40,4 +34,6 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
vcpu_set_reg(vcpu, 3, a3);
 }
 
+int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
+
 #endif
-- 
2.23.0



[PATCH v2 00/21] Support SDEI Virtualization

2021-02-08 Thread Gavin Shan
This series intends to virtualize the Software Delegated Exception
Interface (SDEI), which is defined by DEN0054A. It allows the hypervisor
to deliver NMI-like events to the guest and it's needed by asynchronous
page fault to deliver page-not-present notifications from the hypervisor
to the guest. The code and the required qemu changes can be found at:

   https://github.com/gwshan/linux ("sdei")
   https://github.com/gwshan/qemu.git ("apf")

The SDEI event is identified by a 32-bit number. Bits[31:24] are used
to indicate the SDEI event properties while bits[23:0] identify the
unique number. The implementation takes bits[23:22] to indicate the
owner of the SDEI event. For example, those SDEI events owned by KVM
have these two bits set to 0b01. Besides, the implementation only
supports SDEI events owned by KVM.
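
A minimal sketch of that layout, assuming illustrative macro names (the
real definitions live in kvm_sdei.h later in the series):

#include <linux/bitfield.h>
#include <linux/bits.h>
#include <linux/types.h>

#define EX_SDEI_EV_NUM_MASK    GENMASK(23, 0)   /* unique event number  */
#define EX_SDEI_EV_OWNER_MASK  GENMASK(23, 22)  /* owner bits           */
#define EX_SDEI_EV_OWNER_KVM   0b01             /* 0b01 == owned by KVM */

/* Sketch only: check whether an event number belongs to KVM. */
static inline bool example_sdei_event_is_kvm_owned(u32 num)
{
        return FIELD_GET(EX_SDEI_EV_OWNER_MASK, num) == EX_SDEI_EV_OWNER_KVM;
}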

The design is pretty straightforward and the implementation simply
follows the SDEI specification. Several data structures are introduced.
Some of the objects have to be migrated by the VMM, so their definitions
are split up so that the VMM can include their states for migration.

   struct kvm_sdei_kvm
  Associated with VM and used to track the KVM exposed SDEI events
  and those registered by guest.
   struct kvm_sdei_vcpu
  Associated with vCPU and used to track SDEI event delivery. The
  preempted context is saved prior to the delivery and restored
  after that.
   struct kvm_sdei_event
  SDEI events exposed by KVM so that guest can register and enable.
   struct kvm_sdei_kvm_event
  SDEI events that have been registered by guest.
   struct kvm_sdei_vcpu_event
  SDEI events that have been queued to specific vCPU for delivery.

The series is organized as below:

   PATCH[01]Introduces template for smccc_get_argx()
   PATCH[02]Introduces the data structures and infrastructure
   PATCH[03-14] Supports various SDEI related hypercalls
   PATCH[15]Supports SDEI event notification
   PATCH[16-17] Introduces ioctl command for migration
   PATCH[18-19] Supports SDEI event injection and cancellation
   PATCH[20]Exports SDEI capability
   PATCH[21]Adds self-test case for SDEI virtualization

Testing
===

There are two additional patches in the following repository to create
procfs files allowing SDEI events to be injected, and a driver for the
guest to use the SDEI event. Besides, additional qemu changes are needed
so that the guest can detect the SDEI service through the ACPI table.

https://github.com/gwshan/linux ("sdei")
https://github.com/gwshan/qemu.git ("apf")

The SDEI event is received and handled in the guest after it's injected
through the procfs files on the host.

Changelog
=
v2:
   * Rebased to 5.11.rc6
   * Dropped changes related to SDEI client driver(Gavin)
   * Removed support for passthrough SDEI events (Gavin)
   * Redesigned data structures   (Gavin)
   * Implementation is almost rewritten as the data structures
 are totally changed  (Gavin)
   * Added ioctl commands to support migration(Gavin)

Gavin Shan (21):
  KVM: arm64: Introduce template for inline functions
  KVM: arm64: Add SDEI virtualization infrastructure
  KVM: arm64: Support SDEI_VERSION hypercall
  KVM: arm64: Support SDEI_EVENT_REGISTER hypercall
  KVM: arm64: Support SDEI_EVENT_{ENABLE, DISABLE} hypercall
  KVM: arm64: Support SDEI_EVENT_CONTEXT hypercall
  KVM: arm64: Support SDEI_EVENT_UNREGISTER hypercall
  KVM: arm64: Support SDEI_EVENT_STATUS hypercall
  KVM: arm64: Support SDEI_EVENT_GET_INFO hypercall
  KVM: arm64: Support SDEI_EVENT_ROUTING_SET hypercall
  KVM: arm64: Support SDEI_PE_{MASK, UNMASK} hypercall
  KVM: arm64: Support SDEI_{PRIVATE, SHARED}_RESET hypercall
  KVM: arm64: Implement SDEI event delivery
  KVM: arm64: Support SDEI_EVENT_{COMPLETE, COMPLETE_AND_RESUME}
hypercall
  KVM: arm64: Support SDEI event notifier
  KVM: arm64: Support SDEI ioctl commands on VM
  KVM: arm64: Support SDEI ioctl commands on vCPU
  KVM: arm64: Support SDEI event injection
  KVM: arm64: Support SDEI event cancellation
  KVM: arm64: Export SDEI capability
  KVM: selftests: Add SDEI test case

 arch/arm64/include/asm/kvm_emulate.h   |1 +
 arch/arm64/include/asm/kvm_host.h  |6 +
 arch/arm64/include/asm/kvm_sdei.h  |  136 ++
 arch/arm64/include/uapi/asm/kvm.h  |1 +
 arch/arm64/include/uapi/asm/kvm_sdei.h |   82 ++
 arch/arm64/kvm/Makefile|2 +-
 arch/arm64/kvm/arm.c   |   19 +
 arch/arm64/kvm/hyp/exception.c |7 +
 arch/arm64/kvm/hypercalls.c|   18 +
 arch/arm64/kvm/inject_fault.c  |   27 +
 arch/arm64/kvm/sdei.c  | 1519 
 include/kvm/arm_hypercalls.h   |   34 +-
 include/uapi/linux/kvm.h   |4 +
 tools/testing/selftests/kvm/Makefile 

Re: [RFC PATCH v8 5/5] KVM: arm64: ioctl to fetch/store tags in a guest

2021-02-08 Thread Peter Maydell
On Fri, 5 Feb 2021 at 13:58, Steven Price  wrote:
>
> The VMM may not wish to have its own mapping of guest memory mapped
> with PROT_MTE because this causes problems if the VMM has tag checking
> enabled (the guest controls the tags in physical RAM and it's unlikely
> the tags are correct for the VMM).
>
> Instead add a new ioctl which allows the VMM to easily read/write the
> tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE
> while the VMM can still read/write the tags for the purpose of
> migration.
>
> Signed-off-by: Steven Price 
> ---
>  arch/arm64/include/uapi/asm/kvm.h | 13 +++
>  arch/arm64/kvm/arm.c  | 57 +++
>  include/uapi/linux/kvm.h  |  1 +
>  3 files changed, 71 insertions(+)

Missing the update to the docs in Documentation/virtual/kvm/api.txt :-)

thanks
-- PMM


Re: [PATCH v7 00/23] arm64: Early CPU feature override, and applications to VHE, BTI and PAuth

2021-02-08 Thread Marc Zyngier

On 2021-02-08 14:32, Will Deacon wrote:

Hi Marc,

On Mon, Feb 08, 2021 at 09:57:09AM +, Marc Zyngier wrote:

It recently came to light that there is a need to be able to override
some CPU features very early on, before the kernel is fully up and
running. The reasons for this range from specific feature support
(such as using Protected KVM on VHE HW, which is the main motivation
for this work) to errata workaround (a feature is broken on a CPU and
needs to be turned off, or rather not enabled).

This series tries to offer a limited framework for this kind of
problems, by allowing a set of options to be passed on the
command-line and altering the feature set that the cpufeature
subsystem exposes to the rest of the kernel. Note that this doesn't
change anything for code that directly uses the CPU ID registers.


I applied this locally, but I'm seeing consistent boot failure under
QEMU when KASAN is enabled. I tried sprinkling some
__no_sanitize_address annotations around (see below) but it didn't
help. The culprit appears to be early_fdt_map(), but looking a bit more
closely, I'm really nervous about the way we call into C functions from
__primary_switched. Remember -- this code runs _twice_ when KASLR is
active: before and after the randomization. This also means that any
memory writes the first time around can be lost due to the D-cache
invalidation when (re-)creating the kernel page-tables.


Nailed it. Of course, before anything starts writing from C code, we
need to have initialised KASAN. kasan_init.c itself is compiled without
any address sanitising, but we can't repaint all the stuff that is
called from early_fdt_map() (quite a lot).

So the natural thing to do is to keep kasan_early_init() as the first
thing we do in C code, and everything falls from that.

Any chance you could try that on top and see if that cures your problem?
If that works for you, I'll push an updates series.

Thanks,

M.

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index bce66d6bda74..09a5b603c950 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -429,13 +429,13 @@ SYM_FUNC_START_LOCAL(__primary_switched)
bl  __pi_memset
dsb ishst   // Make zero page visible to PTW

+#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
+   bl  kasan_early_init
+#endif
mov x0, x21 // pass FDT address in x0
bl  early_fdt_map   // Try mapping the FDT early
bl  init_feature_override
bl  switch_to_vhe
-#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
-   bl  kasan_early_init
-#endif
 #ifdef CONFIG_RANDOMIZE_BASE
tst x23, ~(MIN_KIMG_ALIGN - 1)  // already running randomized?
b.ne0f

--
Jazz is not dead. It just smells funny...


Re: [PATCH v7 00/23] arm64: Early CPU feature override, and applications to VHE, BTI and PAuth

2021-02-08 Thread Marc Zyngier

Hi Will,

On 2021-02-08 14:32, Will Deacon wrote:

Hi Marc,

On Mon, Feb 08, 2021 at 09:57:09AM +, Marc Zyngier wrote:

It recently came to light that there is a need to be able to override
some CPU features very early on, before the kernel is fully up and
running. The reasons for this range from specific feature support
(such as using Protected KVM on VHE HW, which is the main motivation
for this work) to errata workaround (a feature is broken on a CPU and
needs to be turned off, or rather not enabled).

This series tries to offer a limited framework for this kind of
problems, by allowing a set of options to be passed on the
command-line and altering the feature set that the cpufeature
subsystem exposes to the rest of the kernel. Note that this doesn't
change anything for code that directly uses the CPU ID registers.


I applied this locally, but I'm seeing consistent boot failure under
QEMU when KASAN is enabled. I tried sprinkling some
__no_sanitize_address annotations around (see below) but it didn't
help. The culprit appears to be early_fdt_map(), but looking a bit more
closely, I'm really nervous about the way we call into C functions from
__primary_switched. Remember -- this code runs _twice_ when KASLR is
active: before and after the randomization. This also means that any
memory writes the first time around can be lost due to the D-cache
invalidation when (re-)creating the kernel page-tables.


Well, we already call into C functions with KASLR, and nothing explodes
with that, so I must be doing something else wrong.

I do have cache maintenance for the writes to the shadow registers, so
that part should be fine. But I think I'm missing some cache maintenance
around the FDT base itself, and I wonder what happens when we go around
the loop.


I'll chase this down now.

Thanks for the heads up.

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCHv2] kvm: arm64: Add SVE support for nVHE.

2021-02-08 Thread Dave Martin
On Fri, Feb 05, 2021 at 12:12:51AM +, Daniel Kiss wrote:
> 
> 
> > On 4 Feb 2021, at 18:36, Dave Martin  wrote:
> > 
> > On Tue, Feb 02, 2021 at 07:52:54PM +0100, Daniel Kiss wrote:
> >> CPUs that support SVE are architecturally required to support the
> >> Virtualization Host Extensions (VHE), so far the kernel supported
> >> SVE alongside KVM with VHE enabled. In some cases it is desired to
> >> run nVHE config even when VHE is available.
> >> This patch add support for SVE for nVHE configuration too.
> >> 
> >> Tested on FVP with a Linux guest VM that run with a different VL than
> >> the host system.
> >> 
> >> Signed-off-by: Daniel Kiss 

[...]

> >> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> >> index 3e081d556e81..8f29b468e989 100644
> >> --- a/arch/arm64/kvm/fpsimd.c
> >> +++ b/arch/arm64/kvm/fpsimd.c
> >> @@ -42,6 +42,16 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
> >>if (ret)
> >>goto error;
> >> 
> >> +  if (!has_vhe() && vcpu->arch.sve_state) {
> >> +  void *sve_state_end = vcpu->arch.sve_state +
> >> +  SVE_SIG_REGS_SIZE(
> >> +  
> >> sve_vq_from_vl(vcpu->arch.sve_max_vl));
> >> +  ret = create_hyp_mappings(vcpu->arch.sve_state,
> >> +sve_state_end,
> >> +PAGE_HYP);
> >> +  if (ret)
> >> +  goto error;
> >> +  }
> >>vcpu->arch.host_thread_info = kern_hyp_va(ti);
> >>vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
> >> error:
> >> @@ -109,10 +119,22 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
> >>local_irq_save(flags);
> >> 
> >>if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
> >> +  if (guest_has_sve) {
> >> +  if (has_vhe())
> >> +  __vcpu_sys_reg(vcpu, ZCR_EL1) = 
> >> read_sysreg_s(SYS_ZCR_EL12);
> >> +  else {
> >> +  __vcpu_sys_reg(vcpu, ZCR_EL1) = 
> >> read_sysreg_s(SYS_ZCR_EL1);
> >> +  /*
> >> +   * vcpu could set ZCR_EL1 to a shorter VL then 
> >> the max VL but
> >> +   * the context is still valid there. Save the 
> >> whole context.
> >> +   * In nVHE case we need to reset the ZCR_EL1 
> >> for that
> >> +   * because the save will be done in EL1.
> >> +   */
> >> +  
> >> write_sysreg_s(sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1,
> >> + SYS_ZCR_EL1);
> > 
> > This still doesn't feel right.  We're already in EL1 here I think, in
> > which case ZCR_EL1 has an immediate effect on what state the
> > architecture guarantees to preserve: if we need to change ZCR_EL1, it's
> > because it might be wrong.  If it's wrong, it might be too small.  And
> > if it's too small, we may have already lost some SVE register bits that
> > the guest cares about.
> "On taking an exception from an Exception level that is more constrained
>  to a target Exception level that is less constrained, or on writing a larger 
> value
>  to ZCR_ELx.LEN, then the previously inaccessible bits of these registers 
> that 
>  become accessible have a value of either zero or the value they had before
>  executing at the more constrained size.”
> If the CPU zeros the register when ZCR is written or exception is taken my 
> reading
>  of the above is that the register content maybe lost when we land in EL2.
> No code shall not count on the register content after reduces the VL in ZCR.
> 
> I see my comment also not clear enough.
> Maybe we shouldn’t save the guest’s sve_max_vl here, would enough to save up 
> to
> the actual VL.

Maybe you're right, but I may be missing some information here.

Can you sketch out more explicitly how it works, showing how all the
bits the host cares about (and only those bits) are saved/restored for
the host, and how all the bits the guest cares about (and only those
bits) are saved/restored for the guest?


Various optimisations are possible, but there is a risk of breaking
assumptions elsewhere.  For example, the KVM_{SET,GET}_ONE_REG code
makes assumptions about the layout of the data in
vcpu->arch.sve_state.

The architectural rules about when SVE register bits are preserved are
also complex,
with many interactions.  We also don't want to aggressively optimise in
a way that might be hard to apply to nested virt.


My instinct is to keep it simple while this patch matures, and continue
to save/restore based on vcpu->arch.sve_max_vl.  This keeps a clear
split between the emulated "hardware" (which doesn't change while the
guest runs), and changeable runtime state (like the guest's ZCR_EL1).

I'm happy to review proposed optimisations, but I think those should be
separated out as later patches (or a separate series).  My experience
is that it's much 

Re: [PATCH v7 00/23] arm64: Early CPU feature override, and applications to VHE, BTI and PAuth

2021-02-08 Thread Ard Biesheuvel
On Mon, 8 Feb 2021 at 15:32, Will Deacon  wrote:
>
> Hi Marc,
>
> On Mon, Feb 08, 2021 at 09:57:09AM +, Marc Zyngier wrote:
> > It recently came to light that there is a need to be able to override
> > some CPU features very early on, before the kernel is fully up and
> > running. The reasons for this range from specific feature support
> > (such as using Protected KVM on VHE HW, which is the main motivation
> > for this work) to errata workaround (a feature is broken on a CPU and
> > needs to be turned off, or rather not enabled).
> >
> > This series tries to offer a limited framework for this kind of
> > problems, by allowing a set of options to be passed on the
> > command-line and altering the feature set that the cpufeature
> > subsystem exposes to the rest of the kernel. Note that this doesn't
> > change anything for code that directly uses the CPU ID registers.
>
> I applied this locally, but I'm seeing consistent boot failure under QEMU when
> KASAN is enabled. I tried sprinkling some __no_sanitize_address annotations
> around (see below) but it didn't help. The culprit appears to be
> early_fdt_map(), but looking a bit more closely, I'm really nervous about the
> way we call into C functions from __primary_switched. Remember -- this code
> runs _twice_ when KASLR is active: before and after the randomization. This
> also means that any memory writes the first time around can be lost due to
> the D-cache invalidation when (re-)creating the kernel page-tables.
>

Not just cache invalidation - BSS gets wiped again as well.

-- 
Ard.


Re: [PATCH v7 00/23] arm64: Early CPU feature override, and applications to VHE, BTI and PAuth

2021-02-08 Thread Will Deacon
Hi Marc,

On Mon, Feb 08, 2021 at 09:57:09AM +, Marc Zyngier wrote:
> It recently came to light that there is a need to be able to override
> some CPU features very early on, before the kernel is fully up and
> running. The reasons for this range from specific feature support
> (such as using Protected KVM on VHE HW, which is the main motivation
> for this work) to errata workaround (a feature is broken on a CPU and
> needs to be turned off, or rather not enabled).
> 
> This series tries to offer a limited framework for this kind of
> problems, by allowing a set of options to be passed on the
> command-line and altering the feature set that the cpufeature
> subsystem exposes to the rest of the kernel. Note that this doesn't
> change anything for code that directly uses the CPU ID registers.

I applied this locally, but I'm seeing consistent boot failure under QEMU when
KASAN is enabled. I tried sprinkling some __no_sanitize_address annotations
around (see below) but it didn't help. The culprit appears to be
early_fdt_map(), but looking a bit more closely, I'm really nervous about the
way we call into C functions from __primary_switched. Remember -- this code
runs _twice_ when KASLR is active: before and after the randomization. This
also means that any memory writes the first time around can be lost due to
the D-cache invalidation when (re-)creating the kernel page-tables.

Will

--->8

diff --git a/arch/arm64/kernel/idreg-override.c 
b/arch/arm64/kernel/idreg-override.c
index dffb16682330..751ed55261b5 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -195,7 +195,7 @@ static __init void parse_cmdline(void)
 /* Keep checkers quiet */
 void init_feature_override(void);
 
-asmlinkage void __init init_feature_override(void)
+asmlinkage void __init __no_sanitize_address init_feature_override(void)
 {
int i;
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 61845c0821d9..33581de05d2e 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -170,12 +170,12 @@ static void __init smp_build_mpidr_hash(void)
 
 static void *early_fdt_ptr __initdata;
 
-void __init *get_early_fdt_ptr(void)
+void __init __no_sanitize_address *get_early_fdt_ptr(void)
 {
return early_fdt_ptr;
 }
 
-asmlinkage void __init early_fdt_map(u64 dt_phys)
+asmlinkage void __init __no_sanitize_address early_fdt_map(u64 dt_phys)
 {
int fdt_size;
 



[PATCH v18 6/7] KVM: arm64: Add support for the KVM PTP service

2021-02-08 Thread Marc Zyngier
From: Jianyong Wu 

Implement the hypervisor side of the KVM PTP interface.

The service offers wall time and cycle count from host to guest.
The caller must specify whether they want the host's view of
either the virtual or physical counter.
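
For context, the matching guest-side request (added by the ptp_kvm
driver later in this series) looks roughly like the sketch below. The
function ID macro name is taken from the documentation added by this
patch and the exact spelling in arm-smccc.h may differ; the helper name
is made up.

#include <linux/arm-smccc.h>
#include <linux/errno.h>
#include <linux/types.h>

/* Sketch only: fetch the host wall clock and virtual counter. */
static int example_kvm_ptp_read_host_time(u64 *wall_ns, u64 *cycles)
{
        struct arm_smccc_res res;

        arm_smccc_1_1_invoke(ARM_SMCCC_HYP_KVM_PTP_FUNC_ID,
                             KVM_PTP_VIRT_COUNTER, &res);
        if ((long)res.a0 < 0)
                return -EOPNOTSUPP;   /* NOT_SUPPORTED(-1) from the host */

        *wall_ns = (res.a0 << 32) | res.a1;  /* upper/lower 32 bits of time  */
        *cycles  = (res.a2 << 32) | res.a3;  /* upper/lower 32 bits of count */
        return 0;
}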

Signed-off-by: Jianyong Wu 
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20201209060932.212364-7-jianyong...@arm.com
---
 Documentation/virt/kvm/api.rst |  9 +
 Documentation/virt/kvm/arm/index.rst   |  1 +
 Documentation/virt/kvm/arm/ptp_kvm.rst | 25 
 arch/arm64/kvm/arm.c   |  1 +
 arch/arm64/kvm/hypercalls.c| 53 ++
 include/linux/arm-smccc.h  | 16 
 include/uapi/linux/kvm.h   |  1 +
 7 files changed, 106 insertions(+)
 create mode 100644 Documentation/virt/kvm/arm/ptp_kvm.rst

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c136e254b496..7123bedd4248 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6501,3 +6501,12 @@ KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After 
enabling
 KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
 machine will switch to ring-buffer dirty page tracking and further
 KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
+
+8.30 KVM_CAP_PTP_KVM
+--------------------
+
+:Architectures: arm64
+
+This capability indicates that the KVM virtual PTP service is
+supported in the host. A VMM can check whether the service is
+available to the guest on migration.
diff --git a/Documentation/virt/kvm/arm/index.rst 
b/Documentation/virt/kvm/arm/index.rst
index 3e2b2aba90fc..78a9b670aafe 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -10,3 +10,4 @@ ARM
hyp-abi
psci
pvtime
+   ptp_kvm
diff --git a/Documentation/virt/kvm/arm/ptp_kvm.rst 
b/Documentation/virt/kvm/arm/ptp_kvm.rst
new file mode 100644
index ..68cffb50d8bf
--- /dev/null
+++ b/Documentation/virt/kvm/arm/ptp_kvm.rst
@@ -0,0 +1,25 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+PTP_KVM support for arm/arm64
+=
+
+PTP_KVM is used for high precision time sync between host and guests.
+It relies on transferring the wall clock and counter value from the
+host to the guest using a KVM-specific hypercall.
+
+* ARM_SMCCC_HYP_KVM_PTP_FUNC_ID: 0x86000001
+
+This hypercall uses the SMC32/HVC32 calling convention:
+
+ARM_SMCCC_HYP_KVM_PTP_FUNC_ID
+==============    ==========    =====================================
+Function ID:      (uint32)      0x86000001
+Arguments:        (uint32)      KVM_PTP_VIRT_COUNTER(0)
+                                KVM_PTP_PHYS_COUNTER(1)
+Return Values:    (int32)       NOT_SUPPORTED(-1) on error, or
+                  (uint32)      Upper 32 bits of wall clock time (r0)
+                  (uint32)      Lower 32 bits of wall clock time (r1)
+                  (uint32)      Upper 32 bits of counter (r2)
+                  (uint32)      Lower 32 bits of counter (r3)
+Endianness:       No Restrictions.
+==============    ==========    =====================================
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 04c44853b103..7ce851fc5643 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -206,6 +206,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_INJECT_EXT_DABT:
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_VCPU_ATTRIBUTES:
+   case KVM_CAP_PTP_KVM:
r = 1;
break;
case KVM_CAP_ARM_SET_DEVICE_ADDR:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index b9d8607083eb..71812879f503 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -9,6 +9,55 @@
 #include 
 #include 
 
+static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
+{
+   struct system_time_snapshot systime_snapshot;
+   u64 cycles = ~0UL;
+   u32 feature;
+
+   /*
+* The system time and counter value must be captured at the
+* same time to keep consistency and precision.
+*/
+   ktime_get_snapshot(&systime_snapshot);
+
+   /*
+* This is only valid if the current clocksource is the
+* architected counter, as this is the only one the guest
+* can see.
+*/
+   if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
+   return;
+
+   /*
+* The guest selects one of the two reference counters
+* (virtual or physical) with the first argument of the SMCCC
+* call. In case the identifier is not supported, error out.
+*/
+   feature = smccc_get_arg1(vcpu);
+   switch (feature) {
+   case KVM_PTP_VIRT_COUNTER:
+   cycles = systime_snapshot.cycles - vcpu_read_sys_reg(vcpu, 
CNTVOFF_EL2);
+   break;
+   case KVM_PTP_PHYS_COUNTER:
+   cycles = systime_snapshot.cycles;
+  

[PATCH v18 7/7] ptp: arm/arm64: Enable ptp_kvm for arm/arm64

2021-02-08 Thread Marc Zyngier
From: Jianyong Wu 

Currently, there is no mechanism to keep the time in sync between guest
and host in the arm/arm64 virtualization environment. Time in the guest
will drift compared with the host after boot-up, as they may both use
third-party time sources to correct their time respectively. The time
deviation will be in the order of milliseconds. But in some scenarios,
like a cloud environment, higher time precision is required.

The kvm ptp clock, which chooses the host clock source as a reference
clock to sync time between guest and host, has been adopted by x86,
taking the time sync error from milliseconds down to nanoseconds.

This patch enables the kvm ptp clock for arm/arm64 and improves the
clock sync precision significantly.

Test result comparisons with and without the kvm ptp clock in arm/arm64
are as follows. This test is derived from the output of the command
'chronyc sources'. Pay particular attention to the last sample column,
which shows the offset between the local clock and the source at the
last measurement.

no kvm ptp in guest:
MS Name/IP address   Stratum Poll Reach LastRx Last sample

^* dns1.synet.edu.cn  2   6   377    13  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    21  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    29  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    37  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    45  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    53  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    61  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377     4   -130us[ +796us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    12   -130us[ +796us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    20   -130us[ +796us] +/-   21ms

in host:
MS Name/IP address   Stratum Poll Reach LastRx Last sample

^* 120.25.115.20  2   7   377    72   -470us[ -603us] +/-   18ms
^* 120.25.115.20  2   7   377    92   -470us[ -603us] +/-   18ms
^* 120.25.115.20  2   7   377   112   -470us[ -603us] +/-   18ms
^* 120.25.115.20  2   7   377     2   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377    22   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377    43   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377    63   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377    83   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377   103   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377   123   +872ns[-6808ns] +/-   17ms

dns1.synet.edu.cn is the network reference clock for the guest and
120.25.115.20 is the network reference clock for the host. We can't get
the clock error between guest and host directly, but a rough estimate
would be in the order of hundreds of microseconds to milliseconds.

with kvm ptp in guest:
chrony has been disabled on the host to remove the disturbance from the
network clock.

MS Name/IP address Stratum Poll Reach LastRx Last sample

* PHC0  0   3   377     8     -7ns[   +1ns] +/-    3ns
* PHC0  0   3   377     8     +1ns[  +16ns] +/-    3ns
* PHC0  0   3   377     6     -4ns[   -0ns] +/-    6ns
* PHC0  0   3   377     6     -8ns[  -12ns] +/-    5ns
* PHC0  0   3   377     5     +2ns[   +4ns] +/-    4ns
* PHC0  0   3   377    13     +2ns[   +4ns] +/-    4ns
* PHC0  0   3   377    12     -4ns[   -6ns] +/-    4ns
* PHC0  0   3   377    11     -8ns[  -11ns] +/-    6ns
* PHC0  0   3   377    10    -14ns[  -20ns] +/-    4ns
* PHC0  0   3   377     8     +4ns[   +5ns] +/-    4ns

PHC0 is the ptp clock which chooses the host clock as its source
clock. So we can see that the clock difference between host and guest
is in the order of nanoseconds.

Cc: Mark Rutland 
Signed-off-by: Jianyong Wu 
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20201209060932.212364-8-jianyong...@arm.com
---
 drivers/clocksource/arm_arch_timer.c | 34 
 drivers/ptp/Kconfig  |  2 +-
 drivers/ptp/Makefile |  1 +
 drivers/ptp/ptp_kvm_arm.c| 28 +++
 4 files changed, 64 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ptp/ptp_kvm_arm.c

diff --git a/drivers/clocksource/arm_arch_timer.c 
b/drivers/clocksource/arm_arch_timer.c
index 8f12e223703f..e0f167e5e792 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -25,6 +25,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1659,3 +1661,35 @@ static int __init arch_timer_acpi_ini

[PATCH v18 5/7] clocksource: Add clocksource id for arm arch counter

2021-02-08 Thread Marc Zyngier
From: Jianyong Wu 

Add clocksource id to the ARM generic counter so that it can be easily
identified from callers such as ptp_kvm.

Cc: Mark Rutland 
Signed-off-by: Jianyong Wu 
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20201209060932.212364-6-jianyong...@arm.com
---
 drivers/clocksource/arm_arch_timer.c | 2 ++
 include/linux/clocksource_ids.h  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/clocksource/arm_arch_timer.c 
b/drivers/clocksource/arm_arch_timer.c
index d0177824c518..8f12e223703f 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -191,6 +192,7 @@ static u64 arch_counter_read_cc(const struct cyclecounter 
*cc)
 
 static struct clocksource clocksource_counter = {
.name   = "arch_sys_counter",
+   .id = CSID_ARM_ARCH_COUNTER,
.rating = 400,
.read   = arch_counter_read,
.mask   = CLOCKSOURCE_MASK(56),
diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h
index 4d8e19e05328..16775d7d8f8d 100644
--- a/include/linux/clocksource_ids.h
+++ b/include/linux/clocksource_ids.h
@@ -5,6 +5,7 @@
 /* Enum to give clocksources a unique identifier */
 enum clocksource_ids {
CSID_GENERIC= 0,
+   CSID_ARM_ARCH_COUNTER,
CSID_MAX,
 };
 
-- 
2.29.2



[PATCH v18 4/7] time: Add mechanism to recognize clocksource in time_get_snapshot

2021-02-08 Thread Marc Zyngier
From: Thomas Gleixner 

System time snapshots do not convey information about the clocksource
which was used to take them, but callers like the PTP KVM guest
implementation need to evaluate the clocksource type to select the
appropriate mechanism.

Introduce a clocksource id field in struct clocksource which is by default
set to CSID_GENERIC (0). Clocksource implementations can set that field to
a value which allows the clocksource to be identified.

Store the clocksource id of the current clocksource in the
system_time_snapshot so callers can evaluate which clocksource was used to
take the snapshot and act accordingly.
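
A hedged sketch of the intended caller pattern (CSID_ARM_ARCH_COUNTER is
added by a later patch in this series; the helper name is made up):

#include <linux/clocksource_ids.h>
#include <linux/errno.h>
#include <linux/timekeeping.h>

/* Sketch only: take a snapshot and insist it came from the arch counter. */
static int example_snapshot_from_arch_counter(struct system_time_snapshot *snap)
{
        ktime_get_snapshot(snap);

        if (snap->cs_id != CSID_ARM_ARCH_COUNTER)
                return -EOPNOTSUPP;

        return 0;
}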

Signed-off-by: Thomas Gleixner 
Signed-off-by: Jianyong Wu 
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20201209060932.212364-5-jianyong...@arm.com
---
 include/linux/clocksource.h |  6 ++
 include/linux/clocksource_ids.h | 11 +++
 include/linux/timekeeping.h | 12 +++-
 kernel/time/clocksource.c   |  2 ++
 kernel/time/timekeeping.c   |  1 +
 5 files changed, 27 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/clocksource_ids.h

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143db6523..1290d0dce840 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -62,6 +63,10 @@ struct module;
  * 400-499: Perfect
  * The ideal clocksource. A must-use where
  * available.
+ * @id:Defaults to CSID_GENERIC. The id value is 
captured
+ * in certain snapshot functions to allow callers to
+ * validate the clocksource from which the snapshot was
+ * taken.
  * @flags: Flags describing special properties
  * @enable:Optional function to enable the clocksource
  * @disable:   Optional function to disable the clocksource
@@ -100,6 +105,7 @@ struct clocksource {
const char  *name;
struct list_headlist;
int rating;
+   enum clocksource_idsid;
enum vdso_clock_modevdso_clock_mode;
unsigned long   flags;
 
diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h
new file mode 100644
index ..4d8e19e05328
--- /dev/null
+++ b/include/linux/clocksource_ids.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CLOCKSOURCE_IDS_H
+#define _LINUX_CLOCKSOURCE_IDS_H
+
+/* Enum to give clocksources a unique identifier */
+enum clocksource_ids {
+   CSID_GENERIC= 0,
+   CSID_MAX,
+};
+
+#endif
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index c6792cf01bc7..78a98bdff76d 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -3,6 +3,7 @@
 #define _LINUX_TIMEKEEPING_H
 
 #include 
+#include 
 
 /* Included from linux/ktime.h */
 
@@ -243,11 +244,12 @@ struct ktime_timestamps {
  * @cs_was_changed_seq:The sequence number of clocksource change events
  */
 struct system_time_snapshot {
-   u64 cycles;
-   ktime_t real;
-   ktime_t raw;
-   unsigned intclock_was_set_seq;
-   u8  cs_was_changed_seq;
+   u64 cycles;
+   ktime_t real;
+   ktime_t raw;
+   enum clocksource_idscs_id;
+   unsigned intclock_was_set_seq;
+   u8  cs_was_changed_seq;
 };
 
 /**
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a2cc7c..4fe1df894ee5 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -920,6 +920,8 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
 
clocksource_arch_init(cs);
 
+   if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX))
+   cs->id = CSID_GENERIC;
if (cs->vdso_clock_mode < 0 ||
cs->vdso_clock_mode >= VDSO_CLOCKMODE_MAX) {
pr_warn("clocksource %s registered with invalid VDSO mode %d. Disabling VDSO support.\n",
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index a45cedda93a7..50f08632165c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1049,6 +1049,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
do {
seq = read_seqcount_begin(&tk_core.seq);
now = tk_clock_read(&tk->tkr_mono);
+   systime_snapshot->cs_id = tk->tkr_mono.clock->id;
systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
base_real = ktime_add(tk->tkr_mono.base,
-- 
2.29.2


[PATCH v18 3/7] ptp: Reorganize ptp_kvm.c to make it arch-independent

2021-02-08 Thread Marc Zyngier
From: Jianyong Wu 

Currently, the ptp_kvm module contains a lot of x86-specific code.
Let's move this code into a new arch-specific file in the same directory,
and rename the arch-independent file to ptp_kvm_common.c.
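
The arch-independent file then relies on a small set of arch hooks, declared
in the new include/linux/ptp_kvm.h. The prototypes below are a sketch
inferred from the call sites in ptp_kvm_common.c, not a verbatim copy of the
header:

	/* Wall-clock time from the hypervisor, via an arch-specific transport. */
	int kvm_arch_ptp_get_clock(struct timespec64 *ts);

	/*
	 * Cross-timestamp: wall-clock time plus the matching counter value
	 * and clocksource, so callers can correlate it with system time.
	 */
	int kvm_arch_ptp_get_crosststamp(u64 *cycle, struct timespec64 *tspec,
					 struct clocksource **cs);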

Signed-off-by: Jianyong Wu 
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20201209060932.212364-4-jianyong...@arm.com
---
 drivers/ptp/Makefile|  1 +
 drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} | 84 +-
 drivers/ptp/ptp_kvm_x86.c   | 97 +
 include/linux/ptp_kvm.h | 19 
 4 files changed, 139 insertions(+), 62 deletions(-)
 rename drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} (60%)
 create mode 100644 drivers/ptp/ptp_kvm_x86.c
 create mode 100644 include/linux/ptp_kvm.h

diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
index db5aef3bddc6..d11eeb5811d1 100644
--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
@@ -4,6 +4,7 @@
 #
 
 ptp-y  := ptp_clock.o ptp_chardev.o ptp_sysfs.o
+ptp_kvm-$(CONFIG_X86)  := ptp_kvm_x86.o ptp_kvm_common.o
 obj-$(CONFIG_PTP_1588_CLOCK)   += ptp.o
 obj-$(CONFIG_PTP_1588_CLOCK_DTE)   += ptp_dte.o
 obj-$(CONFIG_PTP_1588_CLOCK_INES)  += ptp_ines.o
diff --git a/drivers/ptp/ptp_kvm.c b/drivers/ptp/ptp_kvm_common.c
similarity index 60%
rename from drivers/ptp/ptp_kvm.c
rename to drivers/ptp/ptp_kvm_common.c
index 658d33fc3195..721ddcede5e1 100644
--- a/drivers/ptp/ptp_kvm.c
+++ b/drivers/ptp/ptp_kvm_common.c
@@ -8,11 +8,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
-#include 
-#include 
 #include 
 
 #include 
@@ -24,56 +24,29 @@ struct kvm_ptp_clock {
 
 static DEFINE_SPINLOCK(kvm_ptp_lock);
 
-static struct pvclock_vsyscall_time_info *hv_clock;
-
-static struct kvm_clock_pairing clock_pair;
-static phys_addr_t clock_pair_gpa;
-
 static int ptp_kvm_get_time_fn(ktime_t *device_time,
   struct system_counterval_t *system_counter,
   void *ctx)
 {
-   unsigned long ret;
+   long ret;
+   u64 cycle;
struct timespec64 tspec;
-   unsigned version;
-   int cpu;
-   struct pvclock_vcpu_time_info *src;
+   struct clocksource *cs;
 
spin_lock(&kvm_ptp_lock);
 
preempt_disable_notrace();
-   cpu = smp_processor_id();
-   src = &hv_clock[cpu].pvti;
-
-   do {
-   /*
-* We are using a TSC value read in the hosts
-* kvm_hc_clock_pairing handling.
-* So any changes to tsc_to_system_mul
-* and tsc_shift or any other pvclock
-* data invalidate that measurement.
-*/
-   version = pvclock_read_begin(src);
-
-   ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
-clock_pair_gpa,
-KVM_CLOCK_PAIRING_WALLCLOCK);
-   if (ret != 0) {
-   pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
-   spin_unlock(&kvm_ptp_lock);
-   preempt_enable_notrace();
-   return -EOPNOTSUPP;
-   }
-
-   tspec.tv_sec = clock_pair.sec;
-   tspec.tv_nsec = clock_pair.nsec;
-   ret = __pvclock_read_cycles(src, clock_pair.tsc);
-   } while (pvclock_read_retry(src, version));
+   ret = kvm_arch_ptp_get_crosststamp(&cycle, &tspec, &cs);
+   if (ret) {
+   spin_unlock(&kvm_ptp_lock);
+   preempt_enable_notrace();
+   return ret;
+   }
 
preempt_enable_notrace();
 
-   system_counter->cycles = ret;
-   system_counter->cs = &kvm_clock;
+   system_counter->cycles = cycle;
+   system_counter->cs = cs;
 
*device_time = timespec64_to_ktime(tspec);
 
@@ -111,22 +84,17 @@ static int ptp_kvm_settime(struct ptp_clock_info *ptp,
 
 static int ptp_kvm_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
 {
-   unsigned long ret;
+   long ret;
struct timespec64 tspec;
 
spin_lock(&kvm_ptp_lock);
 
-   ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
-clock_pair_gpa,
-KVM_CLOCK_PAIRING_WALLCLOCK);
-   if (ret != 0) {
-   pr_err_ratelimited("clock offset hypercall ret %lu\n", ret);
+   ret = kvm_arch_ptp_get_clock(&tspec);
+   if (ret) {
spin_unlock(&kvm_ptp_lock);
-   return -EOPNOTSUPP;
+   return ret;
}
 
-   tspec.tv_sec = clock_pair.sec;
-   tspec.tv_nsec = clock_pair.nsec;
spin_unlock(&kvm_ptp_lock);
 
memcpy(ts, &tspec, sizeof(struct timespec64));
@@ -168,19 +136,11 @@ static int __init ptp_kvm_init(void)
 {
long ret;
 
-   if (!kvm_para_available())
-   ret

[PATCH v18 2/7] KVM: arm64: Advertise KVM UID to guests via SMCCC

2021-02-08 Thread Marc Zyngier
From: Will Deacon 

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Cc: Marc Zyngier 
Signed-off-by: Will Deacon 
Signed-off-by: Jianyong Wu 
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20201209060932.212364-3-jianyong...@arm.com
---
 arch/arm64/kvm/hypercalls.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 25ea4ecb6449..b9d8607083eb 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
-   long val = SMCCC_RET_NOT_SUPPORTED;
+   u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
gpa_t gpa;
 
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
-   val = ARM_SMCCC_VERSION_1_1;
+   val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case SPECTRE_VULNERABLE:
break;
case SPECTRE_MITIGATED:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
+   val[0] = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED;
break;
}
break;
@@ -54,27 +54,36 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
fallthrough;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = SMCCC_RET_SUCCESS;
+   val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   val = kvm_hypercall_pv_features(vcpu);
+   val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
-   val = gpa;
+   val[0] = gpa;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+   val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+   val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+   val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+   val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+   break;
+   case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
default:
return kvm_psci_call(vcpu);
}
 
-   smccc_set_retval(vcpu, val, 0, 0, 0);
+   smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
 }
-- 
2.29.2



[PATCH v18 0/7] KVM: arm64: Add host/guest KVM-PTP support

2021-02-08 Thread Marc Zyngier
Given that this series[0] has languished in my Inbox for the best part of the
past two years, and in an effort to eventually get it merged, I've
taken the liberty to pick it up and do the changes I wanted to see
instead of waiting to go through yet another round.

All the patches have a link to their original counterpart (though I
have squashed a couple of them where it made sense). Tested both 64
and 32bit guests for good measure. Of course, I claim full
responsibility for any bug introduced here.

* From v17 [1]:
  - Fixed compilation issue on 32bit systems not selecting
CONFIG_HAVE_ARM_SMCCC_DISCOVERY
  - Fixed KVM service discovery not properly parsing the reply
from the hypervisor

* From v16 [0]:
  - Moved the KVM service discovery to its own file, plugged it into
PSCI instead of the arch code, dropped the inlining, made use of
asm/hypervisor.h.
  - Tidied-up the namespacing
  - Cleanup the hypercall handler
  - De-duplicate the guest code
  - Tidied-up arm64-specific documentation
  - Dropped the generic PTP documentation as it needs a new location,
and some cleanup
  - Squashed hypercall documentation and capability into the
main KVM patch
  - Rebased on top of 5.11-rc4

[0] https://lore.kernel.org/r/20201209060932.212364-1-jianyong...@arm.com
[1] https://lore.kernel.org/r/20210202141204.3134855-1-...@kernel.org

Jianyong Wu (4):
  ptp: Reorganize ptp_kvm.c to make it arch-independent
  clocksource: Add clocksource id for arm arch counter
  KVM: arm64: Add support for the KVM PTP service
  ptp: arm/arm64: Enable ptp_kvm for arm/arm64

Thomas Gleixner (1):
  time: Add mechanism to recognize clocksource in time_get_snapshot

Will Deacon (2):
  arm/arm64: Probe for the presence of KVM hypervisor
  KVM: arm64: Advertise KVM UID to guests via SMCCC

 Documentation/virt/kvm/api.rst  |  9 ++
 Documentation/virt/kvm/arm/index.rst|  1 +
 Documentation/virt/kvm/arm/ptp_kvm.rst  | 25 ++
 arch/arm/include/asm/hypervisor.h   |  3 +
 arch/arm64/include/asm/hypervisor.h |  3 +
 arch/arm64/kvm/arm.c|  1 +
 arch/arm64/kvm/hypercalls.c | 80 +++--
 drivers/clocksource/arm_arch_timer.c| 36 
 drivers/firmware/psci/psci.c|  2 +
 drivers/firmware/smccc/Makefile |  2 +-
 drivers/firmware/smccc/kvm_guest.c  | 50 +++
 drivers/firmware/smccc/smccc.c  |  1 +
 drivers/ptp/Kconfig |  2 +-
 drivers/ptp/Makefile|  2 +
 drivers/ptp/ptp_kvm_arm.c   | 28 ++
 drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} | 84 +-
 drivers/ptp/ptp_kvm_x86.c   | 97 +
 include/linux/arm-smccc.h   | 41 +
 include/linux/clocksource.h |  6 ++
 include/linux/clocksource_ids.h | 12 +++
 include/linux/ptp_kvm.h | 19 
 include/linux/timekeeping.h | 12 +--
 include/uapi/linux/kvm.h|  1 +
 kernel/time/clocksource.c   |  2 +
 kernel/time/timekeeping.c   |  1 +
 25 files changed, 442 insertions(+), 78 deletions(-)
 create mode 100644 Documentation/virt/kvm/arm/ptp_kvm.rst
 create mode 100644 drivers/firmware/smccc/kvm_guest.c
 create mode 100644 drivers/ptp/ptp_kvm_arm.c
 rename drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} (60%)
 create mode 100644 drivers/ptp/ptp_kvm_x86.c
 create mode 100644 include/linux/clocksource_ids.h
 create mode 100644 include/linux/ptp_kvm.h

-- 
2.29.2



[PATCH v18 1/7] arm/arm64: Probe for the presence of KVM hypervisor

2021-02-08 Thread Marc Zyngier
From: Will Deacon 

Although the SMCCC specification provides some limited functionality for
describing the presence of hypervisor and firmware services, this is
generally applicable only to functions designated as "Arm Architecture
Service Functions" and no portable discovery mechanism is provided for
standard hypervisor services, despite the specification reserving a range
of function identifiers for them.

In an attempt to avoid the need for additional firmware changes every
time a new function is added, introduce a UID to identify the service
provider as being compatible with KVM. Once this has been established,
additional services can be discovered via a feature bitmap.
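
Once the UID has matched and the feature bitmap has been cached, guest-side
consumers simply test for the bit they need. A minimal, hypothetical usage
sketch (func_id stands for whichever ARM_SMCCC_KVM_FUNC_* bit the consumer
depends on; only ARM_SMCCC_KVM_FUNC_FEATURES is advertised by this series'
hypervisor side):

	if (!kvm_arm_hyp_service_available(func_id))
		return -EOPNOTSUPP;
	/* ...safe to issue the vendor-specific hypercall for that service... */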

Signed-off-by: Will Deacon 
Signed-off-by: Jianyong Wu 
[maz: move code to its own file, plug it into PSCI]
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20201209060932.212364-2-jianyong...@arm.com
---
 arch/arm/include/asm/hypervisor.h   |  3 ++
 arch/arm64/include/asm/hypervisor.h |  3 ++
 drivers/firmware/psci/psci.c|  2 ++
 drivers/firmware/smccc/Makefile |  2 +-
 drivers/firmware/smccc/kvm_guest.c  | 50 +
 drivers/firmware/smccc/smccc.c  |  1 +
 include/linux/arm-smccc.h   | 25 +++
 7 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 drivers/firmware/smccc/kvm_guest.c

diff --git a/arch/arm/include/asm/hypervisor.h 
b/arch/arm/include/asm/hypervisor.h
index df8524365637..bd61502b9715 100644
--- a/arch/arm/include/asm/hypervisor.h
+++ b/arch/arm/include/asm/hypervisor.h
@@ -4,4 +4,7 @@
 
 #include 
 
+void kvm_init_hyp_services(void);
+bool kvm_arm_hyp_service_available(u32 func_id);
+
 #endif
diff --git a/arch/arm64/include/asm/hypervisor.h 
b/arch/arm64/include/asm/hypervisor.h
index f9cc1d021791..0ae427f352c8 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -4,4 +4,7 @@
 
 #include 
 
+void kvm_init_hyp_services(void);
+bool kvm_arm_hyp_service_available(u32 func_id);
+
 #endif
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index f5fc429cae3f..69e296f02902 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -23,6 +23,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -498,6 +499,7 @@ static int __init psci_probe(void)
psci_init_cpu_suspend();
psci_init_system_suspend();
psci_init_system_reset2();
+   kvm_init_hyp_services();
}
 
return 0;
diff --git a/drivers/firmware/smccc/Makefile b/drivers/firmware/smccc/Makefile
index 72ab84042832..40d19144a860 100644
--- a/drivers/firmware/smccc/Makefile
+++ b/drivers/firmware/smccc/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 #
-obj-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY) += smccc.o
+obj-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY) += smccc.o kvm_guest.o
 obj-$(CONFIG_ARM_SMCCC_SOC_ID) += soc_id.o
diff --git a/drivers/firmware/smccc/kvm_guest.c 
b/drivers/firmware/smccc/kvm_guest.c
new file mode 100644
index ..08836f2f39ee
--- /dev/null
+++ b/drivers/firmware/smccc/kvm_guest.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define pr_fmt(fmt) "smccc: KVM: " fmt
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+static DECLARE_BITMAP(__kvm_arm_hyp_services, ARM_SMCCC_KVM_NUM_FUNCS) __ro_after_init = { };
+
+void __init kvm_init_hyp_services(void)
+{
+   struct arm_smccc_res res;
+   u32 val[4];
+
+   if (arm_smccc_1_1_get_conduit() != SMCCC_CONDUIT_HVC)
+   return;
+
+   arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, &res);
+   if (res.a0 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0 ||
+   res.a1 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1 ||
+   res.a2 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2 ||
+   res.a3 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3)
+   return;
+
+   memset(&res, 0, sizeof(res));
+   arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID, &res);
+
+   val[0] = lower_32_bits(res.a0);
+   val[1] = lower_32_bits(res.a1);
+   val[2] = lower_32_bits(res.a2);
+   val[3] = lower_32_bits(res.a3);
+
+   bitmap_from_arr32(__kvm_arm_hyp_services, val, ARM_SMCCC_KVM_NUM_FUNCS);
+
+   pr_info("hypervisor services detected (0x%08lx 0x%08lx 0x%08lx 0x%08lx)\n",
+res.a3, res.a2, res.a1, res.a0);
+}
+
+bool kvm_arm_hyp_service_available(u32 func_id)
+{
+   if (func_id >= ARM_SMCCC_KVM_NUM_FUNCS)
+   return -EINVAL;
+
+   return test_bit(func_id, __kvm_arm_hyp_services);
+}
+EXPORT_SYMBOL_GPL(kvm_arm_hyp_service_available);
diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
index 00c88b809c0c..94eca6ffda05 100644
--- a/drivers/firmware/smccc/smccc.c
+++ b/drivers/firmware/smccc/smccc.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 
 static u32 smccc_version = ARM_SMCCC_VERSION_1_0;
 stat

Re: [kvm-unit-tests PATCH v3 11/11] arm64: gic: Use IPI test checking for the LPI tests

2021-02-08 Thread Alexandru Elisei
Hi Eric,

On 2/5/21 1:30 PM, Auger Eric wrote:
> Hi Alexandru,
>
> On 1/29/21 5:36 PM, Alexandru Elisei wrote:
>> The LPI code validates a result similarly to the IPI tests, by checking if
>> the target CPU received the interrupt with the expected interrupt number.
>> However, the LPI tests invent their own way of checking the test results by
>> creating a global struct (lpi_stats), using a separate interrupt handler
>> (lpi_handler) and test function (check_lpi_stats).
>>
>> There are several areas that can be improved in the LPI code, which are
>> already covered by the IPI tests:
>>
>> - check_lpi_stats() doesn't take into account that the target CPU can
>>   receive the correct interrupt multiple times.
>> - check_lpi_stats() doesn't take into account the scenarios where all
>>   online CPUs can receive the interrupt, but the target CPU is the last CPU
>>   that touches lpi_stats.observed.
>> - Insufficient or missing memory synchronization.
>>
>> Instead of duplicating code, let's convert the LPI tests to use
>> check_acked() and the same interrupt handler as the IPI tests, which has
>> been renamed to irq_handler() to avoid any confusion.
>>
>> check_lpi_stats() has been replaced with check_acked() which, together with
>> using irq_handler(), instantly gives us more correctness checks and proper
>> memory synchronization between threads. lpi_stats.expected has been
>> replaced by the CPU mask and the expected interrupt number arguments to
>> check_acked(), with no change in semantics.
>>
>> lpi_handler() aborted the test if the interrupt number was not an LPI. This
>> was changed in favor of allowing the test to continue, as it will fail in
>> check_acked(), but possibly print information useful for debugging. If the
>> test receives spurious interrupts, those are reported via report_info() at
>> the end of the test for consistency with the IPI tests, which don't treat
>> spurious interrupts as critical errors.
>>
>> In the spirit of code reuse, secondary_lpi_tests() has been replaced with
>> ipi_recv() because the two are now identical; ipi_recv() has been renamed
>> to irq_recv(), similarly to irq_handler(), to avoid confusion.
>>
>> CC: Eric Auger 
>> Signed-off-by: Alexandru Elisei 
>> ---
>>  arm/gic.c | 190 +-
>>  1 file changed, 87 insertions(+), 103 deletions(-)
>>
>> [..]
>> @@ -796,18 +737,31 @@ static void test_its_trigger(void)
>>   * The LPI should not hit
>>   */
>>  gicv3_lpi_set_config(8195, LPI_PROP_DEFAULT);
>> -lpi_stats_expect(-1, -1);
>> +stats_reset();
>> +cpumask_clear(&mask);
>>  its_send_int(dev2, 20);
>> -check_lpi_stats("dev2/eventid=20 still does not trigger any LPI");
>> +wait_for_interrupts(&mask);
>> +report(check_acked(&mask, -1, -1),
>> +"dev2/eventid=20 still does not trigger any LPI");
>>  
>>  /* Now call the invall and check the LPI hits */
>> +stats_reset();
>> +/* The barrier is from its_send_int() */
>> +wmb();
> In v1 it was envisioned that the wmb would be added in __its_send_int(), but
> I fail to see it. Is it implicit in some way?

Thank you for having a look at this, it seems I forgot to remove this barrier.

The barriers in __its_send_int() and the one above are not needed because the
barrier is already present in its_send_invall() -> its_send_single_command() ->
its_post_commands() -> writeq() (the removal from __its_send_int() is also
explained in the cover letter).

I'll remove the wmb() barrier in the next version.

Thanks,

Alex

>
> Thanks
>
> Eric
>> +cpumask_clear(&mask);
>> +cpumask_set_cpu(3, &mask);
>>  its_send_invall(col3);
>> -lpi_stats_expect(3, 8195);
>> -check_lpi_stats("dev2/eventid=20 pending LPI is received");
>> +wait_for_interrupts(&mask);
>> +report(check_acked(&mask, 0, 8195),
>> +"dev2/eventid=20 pending LPI is received");
>>  
>> -lpi_stats_expect(3, 8195);
>> +stats_reset();
>> +cpumask_clear(&mask);
>> +cpumask_set_cpu(3, &mask);
>>  its_send_int(dev2, 20);
>> -check_lpi_stats("dev2/eventid=20 now triggers an LPI");
>> +wait_for_interrupts(&mask);
>> +report(check_acked(&mask, 0, 8195),
>> +"dev2/eventid=20 now triggers an LPI");
>>  
>>  report_prefix_pop();
>>  
>> @@ -818,9 +772,13 @@ static void test_its_trigger(void)
>>   */
>>  
>>  its_send_mapd(dev2, false);
>> -lpi_stats_expect(-1, -1);
>> +stats_reset();
>> +cpumask_clear(&mask);
>>  its_send_int(dev2, 20);
>> -check_lpi_stats("no LPI after device unmap");
>> +wait_for_interrupts(&mask);
>> +report(check_acked(&mask, -1, -1), "no LPI after device unmap");
>> +
>> +check_spurious();
>>  report_prefix_pop();
>>  }
>>  
>> @@ -828,6 +786,7 @@ static void test_its_migration(void)
>>  {
>>  struct its_device *dev2, *dev7;
>>  bool test_skipped = false;
>> +cpumask_t mask;
>>  
>>  if (its_setup1()

[RFC PATCH 4/4] KVM: arm64: Distinguish cases of memcache allocations completely

2021-02-08 Thread Yanan Wang
With a guest translation fault, the memcache pages are not needed if KVM
is only about to install a new leaf entry into the existing page table.
And with a guest permission fault, the memcache pages are also not needed
for a write_fault at dirty-logging time if KVM only needs to update the
existing leaf entry instead of collapsing a block entry into a table.

By comparing fault_granule and vma_pagesize, cases that require allocations
from memcache and cases that don't can be distinguished completely.

Signed-off-by: Yanan Wang 
---
 arch/arm64/kvm/mmu.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d151927a7d62..550498a9104e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -815,19 +815,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
gfn = fault_ipa >> PAGE_SHIFT;
mmap_read_unlock(current->mm);
 
-   /*
-* Permission faults just need to update the existing leaf entry,
-* and so normally don't require allocations from the memcache. The
-* only exception to this is when dirty logging is enabled at runtime
-* and a write fault needs to collapse a block entry into a table.
-*/
-   if (fault_status != FSC_PERM || (logging_active && write_fault)) {
-   ret = kvm_mmu_topup_memory_cache(memcache,
-kvm_mmu_cache_min_pages(kvm));
-   if (ret)
-   return ret;
-   }
-
mmu_seq = vcpu->kvm->mmu_notifier_seq;
/*
 * Ensure the read of mmu_notifier_seq happens before we call
@@ -887,6 +874,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
prot |= KVM_PGTABLE_PROT_X;
 
+   /*
+* Allocations from the memcache are required only when granule of the
+* lookup level where the guest fault happened exceeds vma_pagesize,
+* which means new page tables will be created in the fault handlers.
+*/
+   if (fault_granule > vma_pagesize) {
+   ret = kvm_mmu_topup_memory_cache(memcache,
+kvm_mmu_cache_min_pages(kvm));
+   if (ret)
+   return ret;
+   }
+
/*
 * Under the premise of getting a FSC_PERM fault, we just need to relax
 * permissions only if vma_pagesize equals fault_granule. Otherwise,
-- 
2.23.0



[RFC PATCH 2/4] KVM: arm64: Add an independent API for coalescing tables

2021-02-08 Thread Yanan Wang
The process of coalescing page mappings back into a block mapping differs
from the normal map path, for example in TLB invalidation and CMOs, so add
an independent API for this case.

Signed-off-by: Yanan Wang 
---
 arch/arm64/kvm/hyp/pgtable.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 2f4f87021980..78a560446f80 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -525,6 +525,24 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
return 0;
 }
 
+static void stage2_coalesce_tables_into_block(u64 addr, u32 level,
+ kvm_pte_t *ptep,
+ struct stage2_map_data *data)
+{
+   u64 granule = kvm_granule_size(level), phys = data->phys;
+   kvm_pte_t new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+
+   kvm_set_invalid_pte(ptep);
+
+   /*
+* Invalidate the whole stage-2, as we may have numerous leaf entries
+* below us which would otherwise need invalidating individually.
+*/
+   kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
+   smp_store_release(ptep, new);
+   data->phys += granule;
+}
+
 static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 kvm_pte_t *ptep,
 struct stage2_map_data *data)
-- 
2.23.0



[RFC PATCH 3/4] KVM: arm64: Install the block entry before unmapping the page mappings

2021-02-08 Thread Yanan Wang
When KVM needs to coalesce normal page mappings into a block mapping, we
currently invalidate the old table entry first, then invalidate the TLB,
unmap the page mappings, and finally install the block entry.

Unmapping the numerous page mappings takes a long time, which means there is
a long window during which the table entry can be observed as invalid. If
other vCPUs access any guest page within the block range and find the table
entry invalid, they will all exit from the guest with a translation fault
that is not necessary, and KVM has to make an effort to handle these faults,
especially when performing CMOs by block range.

So let's install the block entry first to ensure uninterrupted memory access
for the other vCPUs, and then unmap the page mappings after installation.
This reduces most of the time during which the table entry is invalid, and
avoids most of the unnecessary translation faults.

Signed-off-by: Yanan Wang 
---
 arch/arm64/kvm/hyp/pgtable.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 78a560446f80..308c36b9cd21 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -434,6 +434,7 @@ struct stage2_map_data {
kvm_pte_t   attr;
 
kvm_pte_t   *anchor;
+   kvm_pte_t   *follow;
 
struct kvm_s2_mmu   *mmu;
struct kvm_mmu_memory_cache *memcache;
@@ -553,15 +554,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
if (!kvm_block_mapping_supported(addr, end, data->phys, level))
return 0;
 
-   kvm_set_invalid_pte(ptep);
-
/*
-* Invalidate the whole stage-2, as we may have numerous leaf
-* entries below us which would otherwise need invalidating
-* individually.
+* If we need to coalesce existing table entries into a block here,
+* then install the block entry first and the sub-level page mappings
+* will be unmapped later.
 */
-   kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
data->anchor = ptep;
+   data->follow = kvm_pte_follow(*ptep);
+   stage2_coalesce_tables_into_block(addr, level, ptep, data);
return 0;
 }
 
@@ -614,20 +614,18 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
  kvm_pte_t *ptep,
  struct stage2_map_data *data)
 {
-   int ret = 0;
-
if (!data->anchor)
return 0;
 
-   free_page((unsigned long)kvm_pte_follow(*ptep));
-   put_page(virt_to_page(ptep));
-
-   if (data->anchor == ptep) {
+   if (data->anchor != ptep) {
+   free_page((unsigned long)kvm_pte_follow(*ptep));
+   put_page(virt_to_page(ptep));
+   } else {
+   free_page((unsigned long)data->follow);
data->anchor = NULL;
-   ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
}
 
-   return ret;
+   return 0;
 }
 
 /*
-- 
2.23.0



[RFC PATCH 0/4] KVM: arm64: Improve efficiency of stage2 page table

2021-02-08 Thread Yanan Wang
Hi,

This series makes some efficiency improvement of stage2 page table code,
and there are some test results to present the performance changes, which
were tested by a kvm selftest [1] that I have post:
[1] 
https://lore.kernel.org/lkml/20210208090841.333724-1-wangyana...@huawei.com/ 

About patch 1:
We currently uniformly clean the dcache in user_mem_abort() before calling
the fault handlers, if we take a translation fault and the pfn is cacheable.
But if there are concurrent translation faults on the same page or block,
only the first dcache clean is necessary; the others are not.

By moving the dcache clean into the map handler, we can easily identify the
conditions where CMOs are really needed and avoid the unnecessary ones.
Performing CMOs is time consuming, especially when flushing a block range,
so this solution reduces much of the load on KVM and improves the efficiency
of creating mappings.

Test results:
(1) when 20 vCPUs concurrently access 20G ram (all 1G hugepages):
KVM creating block mappings time: 52.83s -> 3.70s
KVM recovering block mappings time (after dirty-logging): 52.0s -> 2.87s

(2) when 40 vCPUs concurrently access 20G ram (all 1G hugepages):
KVM creating block mappings time: 104.56s -> 3.70s
KVM recovering block mappings time (after dirty-logging): 103.93s -> 2.96s

About patch 2, 3:
When KVM needs to coalesce normal page mappings into a block mapping, we
currently invalidate the old table entry first, then invalidate the TLB,
unmap the page mappings, and finally install the block entry.

Unmapping the numerous page mappings takes a lot of time, which means the
table entry is left invalid for a long time before the block entry is
installed, and this causes many spurious translation faults.

So let's install the block entry first to ensure uninterrupted memory access
for the other vCPUs, and then unmap the page mappings after installation.
This reduces most of the time during which the table entry is invalid, and
avoids most of the unnecessary translation faults.

Test results based on patch 1:
(1) when 20 vCPUs concurrently access 20G ram (all 1G hugepages):
KVM recovering block mappings time (after dirty-logging): 2.87s -> 0.30s

(2) when 40 vCPUs concurrently access 20G ram (all 1G hugepages):
KVM recovering block mappings time (after dirty-logging): 2.96s -> 0.35s

So combined with patch 1, this makes a big difference to how quickly KVM
creates mappings and recovers block mappings, with not much code change.

About patch 4:
A new method to distinguish cases of memcache allocations is introduced.
By comparing fault_granule and vma_pagesize, cases that require allocations
from memcache and cases that don't can be distinguished completely.

---

Details of test results
platform: HiSilicon Kunpeng920 (FWB not supported)
host kernel: Linux mainline (v5.11-rc6)

(1) performance change of patch 1
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20
   (20 vcpus, 20G memory, block mappings(granule 1G))
Before patch: KVM_CREATE_MAPPINGS: 52.8338s 52.8327s 52.8336s 52.8255s 52.8303s
After  patch: KVM_CREATE_MAPPINGS:  3.7022s  3.7031s  3.7028s  3.7012s  3.7024s

Before patch: KVM_ADJUST_MAPPINGS: 52.0466s 52.0473s 52.0550s 52.0518s 52.0467s
After  patch: KVM_ADJUST_MAPPINGS:  2.8787s  2.8781s  2.8785s  2.8742s  2.8759s

cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 40
   (40 vcpus, 20G memory, block mappings(granule 1G))
Before patch: KVM_CREATE_MAPPINGS: 104.560s 104.556s 104.554s 104.556s 104.550s
After  patch: KVM_CREATE_MAPPINGS:  3.7011s  3.7103s  3.7005s  3.7024s  3.7106s

Before patch: KVM_ADJUST_MAPPINGS: 103.931s 103.936s 103.927s 103.942s 103.927s
After  patch: KVM_ADJUST_MAPPINGS:  2.9621s  2.9648s  2.9474s  2.9587s  2.9603s

(2) performance change of patch 2, 3(based on patch 1)
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 1
   (1 vcpu, 20G memory, block mappings(granule 1G))
Before patch: KVM_ADJUST_MAPPINGS: 2.8241s 2.8234s 2.8245s 2.8230s 2.8652s
After  patch: KVM_ADJUST_MAPPINGS: 0.2444s 0.2442s 0.2423s 0.2441s 0.2429s

cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20
   (20 vcpus, 20G memory, block mappings(granule 1G))
Before patch: KVM_ADJUST_MAPPINGS: 2.8787s 2.8781s 2.8785s 2.8742s 2.8759s
After  patch: KVM_ADJUST_MAPPINGS: 0.3008s 0.3004s 0.2974s 0.2917s 0.2900s

cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 40
   (40 vcpus, 20G memory, block mappings(granule 1G))
Before patch: KVM_ADJUST_MAPPINGS: 2.9621s 2.9648s 2.9474s 2.9587s 2.9603s
After  patch: KVM_ADJUST_MAPPINGS: 0.3541s 0.3694s 0.3656s 0.3693s 0.3687s

---

Yanan Wang (4):
  KVM: arm64: Move the clean of dcache to the map handler
  KVM: arm64: Add an independent API for coalescing tables
  KVM: arm64: Install the block entry before unmapping the page mappings
  KVM: arm64: Distinguish cases of memcache allocations completely

 arch/arm64/include/asm/kvm_mmu.h | 16 ---
 arch/arm64/kvm/hyp/pgtab

[RFC PATCH 1/4] KVM: arm64: Move the clean of dcache to the map handler

2021-02-08 Thread Yanan Wang
We currently uniformly clean the dcache in user_mem_abort() before calling
the fault handlers, if we take a translation fault and the pfn is cacheable.
But if there are concurrent translation faults on the same page or block,
only the first dcache clean is necessary; the others are not.

By moving the dcache clean into the map handler, we can easily identify the
conditions where CMOs are really needed and avoid the unnecessary ones.
Performing CMOs is time consuming, especially when flushing a block range,
so this solution reduces much of the load on KVM and improves the efficiency
of creating mappings.

Signed-off-by: Yanan Wang 
---
 arch/arm64/include/asm/kvm_mmu.h | 16 --
 arch/arm64/kvm/hyp/pgtable.c | 38 
 arch/arm64/kvm/mmu.c | 14 +++-
 3 files changed, 27 insertions(+), 41 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index e52d82aeadca..4ec9879e82ed 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -204,22 +204,6 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
-   void *va = page_address(pfn_to_page(pfn));
-
-   /*
-* With FWB, we ensure that the guest always accesses memory using
-* cacheable attributes, and we don't have to clean to PoC when
-* faulting in pages. Furthermore, FWB implies IDC, so cleaning to
-* PoU is not required either in this case.
-*/
-   if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
-   return;
-
-   kvm_flush_dcache_to_poc(va, size);
-}
-
 static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
  unsigned long size)
 {
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 4d177ce1d536..2f4f87021980 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -464,6 +464,26 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
return 0;
 }
 
+static bool stage2_pte_cacheable(kvm_pte_t pte)
+{
+   u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
+   return memattr == PAGE_S2_MEMATTR(NORMAL);
+}
+
+static void stage2_flush_dcache(void *addr, u64 size)
+{
+   /*
+* With FWB, we ensure that the guest always accesses memory using
+* cacheable attributes, and we don't have to clean to PoC when
+* faulting in pages. Furthermore, FWB implies IDC, so cleaning to
+* PoU is not required either in this case.
+*/
+   if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
+   return;
+
+   __flush_dcache_area(addr, size);
+}
+
 static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
  kvm_pte_t *ptep,
  struct stage2_map_data *data)
@@ -495,6 +515,10 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
put_page(page);
}
 
+   /* Flush data cache before installation of the new PTE */
+   if (stage2_pte_cacheable(new))
+   stage2_flush_dcache(__va(phys), granule);
+
smp_store_release(ptep, new);
get_page(page);
data->phys += granule;
@@ -651,20 +675,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return ret;
 }
 
-static void stage2_flush_dcache(void *addr, u64 size)
-{
-   if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
-   return;
-
-   __flush_dcache_area(addr, size);
-}
-
-static bool stage2_pte_cacheable(kvm_pte_t pte)
-{
-   u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
-   return memattr == PAGE_S2_MEMATTR(NORMAL);
-}
-
 static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
   enum kvm_pgtable_walk_flags flag,
   void * const arg)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f2a4..d151927a7d62 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -609,11 +609,6 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static void clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
-   __clean_dcache_guest_page(pfn, size);
-}
-
 static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
__invalidate_icache_guest_page(pfn, size);
@@ -882,9 +877,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (writable)
prot |= KVM_PGTABLE_PROT_W;
 
-   if (fault_status != FSC_PERM && !device)
-   clean_dcache_guest_page(pfn, vma_pagesize);
-
if (exec_fa

[PATCH v7 23/23] [DO NOT MERGE] arm64: Cope with CPUs stuck in VHE mode

2021-02-08 Thread Marc Zyngier
It seems that the CPU known as Apple M1 has the terrible habit
of being stuck with HCR_EL2.E2H==1, in violation of the architecture.

Try and work around this deplorable state of affairs by detecting
the stuck bit early and short-circuit the nVHE dance. It is still
unknown whether there are many more such nuggets to be found...

Reported-by: Hector Martin 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kernel/head.S | 33 ++---
 arch/arm64/kernel/hyp-stub.S | 28 
 2 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 2e116ef255e1..bce66d6bda74 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -477,14 +477,13 @@ EXPORT_SYMBOL(kimage_vaddr)
  * booted in EL1 or EL2 respectively.
  */
 SYM_FUNC_START(init_kernel_el)
-   mov_q   x0, INIT_SCTLR_EL1_MMU_OFF
-   msr sctlr_el1, x0
-
mrs x0, CurrentEL
cmp x0, #CurrentEL_EL2
b.eqinit_el2
 
 SYM_INNER_LABEL(init_el1, SYM_L_LOCAL)
+   mov_q   x0, INIT_SCTLR_EL1_MMU_OFF
+   msr sctlr_el1, x0
isb
mov_q   x0, INIT_PSTATE_EL1
msr spsr_el1, x0
@@ -504,6 +503,34 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
msr vbar_el2, x0
isb
 
+   /*
+* Fruity CPUs seem to have HCR_EL2.E2H set to RES1,
+* making it impossible to start in nVHE mode. Is that
+* compliant with the architecture? Absolutely not!
+*/
+   mrs x0, hcr_el2
+   and x0, x0, #HCR_E2H
+   cbz x0, 1f
+
+   /* Switching to VHE requires a sane SCTLR_EL1 as a start */
+   mov_q   x0, INIT_SCTLR_EL1_MMU_OFF
+   msr_s   SYS_SCTLR_EL12, x0
+
+   /*
+* Force an eret into a helper "function", and let it return
+* to our original caller... This makes sure that we have
+* initialised the basic PSTATE state.
+*/
+   mov x0, #INIT_PSTATE_EL2
+   msr spsr_el1, x0
+   adr_l   x0, stick_to_vhe
+   msr elr_el1, x0
+   eret
+
+1:
+   mov_q   x0, INIT_SCTLR_EL1_MMU_OFF
+   msr sctlr_el1, x0
+
msr elr_el2, lr
mov w0, #BOOT_CPU_MODE_EL2
eret
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 3e08dcc924b5..b55ed4af4c4a 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -27,12 +27,12 @@ SYM_CODE_START(__hyp_stub_vectors)
ventry  el2_fiq_invalid // FIQ EL2t
ventry  el2_error_invalid   // Error EL2t
 
-   ventry  el2_sync_invalid// Synchronous EL2h
+   ventry  elx_sync// Synchronous EL2h
ventry  el2_irq_invalid // IRQ EL2h
ventry  el2_fiq_invalid // FIQ EL2h
ventry  el2_error_invalid   // Error EL2h
 
-   ventry  el1_sync// Synchronous 64-bit EL1
+   ventry  elx_sync// Synchronous 64-bit EL1
ventry  el1_irq_invalid // IRQ 64-bit EL1
ventry  el1_fiq_invalid // FIQ 64-bit EL1
ventry  el1_error_invalid   // Error 64-bit EL1
@@ -45,7 +45,7 @@ SYM_CODE_END(__hyp_stub_vectors)
 
.align 11
 
-SYM_CODE_START_LOCAL(el1_sync)
+SYM_CODE_START_LOCAL(elx_sync)
cmp x0, #HVC_SET_VECTORS
b.ne1f
msr vbar_el2, x1
@@ -71,7 +71,7 @@ SYM_CODE_START_LOCAL(el1_sync)
 
 9: mov x0, xzr
eret
-SYM_CODE_END(el1_sync)
+SYM_CODE_END(elx_sync)
 
 // nVHE? No way! Give me the real thing!
 SYM_CODE_START_LOCAL(mutate_to_vhe)
@@ -227,3 +227,23 @@ SYM_FUNC_START(switch_to_vhe)
 #endif
ret
 SYM_FUNC_END(switch_to_vhe)
+
+SYM_FUNC_START(stick_to_vhe)
+   /*
+* Make sure the switch to VHE cannot fail, by overriding the
+* override. This is hilarious.
+*/
+   adr_l   x1, id_aa64mmfr1_override
+   add x1, x1, #FTR_OVR_MASK_OFFSET
+   dc  civac, x1
+   dsb sy
+   isb
+   ldr x0, [x1]
+   bic x0, x0, #(0xf << ID_AA64MMFR1_VHE_SHIFT)
+   str x0, [x1]
+
+   mov x0, #HVC_VHE_RESTART
+   hvc #0
+   mov x0, #BOOT_CPU_MODE_EL2
+   ret
+SYM_FUNC_END(stick_to_vhe)
-- 
2.29.2



[PATCH v7 12/23] arm64: Extract early FDT mapping from kaslr_early_init()

2021-02-08 Thread Marc Zyngier
As we want to parse more options very early in the kernel lifetime,
let's always map the FDT early. This is achieved by moving that
code out of kaslr_early_init().

No functional change expected.

Signed-off-by: Marc Zyngier 
Acked-by: Catalin Marinas 
Acked-by: David Brazdil 
---
 arch/arm64/include/asm/setup.h | 11 +++
 arch/arm64/kernel/head.S   |  3 ++-
 arch/arm64/kernel/kaslr.c  |  7 +++
 arch/arm64/kernel/setup.c  | 15 +++
 4 files changed, 31 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm64/include/asm/setup.h

diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h
new file mode 100644
index ..d3320618ed14
--- /dev/null
+++ b/arch/arm64/include/asm/setup.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#ifndef __ARM64_ASM_SETUP_H
+#define __ARM64_ASM_SETUP_H
+
+#include 
+
+void *get_early_fdt_ptr(void);
+void early_fdt_map(u64 dt_phys);
+
+#endif
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index b425d2587cdb..d74e5f84042e 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -433,6 +433,8 @@ SYM_FUNC_START_LOCAL(__primary_switched)
bl  __pi_memset
dsb ishst   // Make zero page visible to PTW
 
+   mov x0, x21 // pass FDT address in x0
+   bl  early_fdt_map   // Try mapping the FDT early
bl  switch_to_vhe
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
bl  kasan_early_init
@@ -440,7 +442,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #ifdef CONFIG_RANDOMIZE_BASE
tst x23, ~(MIN_KIMG_ALIGN - 1)  // already running randomized?
b.ne0f
-   mov x0, x21 // pass FDT address in x0
bl  kaslr_early_init// parse FDT for KASLR options
cbz x0, 0f  // KASLR disabled? just proceed
orr x23, x23, x0// record KASLR offset
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 1c74c45b9494..5fc86e7d01a1 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 enum kaslr_status {
KASLR_ENABLED,
@@ -92,12 +93,11 @@ static __init bool is_kaslr_disabled_cmdline(void *fdt)
  * containing function pointers) to be reinitialized, and zero-initialized
  * .bss variables will be reset to 0.
  */
-u64 __init kaslr_early_init(u64 dt_phys)
+u64 __init kaslr_early_init(void)
 {
void *fdt;
u64 seed, offset, mask, module_range;
unsigned long raw;
-   int size;
 
/*
 * Set a reasonable default for module_alloc_base in case
@@ -111,8 +111,7 @@ u64 __init kaslr_early_init(u64 dt_phys)
 * and proceed with KASLR disabled. We will make another
 * attempt at mapping the FDT in setup_machine()
 */
-   early_fixmap_init();
-   fdt = fixmap_remap_fdt(dt_phys, &size, PAGE_KERNEL);
+   fdt = get_early_fdt_ptr();
if (!fdt) {
kaslr_status = KASLR_DISABLED_FDT_REMAP;
return 0;
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index c18aacde8bb0..61845c0821d9 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -168,6 +168,21 @@ static void __init smp_build_mpidr_hash(void)
pr_warn("Large number of MPIDR hash buckets detected\n");
 }
 
+static void *early_fdt_ptr __initdata;
+
+void __init *get_early_fdt_ptr(void)
+{
+   return early_fdt_ptr;
+}
+
+asmlinkage void __init early_fdt_map(u64 dt_phys)
+{
+   int fdt_size;
+
+   early_fixmap_init();
+   early_fdt_ptr = fixmap_remap_fdt(dt_phys, &fdt_size, PAGE_KERNEL);
+}
+
 static void __init setup_machine_fdt(phys_addr_t dt_phys)
 {
int size;
-- 
2.29.2



[PATCH v7 15/23] arm64: Honor VHE being disabled from the command-line

2021-02-08 Thread Marc Zyngier
Finally we can check whether VHE is disabled on the command line,
and not enable it if that's the user's wish.

Signed-off-by: Marc Zyngier 
Acked-by: David Brazdil 
Acked-by: Catalin Marinas 
---
 arch/arm64/kernel/asm-offsets.c |  3 +++
 arch/arm64/kernel/hyp-stub.S| 11 +++
 2 files changed, 14 insertions(+)

diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index f42fd9e33981..1add0f21bffe 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -99,6 +99,9 @@ int main(void)
   DEFINE(CPU_BOOT_STACK,   offsetof(struct secondary_data, stack));
   DEFINE(CPU_BOOT_TASK,offsetof(struct secondary_data, task));
   BLANK();
+  DEFINE(FTR_OVR_VAL_OFFSET,   offsetof(struct arm64_ftr_override, val));
+  DEFINE(FTR_OVR_MASK_OFFSET,  offsetof(struct arm64_ftr_override, mask));
+  BLANK();
 #ifdef CONFIG_KVM
   DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt));
   DEFINE(VCPU_FAULT_DISR,  offsetof(struct kvm_vcpu, arch.fault.disr_el1));
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 6229315d533d..3e08dcc924b5 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -87,6 +87,17 @@ SYM_CODE_START_LOCAL(mutate_to_vhe)
ubfxx1, x1, #ID_AA64MMFR1_VHE_SHIFT, #4
cbz x1, 1f
 
+   // Check whether VHE is disabled from the command line
+   adr_l   x1, id_aa64mmfr1_override
+   ldr x2, [x1, FTR_OVR_VAL_OFFSET]
+   ldr x1, [x1, FTR_OVR_MASK_OFFSET]
+   ubfxx2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
+   ubfxx1, x1, #ID_AA64MMFR1_VHE_SHIFT, #4
+   cmp x1, xzr
+   and x2, x2, x1
+   csinv   x2, x2, xzr, ne
+   cbz x2, 1f
+
// Engage the VHE magic!
mov_q   x0, HCR_HOST_VHE_FLAGS
msr hcr_el2, x0
-- 
2.29.2



[PATCH v7 21/23] arm64: Defer enabling pointer authentication on boot core

2021-02-08 Thread Marc Zyngier
From: Srinivas Ramana 

Defer enabling pointer authentication on the boot core until
after it is required to be enabled by the cpufeature framework.
This will help in controlling the feature dynamically
with a boot parameter.

Signed-off-by: Ajay Patil 
Signed-off-by: Prasad Sodagudi 
Signed-off-by: Srinivas Ramana 
Signed-off-by: Marc Zyngier 
Link: 
https://lore.kernel.org/r/1610152163-16554-2-git-send-email-sram...@codeaurora.org
Reviewed-by: Catalin Marinas 
Acked-by: David Brazdil 
---
 arch/arm64/include/asm/pointer_auth.h   | 10 ++
 arch/arm64/include/asm/stackprotector.h |  1 +
 arch/arm64/kernel/head.S|  4 
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/pointer_auth.h 
b/arch/arm64/include/asm/pointer_auth.h
index c6b4f0603024..b112a11e9302 100644
--- a/arch/arm64/include/asm/pointer_auth.h
+++ b/arch/arm64/include/asm/pointer_auth.h
@@ -76,6 +76,15 @@ static inline unsigned long ptrauth_strip_insn_pac(unsigned long ptr)
return ptrauth_clear_pac(ptr);
 }
 
+static __always_inline void ptrauth_enable(void)
+{
+   if (!system_supports_address_auth())
+   return;
+   sysreg_clear_set(sctlr_el1, 0, (SCTLR_ELx_ENIA | SCTLR_ELx_ENIB |
+   SCTLR_ELx_ENDA | SCTLR_ELx_ENDB));
+   isb();
+}
+
 #define ptrauth_thread_init_user(tsk)  \
ptrauth_keys_init_user(&(tsk)->thread.keys_user)
 #define ptrauth_thread_init_kernel(tsk)
\
@@ -84,6 +93,7 @@ static inline unsigned long ptrauth_strip_insn_pac(unsigned long ptr)
ptrauth_keys_switch_kernel(&(tsk)->thread.keys_kernel)
 
 #else /* CONFIG_ARM64_PTR_AUTH */
+#define ptrauth_enable()
 #define ptrauth_prctl_reset_keys(tsk, arg) (-EINVAL)
 #define ptrauth_strip_insn_pac(lr) (lr)
 #define ptrauth_thread_init_user(tsk)
diff --git a/arch/arm64/include/asm/stackprotector.h 
b/arch/arm64/include/asm/stackprotector.h
index 7263e0bac680..33f1bb453150 100644
--- a/arch/arm64/include/asm/stackprotector.h
+++ b/arch/arm64/include/asm/stackprotector.h
@@ -41,6 +41,7 @@ static __always_inline void boot_init_stack_canary(void)
 #endif
ptrauth_thread_init_kernel(current);
ptrauth_thread_switch_kernel(current);
+   ptrauth_enable();
 }
 
 #endif /* _ASM_STACKPROTECTOR_H */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 3243e3ae9bd8..2e116ef255e1 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -404,10 +404,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
adr_l   x5, init_task
msr sp_el0, x5  // Save thread_info
 
-#ifdef CONFIG_ARM64_PTR_AUTH
-   __ptrauth_keys_init_cpu x5, x6, x7, x8
-#endif
-
adr_l   x8, vectors // load VBAR_EL1 with virtual
msr vbar_el1, x8// vector table address
isb
-- 
2.29.2



[PATCH v7 17/23] arm64: Make kvm-arm.mode={nvhe, protected} an alias of id_aa64mmfr1.vh=0

2021-02-08 Thread Marc Zyngier
Admittedly, passing id_aa64mmfr1.vh=0 on the command-line isn't
that easy to understand, and it is likely that users would much
prefer to write "kvm-arm.mode=nvhe", or "...=protected".

So here you go. This has the added advantage that we can now
always honor the "kvm-arm.mode=protected" option, even when
booting on a VHE system.

Signed-off-by: Marc Zyngier 
Acked-by: Catalin Marinas 
Acked-by: David Brazdil 
---
 Documentation/admin-guide/kernel-parameters.txt | 3 +++
 arch/arm64/kernel/idreg-override.c  | 2 ++
 arch/arm64/kvm/arm.c| 3 +++
 3 files changed, 8 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 9e3cdb271d06..2786fd39a047 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2257,6 +2257,9 @@
kvm-arm.mode=
[KVM,ARM] Select one of KVM/arm64's modes of operation.
 
+   nvhe: Standard nVHE-based mode, without support for
+ protected guests.
+
protected: nVHE-based mode with support for guests whose
   state is kept private from the host.
   Not valid if the kernel is running in EL2.
diff --git a/arch/arm64/kernel/idreg-override.c 
b/arch/arm64/kernel/idreg-override.c
index 226bac544e20..b994d689d6fb 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -45,6 +45,8 @@ static const struct {
charalias[FTR_ALIAS_NAME_LEN];
charfeature[FTR_ALIAS_OPTION_LEN];
 } aliases[] __initconst = {
+   { "kvm-arm.mode=nvhe",  "id_aa64mmfr1.vh=0" },
+   { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" },
 };
 
 static int __init find_field(const char *cmdline,
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 04c44853b103..597565a65ca2 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1966,6 +1966,9 @@ static int __init early_kvm_mode_cfg(char *arg)
return 0;
}
 
+   if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode()))
+   return 0;
+
return -EINVAL;
 }
 early_param("kvm-arm.mode", early_kvm_mode_cfg);
-- 
2.29.2



[PATCH v7 13/23] arm64: cpufeature: Add an early command-line cpufeature override facility

2021-02-08 Thread Marc Zyngier
In order to be able to override CPU features at boot time,
let's add a command line parser that matches options of the
form "cpureg.feature=value", and store the corresponding
value into the override val/mask pair.

No features are currently defined, so no expected change in
functionality.
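
For illustration only: once a register and field are described in the regs[]
table, an override is requested with a boot parameter of the form
cpureg.feature=value, e.g. the alias target used later in this series:

	id_aa64mmfr1.vh=0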

Signed-off-by: Marc Zyngier 
Acked-by: David Brazdil 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/kernel/Makefile |   2 +-
 arch/arm64/kernel/head.S   |   1 +
 arch/arm64/kernel/idreg-override.c | 150 +
 3 files changed, 152 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kernel/idreg-override.c

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 86364ab6f13f..2262f0392857 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -17,7 +17,7 @@ obj-y := debug-monitors.o entry.o irq.o fpsimd.o \
    return_address.o cpuinfo.o cpu_errata.o \
    cpufeature.o alternative.o cacheinfo.o \
    smp.o smp_spin_table.o topology.o smccc-call.o \
-  syscall.o proton-pack.o
+  syscall.o proton-pack.o idreg-override.o
 
 targets+= efi-entry.o
 
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index d74e5f84042e..3243e3ae9bd8 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -435,6 +435,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 
mov x0, x21 // pass FDT address in x0
bl  early_fdt_map   // Try mapping the FDT early
+   bl  init_feature_override
bl  switch_to_vhe
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
bl  kasan_early_init
diff --git a/arch/arm64/kernel/idreg-override.c 
b/arch/arm64/kernel/idreg-override.c
new file mode 100644
index ..3a347b42d07e
--- /dev/null
+++ b/arch/arm64/kernel/idreg-override.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Early cpufeature override framework
+ *
+ * Copyright (C) 2020 Google LLC
+ * Author: Marc Zyngier 
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define FTR_DESC_NAME_LEN  20
+#define FTR_DESC_FIELD_LEN 10
+
+struct ftr_set_desc {
+   charname[FTR_DESC_NAME_LEN];
+   struct arm64_ftr_override   *override;
+   struct {
+   charname[FTR_DESC_FIELD_LEN];
+   u8  shift;
+   }   fields[];
+};
+
+static const struct ftr_set_desc * const regs[] __initconst = {
+};
+
+static int __init find_field(const char *cmdline,
+const struct ftr_set_desc *reg, int f, u64 *v)
+{
+   char opt[FTR_DESC_NAME_LEN + FTR_DESC_FIELD_LEN + 2];
+   int len;
+
+   len = snprintf(opt, ARRAY_SIZE(opt), "%s.%s=",
+  reg->name, reg->fields[f].name);
+
+   if (!parameqn(cmdline, opt, len))
+   return -1;
+
+   return kstrtou64(cmdline + len, 0, v);
+}
+
+static void __init match_options(const char *cmdline)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(regs); i++) {
+   int f;
+
+   if (!regs[i]->override)
+   continue;
+
+   for (f = 0; strlen(regs[i]->fields[f].name); f++) {
+   u64 shift = regs[i]->fields[f].shift;
+   u64 mask = 0xfUL << shift;
+   u64 v;
+
+   if (find_field(cmdline, regs[i], f, &v))
+   continue;
+
+   regs[i]->override->val  &= ~mask;
+   regs[i]->override->val  |= (v << shift) & mask;
+   regs[i]->override->mask |= mask;
+
+   return;
+   }
+   }
+}
+
+static __init void __parse_cmdline(const char *cmdline)
+{
+   do {
+   char buf[256];
+   size_t len;
+   int i;
+
+   cmdline = skip_spaces(cmdline);
+
+   for (len = 0; cmdline[len] && !isspace(cmdline[len]); len++);
+   if (!len)
+   return;
+
+   len = min(len, ARRAY_SIZE(buf) - 1);
+   strncpy(buf, cmdline, len);
+   buf[len] = 0;
+
+   if (strcmp(buf, "--") == 0)
+   return;
+
+   cmdline += len;
+
+   match_options(buf);
+
+   } while (1);
+}
+
+static __init void parse_cmdline(void)
+{
+   if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
+   const u8 *prop;
+   void *fdt;
+   int node;
+
+   fdt = get_early_fdt_ptr();
+   if (!fdt)
+   goto out;
+
+   node = fdt_path_o

[PATCH v7 10/23] arm64: cpufeature: Add global feature override facility

2021-02-08 Thread Marc Zyngier
Add a facility to globally override a feature, no matter what
the HW says. Yes, this sounds dangerous, but we do respect the
"safe" value for a given feature. This doesn't mean the user
doesn't need to know what they are doing.

Nothing uses this yet, so we are pretty safe. For now.
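
For reference, a register opts in by pointing at its own val/mask pair
instead of the default no_override. A sketch of how a later patch in this
series is expected to wire up ID_AA64MMFR1_EL1 (the names here are
illustrative, not part of this patch):

	static struct arm64_ftr_override __ro_after_init id_aa64mmfr1_override;

	/* In the __ftr_reg_entry table: */
	ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1,
			       &id_aa64mmfr1_override),

Bits set in the override's mask select the fields to replace, val supplies
the replacement, and init_cpu_ftr_reg() only honours the override when it is
no less safe than what the HW reports.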

Signed-off-by: Marc Zyngier 
Reviewed-by: Suzuki K Poulose 
Acked-by: David Brazdil 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/include/asm/cpufeature.h |  6 
 arch/arm64/kernel/cpufeature.c  | 45 +
 2 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 9a555809b89c..b1f53147e2b2 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -63,6 +63,11 @@ struct arm64_ftr_bits {
s64 safe_val; /* safe value for FTR_EXACT features */
 };
 
+struct arm64_ftr_override {
+   u64 val;
+   u64 mask;
+};
+
 /*
  * @arm64_ftr_reg - Feature register
  * @strict_maskBits which should match across all CPUs for 
sanity.
@@ -74,6 +79,7 @@ struct arm64_ftr_reg {
u64 user_mask;
u64 sys_val;
u64 user_val;
+   struct arm64_ftr_override   *override;
const struct arm64_ftr_bits *ftr_bits;
 };
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index e99eddec0a46..a4e5c619a516 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -352,9 +352,12 @@ static const struct arm64_ftr_bits ftr_ctr[] = {
ARM64_FTR_END,
 };
 
+static struct arm64_ftr_override __ro_after_init no_override = { };
+
 struct arm64_ftr_reg arm64_ftr_reg_ctrel0 = {
.name   = "SYS_CTR_EL0",
-   .ftr_bits   = ftr_ctr
+   .ftr_bits   = ftr_ctr,
+   .override   = &no_override,
 };
 
 static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
@@ -544,13 +547,16 @@ static const struct arm64_ftr_bits ftr_raz[] = {
ARM64_FTR_END,
 };
 
-#define ARM64_FTR_REG(id, table) { \
-   .sys_id = id,   \
-   .reg =  &(struct arm64_ftr_reg){\
-   .name = #id,\
-   .ftr_bits = &((table)[0]),  \
+#define ARM64_FTR_REG_OVERRIDE(id, table, ovr) {   \
+   .sys_id = id,   \
+   .reg =  &(struct arm64_ftr_reg){\
+   .name = #id,\
+   .override = (ovr),  \
+   .ftr_bits = &((table)[0]),  \
}}
 
+#define ARM64_FTR_REG(id, table) ARM64_FTR_REG_OVERRIDE(id, table, 
&no_override)
+
 static const struct __ftr_reg_entry {
u32 sys_id;
struct arm64_ftr_reg*reg;
@@ -770,6 +776,33 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new)
for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) {
u64 ftr_mask = arm64_ftr_mask(ftrp);
s64 ftr_new = arm64_ftr_value(ftrp, new);
+   s64 ftr_ovr = arm64_ftr_value(ftrp, reg->override->val);
+
+   if ((ftr_mask & reg->override->mask) == ftr_mask) {
+   s64 tmp = arm64_ftr_safe_value(ftrp, ftr_ovr, ftr_new);
+   char *str = NULL;
+
+   if (ftr_ovr != tmp) {
+   /* Unsafe, remove the override */
+   reg->override->mask &= ~ftr_mask;
+   reg->override->val &= ~ftr_mask;
+   tmp = ftr_ovr;
+   str = "ignoring override";
+   } else if (ftr_new != tmp) {
+   /* Override was valid */
+   ftr_new = tmp;
+   str = "forced";
+   } else if (ftr_ovr == tmp) {
+   /* Override was the safe value */
+   str = "already set";
+   }
+
+   if (str)
+   pr_warn("%s[%d:%d]: %s to %llx\n",
+   reg->name,
+   ftrp->shift + ftrp->width - 1,
+   ftrp->shift, str, tmp);
+   }
 
val = arm64_ftr_set_value(ftrp, val, ftr_new);
 
-- 
2.29.2



[PATCH v7 19/23] arm64: Move "nokaslr" over to the early cpufeature infrastructure

2021-02-08 Thread Marc Zyngier
Given that the early cpufeature infrastructure has borrowed quite
a lot of code from the kaslr implementation, let's reimplement
the matching of the "nokaslr" option with it.
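
With this in place, "nokaslr" becomes a plain alias for "kaslr.disabled=1",
and the early KASLR code only needs to look at the resulting override (a
restatement of the kaslr_early_init() hunk below, for illustration):

	/* KASLR is disabled if the "disabled" field was overridden to !0 */
	if (kaslr_feature_override.val & kaslr_feature_override.mask & 0xf) {
		kaslr_status = KASLR_DISABLED_CMDLINE;
		return 0;
	}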

Signed-off-by: Marc Zyngier 
Acked-by: Catalin Marinas 
Acked-by: David Brazdil 
---
 arch/arm64/kernel/idreg-override.c | 15 +
 arch/arm64/kernel/kaslr.c  | 36 ++
 2 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/kernel/idreg-override.c 
b/arch/arm64/kernel/idreg-override.c
index b994d689d6fb..70dd70eee7a2 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -37,8 +37,22 @@ static const struct ftr_set_desc mmfr1 __initconst = {
},
 };
 
+extern struct arm64_ftr_override kaslr_feature_override;
+
+static const struct ftr_set_desc kaslr __initconst = {
+   .name   = "kaslr",
+#ifdef CONFIG_RANDOMIZE_BASE
+   .override   = &kaslr_feature_override,
+#endif
+   .fields = {
+   { "disabled", 0 },
+   {}
+   },
+};
+
 static const struct ftr_set_desc * const regs[] __initconst = {
&mmfr1,
+   &kaslr,
 };
 
 static const struct {
@@ -47,6 +61,7 @@ static const struct {
 } aliases[] __initconst = {
{ "kvm-arm.mode=nvhe",  "id_aa64mmfr1.vh=0" },
{ "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" },
+   { "nokaslr","kaslr.disabled=1" },
 };
 
 static int __init find_field(const char *cmdline,
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 5fc86e7d01a1..27f8939deb1b 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -51,39 +51,7 @@ static __init u64 get_kaslr_seed(void *fdt)
return ret;
 }
 
-static __init bool cmdline_contains_nokaslr(const u8 *cmdline)
-{
-   const u8 *str;
-
-   str = strstr(cmdline, "nokaslr");
-   return str == cmdline || (str > cmdline && *(str - 1) == ' ');
-}
-
-static __init bool is_kaslr_disabled_cmdline(void *fdt)
-{
-   if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
-   int node;
-   const u8 *prop;
-
-   node = fdt_path_offset(fdt, "/chosen");
-   if (node < 0)
-   goto out;
-
-   prop = fdt_getprop(fdt, node, "bootargs", NULL);
-   if (!prop)
-   goto out;
-
-   if (cmdline_contains_nokaslr(prop))
-   return true;
-
-   if (IS_ENABLED(CONFIG_CMDLINE_EXTEND))
-   goto out;
-
-   return false;
-   }
-out:
-   return cmdline_contains_nokaslr(CONFIG_CMDLINE);
-}
+struct arm64_ftr_override kaslr_feature_override __initdata;
 
 /*
  * This routine will be executed with the kernel mapped at its default virtual
@@ -126,7 +94,7 @@ u64 __init kaslr_early_init(void)
 * Check if 'nokaslr' appears on the command line, and
 * return 0 if that is the case.
 */
-   if (is_kaslr_disabled_cmdline(fdt)) {
+   if (kaslr_feature_override.val & kaslr_feature_override.mask & 0xf) {
kaslr_status = KASLR_DISABLED_CMDLINE;
return 0;
}
-- 
2.29.2



[PATCH v7 14/23] arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line

2021-02-08 Thread Marc Zyngier
As we want to be able to disable VHE at runtime, let's match
"id_aa64mmfr1.vh=" from the command line as an override.
This doesn't have much effect yet as our boot code doesn't look
at the cpufeature, but only at the HW registers.
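
For example (illustrative only), booting with "id_aa64mmfr1.vh=0" makes
match_options() record the override roughly as:

	id_aa64mmfr1_override.val  &= ~(0xfUL << ID_AA64MMFR1_VHE_SHIFT);
	id_aa64mmfr1_override.mask |=   0xfUL << ID_AA64MMFR1_VHE_SHIFT;

so any consumer of the overridden view of the register will see VH as 0.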

Signed-off-by: Marc Zyngier 
Acked-by: David Brazdil 
Acked-by: Suzuki K Poulose 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/include/asm/cpufeature.h |  2 ++
 arch/arm64/kernel/cpufeature.c  |  5 -
 arch/arm64/kernel/idreg-override.c  | 11 +++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index b5bf7af68691..570f1b4ba3cc 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -818,6 +818,8 @@ static inline unsigned int get_vmid_bits(u64 mmfr1)
return 8;
 }
 
+extern struct arm64_ftr_override id_aa64mmfr1_override;
+
 u32 get_kvm_ipa_limit(void);
 void dump_cpu_features(void);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 97da9ed4b79d..faada5d8bea6 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -557,6 +557,8 @@ static const struct arm64_ftr_bits ftr_raz[] = {
 
 #define ARM64_FTR_REG(id, table) ARM64_FTR_REG_OVERRIDE(id, table, 
&no_override)
 
+struct arm64_ftr_override __ro_after_init id_aa64mmfr1_override;
+
 static const struct __ftr_reg_entry {
u32 sys_id;
struct arm64_ftr_reg*reg;
@@ -604,7 +606,8 @@ static const struct __ftr_reg_entry {
 
/* Op1 = 0, CRn = 0, CRm = 7 */
ARM64_FTR_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0),
-   ARM64_FTR_REG(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1),
+   ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1,
+  &id_aa64mmfr1_override),
ARM64_FTR_REG(SYS_ID_AA64MMFR2_EL1, ftr_id_aa64mmfr2),
 
/* Op1 = 0, CRn = 1, CRm = 2 */
diff --git a/arch/arm64/kernel/idreg-override.c 
b/arch/arm64/kernel/idreg-override.c
index 3a347b42d07e..2da11bf60195 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -11,6 +11,7 @@
 #include 
 
 #include 
+#include 
 #include 
 
 #define FTR_DESC_NAME_LEN  20
@@ -25,7 +26,17 @@ struct ftr_set_desc {
}   fields[];
 };
 
+static const struct ftr_set_desc mmfr1 __initconst = {
+   .name   = "id_aa64mmfr1",
+   .override   = &id_aa64mmfr1_override,
+   .fields = {
+   { "vh", ID_AA64MMFR1_VHE_SHIFT },
+   {}
+   },
+};
+
 static const struct ftr_set_desc * const regs[] __initconst = {
+   &mmfr1,
 };
 
 static int __init find_field(const char *cmdline,
-- 
2.29.2



[PATCH v7 11/23] arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding()

2021-02-08 Thread Marc Zyngier
__read_sysreg_by_encoding() is used by a bunch of cpufeature helpers,
which should take the feature override into account. Let's do that.

For good measure (and because we are likely to need this further
down the line), make this helper available to the rest of the
non-modular kernel.

Code that needs to know the *real* features of a CPU can still
use read_sysreg_s(), and find the bare, ugly truth.
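
Concretely (an illustrative example, not additional code in this patch): if
the HW reports ID_AA64MMFR1_EL1.VH == 1 but that field is overridden to 0,
the helper now returns the register with VH == 0, courtesy of:

	val &= ~regp->override->mask;
	val |= regp->override->val & regp->override->mask;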

Signed-off-by: Marc Zyngier 
Reviewed-by: Suzuki K Poulose 
Acked-by: David Brazdil 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/include/asm/cpufeature.h |  1 +
 arch/arm64/kernel/cpufeature.c  | 15 +--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index b1f53147e2b2..b5bf7af68691 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -606,6 +606,7 @@ void __init setup_cpu_features(void);
 void check_local_cpu_capabilities(void);
 
 u64 read_sanitised_ftr_reg(u32 id);
+u64 __read_sysreg_by_encoding(u32 sys_id);
 
 static inline bool cpu_supports_mixed_endian_el0(void)
 {
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index a4e5c619a516..97da9ed4b79d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1148,14 +1148,17 @@ u64 read_sanitised_ftr_reg(u32 id)
 EXPORT_SYMBOL_GPL(read_sanitised_ftr_reg);
 
 #define read_sysreg_case(r)\
-   case r: return read_sysreg_s(r)
+   case r: val = read_sysreg_s(r); break;
 
 /*
  * __read_sysreg_by_encoding() - Used by a STARTING cpu before cpuinfo is 
populated.
  * Read the system register on the current CPU
  */
-static u64 __read_sysreg_by_encoding(u32 sys_id)
+u64 __read_sysreg_by_encoding(u32 sys_id)
 {
+   struct arm64_ftr_reg *regp;
+   u64 val;
+
switch (sys_id) {
read_sysreg_case(SYS_ID_PFR0_EL1);
read_sysreg_case(SYS_ID_PFR1_EL1);
@@ -1198,6 +1201,14 @@ static u64 __read_sysreg_by_encoding(u32 sys_id)
BUG();
return 0;
}
+
+   regp  = get_arm64_ftr_reg(sys_id);
+   if (regp) {
+   val &= ~regp->override->mask;
+   val |= (regp->override->val & regp->override->mask);
+   }
+
+   return val;
 }
 
 #include 
-- 
2.29.2



[PATCH v7 16/23] arm64: Add an aliasing facility for the idreg override

2021-02-08 Thread Marc Zyngier
In order to map the override of idregs to options that a user
can easily understand, let's introduce yet another option
array, which maps an option to the corresponding idreg options.
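
The array is empty for now. Each entry simply maps a user-friendly option
onto a space-separated list of idreg overrides, which then gets re-parsed by
__parse_cmdline(). For instance (entries that appear elsewhere in this
series):

	{ "kvm-arm.mode=nvhe",	"id_aa64mmfr1.vh=0" },
	{ "arm64.nobti",	"id_aa64pfr1.bt=0" },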

Signed-off-by: Marc Zyngier 
Reviewed-by: Catalin Marinas 
Acked-by: David Brazdil 
---
 arch/arm64/kernel/idreg-override.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/idreg-override.c 
b/arch/arm64/kernel/idreg-override.c
index 2da11bf60195..226bac544e20 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -16,6 +16,8 @@
 
 #define FTR_DESC_NAME_LEN  20
 #define FTR_DESC_FIELD_LEN 10
+#define FTR_ALIAS_NAME_LEN 30
+#define FTR_ALIAS_OPTION_LEN   80
 
 struct ftr_set_desc {
charname[FTR_DESC_NAME_LEN];
@@ -39,6 +41,12 @@ static const struct ftr_set_desc * const regs[] __initconst 
= {
&mmfr1,
 };
 
+static const struct {
+   charalias[FTR_ALIAS_NAME_LEN];
+   charfeature[FTR_ALIAS_OPTION_LEN];
+} aliases[] __initconst = {
+};
+
 static int __init find_field(const char *cmdline,
 const struct ftr_set_desc *reg, int f, u64 *v)
 {
@@ -81,7 +89,7 @@ static void __init match_options(const char *cmdline)
}
 }
 
-static __init void __parse_cmdline(const char *cmdline)
+static __init void __parse_cmdline(const char *cmdline, bool parse_aliases)
 {
do {
char buf[256];
@@ -105,6 +113,9 @@ static __init void __parse_cmdline(const char *cmdline)
 
match_options(buf);
 
+   for (i = 0; parse_aliases && i < ARRAY_SIZE(aliases); i++)
+   if (parameq(buf, aliases[i].alias))
+   __parse_cmdline(aliases[i].feature, false);
} while (1);
 }
 
@@ -127,14 +138,14 @@ static __init void parse_cmdline(void)
if (!prop)
goto out;
 
-   __parse_cmdline(prop);
+   __parse_cmdline(prop, true);
 
if (!IS_ENABLED(CONFIG_CMDLINE_EXTEND))
return;
}
 
 out:
-   __parse_cmdline(CONFIG_CMDLINE);
+   __parse_cmdline(CONFIG_CMDLINE, true);
 }
 
 /* Keep checkers quiet */
-- 
2.29.2



[PATCH v7 18/23] KVM: arm64: Document HVC_VHE_RESTART stub hypercall

2021-02-08 Thread Marc Zyngier
For completeness, let's document the HVC_VHE_RESTART stub.

Signed-off-by: Marc Zyngier 
Acked-by: David Brazdil 
---
 Documentation/virt/kvm/arm/hyp-abi.rst | 9 +
 1 file changed, 9 insertions(+)

diff --git a/Documentation/virt/kvm/arm/hyp-abi.rst 
b/Documentation/virt/kvm/arm/hyp-abi.rst
index 83cadd8186fa..4d43fbc25195 100644
--- a/Documentation/virt/kvm/arm/hyp-abi.rst
+++ b/Documentation/virt/kvm/arm/hyp-abi.rst
@@ -58,6 +58,15 @@ these functions (see arch/arm{,64}/include/asm/virt.h):
   into place (arm64 only), and jump to the restart address while at HYP/EL2.
   This hypercall is not expected to return to its caller.
 
+* ::
+
+x0 = HVC_VHE_RESTART (arm64 only)
+
+  Attempt to upgrade the kernel's exception level from EL1 to EL2 by enabling
+  the VHE mode. This is conditioned by the CPU supporting VHE, the EL2 MMU
+  being off, and VHE not being disabled by any other means (command line
+  option, for example).
+
 Any other value of r0/x0 triggers a hypervisor-specific handling,
 which is not documented here.
 
-- 
2.29.2



[PATCH v7 22/23] arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line

2021-02-08 Thread Marc Zyngier
In order to be able to disable Pointer Authentication at runtime,
whether it is for testing purposes, or to work around HW issues,
let's add support for overriding the ID_AA64ISAR1_EL1.{GPI,GPA,API,APA}
fields.

This is further mapped on the arm64.nopauth command-line alias.
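
For illustration, passing "arm64.nopauth" on the command line is handled
exactly as if the user had written:

	id_aa64isar1.gpi=0 id_aa64isar1.gpa=0 id_aa64isar1.api=0 id_aa64isar1.apa=0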

Signed-off-by: Marc Zyngier 
Reviewed-by: Catalin Marinas 
Acked-by: David Brazdil 
Tested-by: Srinivas Ramana 
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 arch/arm64/include/asm/cpufeature.h |  1 +
 arch/arm64/kernel/cpufeature.c  |  4 +++-
 arch/arm64/kernel/idreg-override.c  | 16 
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 7599fd0f1ad7..f9cb28a39bd0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -376,6 +376,9 @@
arm64.nobti [ARM64] Unconditionally disable Branch Target
Identification support
 
+   arm64.nopauth   [ARM64] Unconditionally disable Pointer Authentication
+   support
+
ataflop=[HW,M68k]
 
atarimouse= [HW,MOUSE] Atari Mouse
diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 30917b9a760b..61177bac49fa 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -820,6 +820,7 @@ static inline unsigned int get_vmid_bits(u64 mmfr1)
 
 extern struct arm64_ftr_override id_aa64mmfr1_override;
 extern struct arm64_ftr_override id_aa64pfr1_override;
+extern struct arm64_ftr_override id_aa64isar1_override;
 
 u32 get_kvm_ipa_limit(void);
 void dump_cpu_features(void);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 7fbeab497adb..3bce87a03717 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -559,6 +559,7 @@ static const struct arm64_ftr_bits ftr_raz[] = {
 
 struct arm64_ftr_override __ro_after_init id_aa64mmfr1_override;
 struct arm64_ftr_override __ro_after_init id_aa64pfr1_override;
+struct arm64_ftr_override __ro_after_init id_aa64isar1_override;
 
 static const struct __ftr_reg_entry {
u32 sys_id;
@@ -604,7 +605,8 @@ static const struct __ftr_reg_entry {
 
/* Op1 = 0, CRn = 0, CRm = 6 */
ARM64_FTR_REG(SYS_ID_AA64ISAR0_EL1, ftr_id_aa64isar0),
-   ARM64_FTR_REG(SYS_ID_AA64ISAR1_EL1, ftr_id_aa64isar1),
+   ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64ISAR1_EL1, ftr_id_aa64isar1,
+  &id_aa64isar1_override),
 
/* Op1 = 0, CRn = 0, CRm = 7 */
ARM64_FTR_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0),
diff --git a/arch/arm64/kernel/idreg-override.c 
b/arch/arm64/kernel/idreg-override.c
index d691e9015c62..dffb16682330 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -46,6 +46,18 @@ static const struct ftr_set_desc pfr1 __initconst = {
},
 };
 
+static const struct ftr_set_desc isar1 __initconst = {
+   .name   = "id_aa64isar1",
+   .override   = &id_aa64isar1_override,
+   .fields = {
+   { "gpi", ID_AA64ISAR1_GPI_SHIFT },
+   { "gpa", ID_AA64ISAR1_GPA_SHIFT },
+   { "api", ID_AA64ISAR1_API_SHIFT },
+   { "apa", ID_AA64ISAR1_APA_SHIFT },
+   {}
+   },
+};
+
 extern struct arm64_ftr_override kaslr_feature_override;
 
 static const struct ftr_set_desc kaslr __initconst = {
@@ -62,6 +74,7 @@ static const struct ftr_set_desc kaslr __initconst = {
 static const struct ftr_set_desc * const regs[] __initconst = {
&mmfr1,
&pfr1,
+   &isar1,
&kaslr,
 };
 
@@ -72,6 +85,9 @@ static const struct {
{ "kvm-arm.mode=nvhe",  "id_aa64mmfr1.vh=0" },
{ "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" },
{ "arm64.nobti","id_aa64pfr1.bt=0" },
+   { "arm64.nopauth",
+ "id_aa64isar1.gpi=0 id_aa64isar1.gpa=0 "
+ "id_aa64isar1.api=0 id_aa64isar1.apa=0"  },
{ "nokaslr","kaslr.disabled=1" },
 };
 
-- 
2.29.2



[PATCH v7 20/23] arm64: cpufeatures: Allow disabling of BTI from the command-line

2021-02-08 Thread Marc Zyngier
In order to be able to disable BTI at runtime, whether it is
for testing purposes, or to work around HW issues, let's add
support for overriding the ID_AA64PFR1_EL1.BTI field.

This is further mapped on the arm64.nobti command-line alias.
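
For illustration (condensed from the mmu.c hunk below): the early BTI check
now goes through the override-aware accessor, so "arm64.nobti" takes effect
before the cpufeature framework is fully up:

	u64 pfr1 = __read_sysreg_by_encoding(SYS_ID_AA64PFR1_EL1);

	return cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_BT_SHIFT);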

Signed-off-by: Marc Zyngier 
Reviewed-by: Catalin Marinas 
Acked-by: David Brazdil 
Tested-by: Srinivas Ramana 
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 arch/arm64/include/asm/cpufeature.h |  1 +
 arch/arm64/kernel/cpufeature.c  |  4 +++-
 arch/arm64/kernel/idreg-override.c  | 11 +++
 arch/arm64/mm/mmu.c |  2 +-
 5 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 2786fd39a047..7599fd0f1ad7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -373,6 +373,9 @@
arcrimi=[HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards
Format: ,,
 
+   arm64.nobti [ARM64] Unconditionally disable Branch Target
+   Identification support
+
ataflop=[HW,M68k]
 
atarimouse= [HW,MOUSE] Atari Mouse
diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 570f1b4ba3cc..30917b9a760b 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -819,6 +819,7 @@ static inline unsigned int get_vmid_bits(u64 mmfr1)
 }
 
 extern struct arm64_ftr_override id_aa64mmfr1_override;
+extern struct arm64_ftr_override id_aa64pfr1_override;
 
 u32 get_kvm_ipa_limit(void);
 void dump_cpu_features(void);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index faada5d8bea6..7fbeab497adb 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -558,6 +558,7 @@ static const struct arm64_ftr_bits ftr_raz[] = {
 #define ARM64_FTR_REG(id, table) ARM64_FTR_REG_OVERRIDE(id, table, 
&no_override)
 
 struct arm64_ftr_override __ro_after_init id_aa64mmfr1_override;
+struct arm64_ftr_override __ro_after_init id_aa64pfr1_override;
 
 static const struct __ftr_reg_entry {
u32 sys_id;
@@ -593,7 +594,8 @@ static const struct __ftr_reg_entry {
 
/* Op1 = 0, CRn = 0, CRm = 4 */
ARM64_FTR_REG(SYS_ID_AA64PFR0_EL1, ftr_id_aa64pfr0),
-   ARM64_FTR_REG(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1),
+   ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1,
+  &id_aa64pfr1_override),
ARM64_FTR_REG(SYS_ID_AA64ZFR0_EL1, ftr_id_aa64zfr0),
 
/* Op1 = 0, CRn = 0, CRm = 5 */
diff --git a/arch/arm64/kernel/idreg-override.c 
b/arch/arm64/kernel/idreg-override.c
index 70dd70eee7a2..d691e9015c62 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -37,6 +37,15 @@ static const struct ftr_set_desc mmfr1 __initconst = {
},
 };
 
+static const struct ftr_set_desc pfr1 __initconst = {
+   .name   = "id_aa64pfr1",
+   .override   = &id_aa64pfr1_override,
+   .fields = {
+   { "bt", ID_AA64PFR1_BT_SHIFT },
+   {}
+   },
+};
+
 extern struct arm64_ftr_override kaslr_feature_override;
 
 static const struct ftr_set_desc kaslr __initconst = {
@@ -52,6 +61,7 @@ static const struct ftr_set_desc kaslr __initconst = {
 
 static const struct ftr_set_desc * const regs[] __initconst = {
&mmfr1,
+   &pfr1,
&kaslr,
 };
 
@@ -61,6 +71,7 @@ static const struct {
 } aliases[] __initconst = {
{ "kvm-arm.mode=nvhe",  "id_aa64mmfr1.vh=0" },
{ "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" },
+   { "arm64.nobti","id_aa64pfr1.bt=0" },
{ "nokaslr","kaslr.disabled=1" },
 };
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ae0c3d023824..617e704c980b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -628,7 +628,7 @@ static bool arm64_early_this_cpu_has_bti(void)
if (!IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
return false;
 
-   pfr1 = read_sysreg_s(SYS_ID_AA64PFR1_EL1);
+   pfr1 = __read_sysreg_by_encoding(SYS_ID_AA64PFR1_EL1);
return cpuid_feature_extract_unsigned_field(pfr1,
ID_AA64PFR1_BT_SHIFT);
 }
-- 
2.29.2



[PATCH v7 09/23] arm64: Move SCTLR_EL1 initialisation to EL-agnostic code

2021-02-08 Thread Marc Zyngier
We can now move the initial SCTLR_EL1 setup to be used for both
EL1 and EL2 setup.

Signed-off-by: Marc Zyngier 
Acked-by: Catalin Marinas 
Acked-by: David Brazdil 
---
 arch/arm64/kernel/head.S | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 36212c05df42..b425d2587cdb 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -479,13 +479,14 @@ EXPORT_SYMBOL(kimage_vaddr)
  * booted in EL1 or EL2 respectively.
  */
 SYM_FUNC_START(init_kernel_el)
+   mov_q   x0, INIT_SCTLR_EL1_MMU_OFF
+   msr sctlr_el1, x0
+
mrs x0, CurrentEL
cmp x0, #CurrentEL_EL2
b.eqinit_el2
 
 SYM_INNER_LABEL(init_el1, SYM_L_LOCAL)
-   mov_q   x0, INIT_SCTLR_EL1_MMU_OFF
-   msr sctlr_el1, x0
isb
mov_q   x0, INIT_PSTATE_EL1
msr spsr_el1, x0
@@ -494,9 +495,6 @@ SYM_INNER_LABEL(init_el1, SYM_L_LOCAL)
eret
 
 SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
-   mov_q   x0, INIT_SCTLR_EL1_MMU_OFF
-   msr sctlr_el1, x0
-
mov_q   x0, HCR_HOST_NVHE_FLAGS
msr hcr_el2, x0
isb
-- 
2.29.2



[PATCH v7 08/23] arm64: Simplify init_el2_state to be non-VHE only

2021-02-08 Thread Marc Zyngier
As init_el2_state is now nVHE only, let's simplify it and drop
the VHE setup.

Signed-off-by: Marc Zyngier 
Acked-by: David Brazdil 
Acked-by: Catalin Marinas 
---
 arch/arm64/include/asm/el2_setup.h | 33 --
 arch/arm64/kernel/head.S   |  2 +-
 arch/arm64/kvm/hyp/nvhe/hyp-init.S |  2 +-
 3 files changed, 10 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/el2_setup.h 
b/arch/arm64/include/asm/el2_setup.h
index 56c9e1cef180..d77d358f9395 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -32,16 +32,14 @@
  * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
  * EL2.
  */
-.macro __init_el2_timers mode
-.ifeqs "\mode", "nvhe"
+.macro __init_el2_timers
mrs x0, cnthctl_el2
orr x0, x0, #3  // Enable EL1 physical timers
msr cnthctl_el2, x0
-.endif
msr cntvoff_el2, xzr// Clear virtual offset
 .endm
 
-.macro __init_el2_debug mode
+.macro __init_el2_debug
mrs x1, id_aa64dfr0_el1
sbfxx0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
cmp x0, #1
@@ -55,7 +53,6 @@
ubfxx0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
cbz x0, .Lskip_spe_\@   // Skip if SPE not present
 
-.ifeqs "\mode", "nvhe"
mrs_s   x0, SYS_PMBIDR_EL1  // If SPE available at EL2,
and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
cbnzx0, .Lskip_spe_el2_\@   // then permit sampling of 
physical
@@ -66,7 +63,6 @@
mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
orr x2, x2, x0  // If we don't have VHE, then
// use EL1&0 translation.
-.endif
 
 .Lskip_spe_\@:
msr mdcr_el2, x2// Configure debug traps
@@ -142,37 +138,24 @@
 
 /**
  * Initialize EL2 registers to sane values. This should be called early on all
- * cores that were booted in EL2.
+ * cores that were booted in EL2. Note that everything gets initialised as
+ * if VHE was not available. The kernel context will be upgraded to VHE
+ * if possible later on in the boot process.
  *
  * Regs: x0, x1 and x2 are clobbered.
  */
-.macro init_el2_state mode
-.ifnes "\mode", "vhe"
-.ifnes "\mode", "nvhe"
-.error "Invalid 'mode' argument"
-.endif
-.endif
-
+.macro init_el2_state
__init_el2_sctlr
-   __init_el2_timers \mode
-   __init_el2_debug \mode
+   __init_el2_timers
+   __init_el2_debug
__init_el2_lor
__init_el2_stage2
__init_el2_gicv3
__init_el2_hstr
-
-   /*
-* When VHE is not in use, early init of EL2 needs to be done here.
-* When VHE _is_ in use, EL1 will not be used in the host and
-* requires no configuration, and all non-hyp-specific EL2 setup
-* will be done via the _EL1 system register aliases in __cpu_setup.
-*/
-.ifeqs "\mode", "nvhe"
__init_el2_nvhe_idregs
__init_el2_nvhe_cptr
__init_el2_nvhe_sve
__init_el2_nvhe_prepare_eret
-.endif
 .endm
 
 #endif /* __ARM_KVM_INIT_H__ */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 07445fd976ef..36212c05df42 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -501,7 +501,7 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
msr hcr_el2, x0
isb
 
-   init_el2_state nvhe
+   init_el2_state
 
/* Hypervisor stub */
adr_l   x0, __hyp_stub_vectors
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S 
b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index 31b060a44045..222cfc3e7190 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -189,7 +189,7 @@ SYM_CODE_START_LOCAL(__kvm_hyp_init_cpu)
 2: msr SPsel, #1   // We want to use SP_EL{1,2}
 
/* Initialize EL2 CPU state to sane values. */
-   init_el2_state nvhe // Clobbers x0..x2
+   init_el2_state  // Clobbers x0..x2
 
/* Enable MMU, set vectors and stack. */
mov x0, x28
-- 
2.29.2



[PATCH v7 07/23] arm64: Move VHE-specific SPE setup to mutate_to_vhe()

2021-02-08 Thread Marc Zyngier
There isn't much that a VHE kernel needs on top of whatever has
been done for nVHE, so let's move the little we need to the
VHE stub (the SPE setup), and drop the init_el2_state macro.

No expected functional change.

Signed-off-by: Marc Zyngier 
Acked-by: David Brazdil 
Acked-by: Catalin Marinas 
---
 arch/arm64/kernel/hyp-stub.S | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 373ed2213e1d..6229315d533d 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -92,9 +92,6 @@ SYM_CODE_START_LOCAL(mutate_to_vhe)
msr hcr_el2, x0
isb
 
-   // Doesn't do much on VHE, but still, worth a shot
-   init_el2_state vhe
-
// Use the EL1 allocated stack, per-cpu offset
mrs x0, sp_el1
mov sp, x0
@@ -107,6 +104,11 @@ SYM_CODE_START_LOCAL(mutate_to_vhe)
mrs_s   x0, SYS_VBAR_EL12
msr vbar_el1, x0
 
+   // Use EL2 translations for SPE and disable access from EL1
+   mrs x0, mdcr_el2
+   bic x0, x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
+   msr mdcr_el2, x0
+
// Transfer the MM state from EL1 to EL2
mrs_s   x0, SYS_TCR_EL12
msr tcr_el1, x0
-- 
2.29.2



[PATCH v7 06/23] arm64: Drop early setting of MDSCR_EL2.TPMS

2021-02-08 Thread Marc Zyngier
When running VHE, we set MDSCR_EL2.TPMS very early on to force
the trapping of EL1 SPE accesses to EL2.

However:
- we are running with HCR_EL2.{E2H,TGE}={1,1}, meaning that there
  is no EL1 to trap from

- before entering a guest, we call kvm_arm_setup_debug(), which
  sets MDCR_EL2_TPMS in the per-vcpu shadow mdcr_el2, which gets
  applied on entry by __activate_traps_common().

The early setting of MDSCR_EL2.TPMS is therefore useless and can
be dropped.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/el2_setup.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm64/include/asm/el2_setup.h 
b/arch/arm64/include/asm/el2_setup.h
index 540116de80bf..56c9e1cef180 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -66,9 +66,6 @@
mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
orr x2, x2, x0  // If we don't have VHE, then
// use EL1&0 translation.
-.else
-   orr x2, x2, #MDCR_EL2_TPMS  // For VHE, use EL2 translation
-   // and disable access from EL1
 .endif
 
 .Lskip_spe_\@:
-- 
2.29.2



[PATCH v7 05/23] arm64: Initialise as nVHE before switching to VHE

2021-02-08 Thread Marc Zyngier
As we are aiming to be able to control whether we enable VHE or
not, let's always drop down to EL1 first, and only then upgrade
to VHE if at all possible.

This means that if the kernel is booted at EL2, we always start
with an nVHE init, drop to EL1 to initialise the kernel, and
only then upgrade the kernel EL to EL2 if possible (the process
is obviously shortened for secondary CPUs).

The resume path is handled similarly to a secondary CPU boot.

Signed-off-by: Marc Zyngier 
Acked-by: David Brazdil 
Acked-by: Catalin Marinas 
---
 arch/arm64/kernel/head.S | 38 ++--
 arch/arm64/kernel/hyp-stub.S | 24 +++
 arch/arm64/kernel/sleep.S|  1 +
 3 files changed, 27 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 28e9735302df..07445fd976ef 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -433,6 +433,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
bl  __pi_memset
dsb ishst   // Make zero page visible to PTW
 
+   bl  switch_to_vhe
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
bl  kasan_early_init
 #endif
@@ -493,42 +494,6 @@ SYM_INNER_LABEL(init_el1, SYM_L_LOCAL)
eret
 
 SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
-#ifdef CONFIG_ARM64_VHE
-   /*
-* Check for VHE being present. x2 being non-zero indicates that we
-* do have VHE, and that the kernel is intended to run at EL2.
-*/
-   mrs x2, id_aa64mmfr1_el1
-   ubfxx2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
-#else
-   mov x2, xzr
-#endif
-   cbz x2, init_el2_nvhe
-
-   /*
-* When VHE _is_ in use, EL1 will not be used in the host and
-* requires no configuration, and all non-hyp-specific EL2 setup
-* will be done via the _EL1 system register aliases in __cpu_setup.
-*/
-   mov_q   x0, HCR_HOST_VHE_FLAGS
-   msr hcr_el2, x0
-   isb
-
-   init_el2_state vhe
-
-   isb
-
-   mov_q   x0, INIT_PSTATE_EL2
-   msr spsr_el2, x0
-   msr elr_el2, lr
-   mov w0, #BOOT_CPU_MODE_EL2
-   eret
-
-SYM_INNER_LABEL(init_el2_nvhe, SYM_L_LOCAL)
-   /*
-* When VHE is not in use, early init of EL2 and EL1 needs to be
-* done here.
-*/
mov_q   x0, INIT_SCTLR_EL1_MMU_OFF
msr sctlr_el1, x0
 
@@ -623,6 +588,7 @@ SYM_FUNC_START_LOCAL(secondary_startup)
/*
 * Common entry point for secondary CPUs.
 */
+   bl  switch_to_vhe
bl  __cpu_secondary_check52bitva
bl  __cpu_setup // initialise processor
adrpx1, swapper_pg_dir
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 3f3dbbe8914d..373ed2213e1d 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -190,3 +190,27 @@ SYM_FUNC_START(__hyp_reset_vectors)
hvc #0
ret
 SYM_FUNC_END(__hyp_reset_vectors)
+
+/*
+ * Entry point to switch to VHE if deemed capable
+ */
+SYM_FUNC_START(switch_to_vhe)
+#ifdef CONFIG_ARM64_VHE
+   // Need to have booted at EL2
+   adr_l   x1, __boot_cpu_mode
+   ldr w0, [x1]
+   cmp w0, #BOOT_CPU_MODE_EL2
+   b.ne1f
+
+   // and still be at EL1
+   mrs x0, CurrentEL
+   cmp x0, #CurrentEL_EL1
+   b.ne1f
+
+   // Turn the world upside down
+   mov x0, #HVC_VHE_RESTART
+   hvc #0
+1:
+#endif
+   ret
+SYM_FUNC_END(switch_to_vhe)
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index 6bdef7362c0e..5bfd9b87f85d 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -100,6 +100,7 @@ SYM_FUNC_END(__cpu_suspend_enter)
.pushsection ".idmap.text", "awx"
 SYM_CODE_START(cpu_resume)
bl  init_kernel_el
+   bl  switch_to_vhe
bl  __cpu_setup
/* enable the MMU early - so we can access sleep_save_stash by va */
adrpx1, swapper_pg_dir
-- 
2.29.2



[PATCH v7 04/23] arm64: Provide an 'upgrade to VHE' stub hypercall

2021-02-08 Thread Marc Zyngier
As we are about to change the way a VHE system boots, let's
provide the core helper, in the form of a stub hypercall that
enables VHE and replicates the full EL1 context at EL2, thanks
to EL1 and VHE-EL2 being extremely similar.

On exception return, the kernel carries on at EL2. Fancy!

Nothing calls this new hypercall yet, so no functional change.

Signed-off-by: Marc Zyngier 
Acked-by: David Brazdil 
Acked-by: Catalin Marinas 
---
 arch/arm64/include/asm/virt.h |  7 +++-
 arch/arm64/kernel/hyp-stub.S  | 76 ++-
 2 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index ee6a48df89d9..7379f35ae2c6 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -35,8 +35,13 @@
  */
 #define HVC_RESET_VECTORS 2
 
+/*
+ * HVC_VHE_RESTART - Upgrade the CPU from EL1 to EL2, if possible
+ */
+#define HVC_VHE_RESTART3
+
 /* Max number of HYP stub hypercalls */
-#define HVC_STUB_HCALL_NR 3
+#define HVC_STUB_HCALL_NR 4
 
 /* Error returned when an invalid stub number is passed into x0 */
 #define HVC_STUB_ERR   0xbadca11
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 160f5881a0b7..3f3dbbe8914d 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -8,9 +8,9 @@
 
 #include 
 #include 
-#include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -47,10 +47,13 @@ SYM_CODE_END(__hyp_stub_vectors)
 
 SYM_CODE_START_LOCAL(el1_sync)
cmp x0, #HVC_SET_VECTORS
-   b.ne2f
+   b.ne1f
msr vbar_el2, x1
b   9f
 
+1: cmp x0, #HVC_VHE_RESTART
+   b.eqmutate_to_vhe
+
 2: cmp x0, #HVC_SOFT_RESTART
b.ne3f
mov x0, x2
@@ -70,6 +73,75 @@ SYM_CODE_START_LOCAL(el1_sync)
eret
 SYM_CODE_END(el1_sync)
 
+// nVHE? No way! Give me the real thing!
+SYM_CODE_START_LOCAL(mutate_to_vhe)
+   // Be prepared to fail
+   mov_q   x0, HVC_STUB_ERR
+
+   // Sanity check: MMU *must* be off
+   mrs x1, sctlr_el2
+   tbnzx1, #0, 1f
+
+   // Needs to be VHE capable, obviously
+   mrs x1, id_aa64mmfr1_el1
+   ubfxx1, x1, #ID_AA64MMFR1_VHE_SHIFT, #4
+   cbz x1, 1f
+
+   // Engage the VHE magic!
+   mov_q   x0, HCR_HOST_VHE_FLAGS
+   msr hcr_el2, x0
+   isb
+
+   // Doesn't do much on VHE, but still, worth a shot
+   init_el2_state vhe
+
+   // Use the EL1 allocated stack, per-cpu offset
+   mrs x0, sp_el1
+   mov sp, x0
+   mrs x0, tpidr_el1
+   msr tpidr_el2, x0
+
+   // FP configuration, vectors
+   mrs_s   x0, SYS_CPACR_EL12
+   msr cpacr_el1, x0
+   mrs_s   x0, SYS_VBAR_EL12
+   msr vbar_el1, x0
+
+   // Transfer the MM state from EL1 to EL2
+   mrs_s   x0, SYS_TCR_EL12
+   msr tcr_el1, x0
+   mrs_s   x0, SYS_TTBR0_EL12
+   msr ttbr0_el1, x0
+   mrs_s   x0, SYS_TTBR1_EL12
+   msr ttbr1_el1, x0
+   mrs_s   x0, SYS_MAIR_EL12
+   msr mair_el1, x0
+   isb
+
+   // Invalidate TLBs before enabling the MMU
+   tlbivmalle1
+   dsb nsh
+
+   // Enable the EL2 S1 MMU, as set up from EL1
+   mrs_s   x0, SYS_SCTLR_EL12
+   set_sctlr_el1   x0
+
+   // Disable the EL1 S1 MMU for a good measure
+   mov_q   x0, INIT_SCTLR_EL1_MMU_OFF
+   msr_s   SYS_SCTLR_EL12, x0
+
+   // Hack the exception return to stay at EL2
+   mrs x0, spsr_el1
+   and x0, x0, #~PSR_MODE_MASK
+   mov x1, #PSR_MODE_EL2h
+   orr x0, x0, x1
+   msr spsr_el1, x0
+
+   mov x0, xzr
+
+1: eret
+SYM_CODE_END(mutate_to_vhe)
+
 .macro invalid_vector  label
 SYM_CODE_START_LOCAL(\label)
b \label
-- 
2.29.2



[PATCH v7 03/23] arm64: Turn the MMU-on sequence into a macro

2021-02-08 Thread Marc Zyngier
Turning the MMU on is a popular sport in the arm64 kernel, and
we do it more than once, or even twice. As we are about to add
even more, let's turn it into a macro.

No expected functional change.

Signed-off-by: Marc Zyngier 
Acked-by: Catalin Marinas 
Acked-by: David Brazdil 
---
 arch/arm64/include/asm/assembler.h | 17 +
 arch/arm64/kernel/head.S   | 19 ---
 arch/arm64/mm/proc.S   | 12 +---
 3 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index bf125c591116..8cded93f99c3 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -675,6 +675,23 @@ USER(\label, icivau, \tmp2)// 
invalidate I line PoU
.endif
.endm
 
+/*
+ * Set SCTLR_EL1 to the passed value, and invalidate the local icache
+ * in the process. This is called when setting the MMU on.
+ */
+.macro set_sctlr_el1, reg
+   msr sctlr_el1, \reg
+   isb
+   /*
+* Invalidate the local I-cache so that any instructions fetched
+* speculatively from the PoC are discarded, since they may have
+* been dynamically patched at the PoU.
+*/
+   ic  iallu
+   dsb nsh
+   isb
+.endm
+
 /*
  * Check whether to yield to another runnable task from kernel mode NEON code
  * (which runs with preemption disabled).
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index a0dc987724ed..28e9735302df 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -703,16 +703,9 @@ SYM_FUNC_START(__enable_mmu)
offset_ttbr1 x1, x3
msr ttbr1_el1, x1   // load TTBR1
isb
-   msr sctlr_el1, x0
-   isb
-   /*
-* Invalidate the local I-cache so that any instructions fetched
-* speculatively from the PoC are discarded, since they may have
-* been dynamically patched at the PoU.
-*/
-   ic  iallu
-   dsb nsh
-   isb
+
+   set_sctlr_el1   x0
+
ret
 SYM_FUNC_END(__enable_mmu)
 
@@ -883,11 +876,7 @@ SYM_FUNC_START_LOCAL(__primary_switch)
tlbivmalle1 // Remove any stale TLB entries
dsb nsh
 
-   msr sctlr_el1, x19  // re-enable the MMU
-   isb
-   ic  iallu   // flush instructions fetched
-   dsb nsh // via old mapping
-   isb
+   set_sctlr_el1   x19 // re-enable the MMU
 
bl  __relocate_kernel
 #endif
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index ece785477bdc..c967bfd30d2b 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -291,17 +291,7 @@ skip_pgd:
/* We're done: fire up the MMU again */
mrs x17, sctlr_el1
orr x17, x17, #SCTLR_ELx_M
-   msr sctlr_el1, x17
-   isb
-
-   /*
-* Invalidate the local I-cache so that any instructions fetched
-* speculatively from the PoC are discarded, since they may have
-* been dynamically patched at the PoU.
-*/
-   ic  iallu
-   dsb nsh
-   isb
+   set_sctlr_el1   x17
 
/* Set the flag to zero to indicate that we're all done */
str wzr, [flag_ptr]
-- 
2.29.2



[PATCH v7 02/23] arm64: Fix outdated TCR setup comment

2021-02-08 Thread Marc Zyngier
The arm64 kernel has long been able to use more than 39bit VAs.
Since day one, actually. Let's rewrite the offending comment.

Signed-off-by: Marc Zyngier 
Acked-by: Catalin Marinas 
Acked-by: David Brazdil 
---
 arch/arm64/mm/proc.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 1f7ee8c8b7b8..ece785477bdc 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -464,8 +464,8 @@ SYM_FUNC_START(__cpu_setup)
 #endif
msr mair_el1, x5
/*
-* Set/prepare TCR and TTBR. We use 512GB (39-bit) address range for
-* both user and kernel.
+* Set/prepare TCR and TTBR. TCR_EL1.T1SZ gets further
+* adjusted if the kernel is compiled with 52bit VA support.
 */
mov_q   x10, TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \
-- 
2.29.2



[PATCH v7 00/23] arm64: Early CPU feature override, and applications to VHE, BTI and PAuth

2021-02-08 Thread Marc Zyngier
It recently came to light that there is a need to be able to override
some CPU features very early on, before the kernel is fully up and
running. The reasons for this range from specific feature support
(such as using Protected KVM on VHE HW, which is the main motivation
for this work) to errata workarounds (a feature is broken on a CPU and
needs to be turned off, or rather not enabled).

This series tries to offer a limited framework for this kind of
problem, by allowing a set of options to be passed on the
command-line and altering the feature set that the cpufeature
subsystem exposes to the rest of the kernel. Note that this doesn't
change anything for code that directly uses the CPU ID registers.
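
As an example (an illustrative command line combining options introduced by
this series), booting with:

	id_aa64mmfr1.vh=0 arm64.nobti arm64.nopauth nokaslr

forces a non-VHE kernel, hides BTI and pointer authentication from the
cpufeature code, and disables KASLR.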

The series completely changes the way a VHE-capable system boots, by
*always* booting non-VHE first, and then upgrading to VHE when deemed
capable. Although it sounds scary, this is actually simple to
implement (and I wish I had done that five years ago). The "upgrade to
VHE" path is then conditioned on the VHE feature not being disabled
from the command-line.

Said command-line parsing borrows a lot from the kaslr code, and
subsequently allows the "nokaslr" option to be moved to the new
infrastructure (though it all looks a bit... odd).

Further patches now add support for disabling BTI and PAuth, the
latter being based on an initial series by Srinivas Ramana[0]. There
is some ongoing discussions about being able to disable MTE, but no
clear resolution on that subject yet

WARNING: this series breaks Apple M1 badly, as it is stuck in VHE
mode. The last patch in this series papers over the problem, but it
*isn't* a candidate for merging yet.

This has been tested on multiple VHE and non-VHE systems.

Branch available at [7].

* From v6 [6]:
  - Greatly simplify SPE setup with VHE
  - Simplify option parsing by reusing some of the helpers used by
parse_args(). The whole function cannot be used though, as it
does things that can't be done at the point where we parse the
overrides.
  - Add a patch allowing M1 CPUs to boot. This patch shouldn't be
merged until we decide to support this non-architectural behaviour.

* From v5 [5]:
  - Turn most __initdata into __initconst
  - Ensure that all strings are part of the __initconst section.
This is a bit ugly, but saves memory once up and running
  - Make overrides __ro_after_init
  - Change the command-line parsing so that the same feature can
be overridden multiple times, with the expected left-to-right
parsing order being respected
  - Handle all space-like characters as option delimiters
  - Collected Acks, RBs and TBs

* From v4 [4]:
  - Documentation fixes
  - Moved the val/mask pair into a arm64_ftr_override structure,
leading to simpler code
  - All arm64_ftr_reg now have a default override, which simplifies
the code a bit further
  - Dropped some of the "const" attributes
  - Renamed init_shadow_regs() to init_feature_override()
  - Renamed struct reg_desc to struct ftr_set_desc
  - Refactored command-line parsing
  - Simplified handling of VHE being disabled on the cmdline
  - Turn EL1 S1 MMU off on switch to VHE
  - HVC_VHE_RESTART now returns an error code on failure
  - Added missing asmlinkage and dummy prototypes
  - Collected Acks and RBs from David, Catalin and Suzuki

* From v3 [3]:
  - Fixed the VHE_RESTART stub (duh!)
  - Switched to using arm64_ftr_safe_value() instead of the user
provided value
  - Per-feature override warning

* From v2 [2]:
  - Simplify the VHE_RESTART stub
  - Fixed a number of spelling mistakes, and hopefully introduced a
few more
  - Override features in __read_sysreg_by_encoding()
  - Allow both BTI and PAuth to be overridden on the command line
  - Rebased on -rc3

* From v1 [1]:
  - Fix SPE init on VHE when EL2 doesn't own SPE
  - Fix re-init when KASLR is used
  - Handle the resume path
  - Rebased to 5.11-rc2

[0] 
https://lore.kernel.org/r/1610152163-16554-1-git-send-email-sram...@codeaurora.org
[1] https://lore.kernel.org/r/20201228104958.1848833-1-...@kernel.org
[2] https://lore.kernel.org/r/20210104135011.2063104-1-...@kernel.org
[3] https://lore.kernel.org/r/2021032811.2455113-1-...@kernel.org
[4] https://lore.kernel.org/r/20210118094533.2874082-1-...@kernel.org
[5] https://lore.kernel.org/r/20210125105019.2946057-1-...@kernel.org
[6] https://lore.kernel.org/r/20210201115637.3123740-1-...@kernel.org
[7] 
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=hack/arm64-early-cpufeature

Marc Zyngier (22):
  arm64: Fix labels in el2_setup macros
  arm64: Fix outdated TCR setup comment
  arm64: Turn the MMU-on sequence into a macro
  arm64: Provide an 'upgrade to VHE' stub hypercall
  arm64: Initialise as nVHE before switching to VHE
  arm64: Drop early setting of MDSCR_EL2.TPMS
  arm64: Move VHE-specific SPE setup to mutate_to_vhe()
  arm64: Simplify init_el2_state to be non-VHE only
  arm64: Move SCTLR_EL1 initialisation to EL-agnostic code
  arm6

[PATCH v7 01/23] arm64: Fix labels in el2_setup macros

2021-02-08 Thread Marc Zyngier
If someone happens to write the following code:

b   1f
init_el2_state  vhe
1:
[...]

they will be in for a long debugging session, as the label "1f"
will be resolved *inside* the init_el2_state macro instead of
after it. Not really what one expects.

Instead, rewrite the EL2 setup macros to use unambiguous labels,
thanks to the usual macro counter trick.

Acked-by: Catalin Marinas 
Signed-off-by: Marc Zyngier 
Acked-by: David Brazdil 
---
 arch/arm64/include/asm/el2_setup.h | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/el2_setup.h 
b/arch/arm64/include/asm/el2_setup.h
index a7f5a1bbc8ac..540116de80bf 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -45,24 +45,24 @@
mrs x1, id_aa64dfr0_el1
sbfxx0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
cmp x0, #1
-   b.lt1f  // Skip if no PMU present
+   b.lt.Lskip_pmu_\@   // Skip if no PMU present
mrs x0, pmcr_el0// Disable debug access traps
ubfxx0, x0, #11, #5 // to EL2 and allow access to
-1:
+.Lskip_pmu_\@:
cselx2, xzr, x0, lt // all PMU counters from EL1
 
/* Statistical profiling */
ubfxx0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
-   cbz x0, 3f  // Skip if SPE not present
+   cbz x0, .Lskip_spe_\@   // Skip if SPE not present
 
 .ifeqs "\mode", "nvhe"
mrs_s   x0, SYS_PMBIDR_EL1  // If SPE available at EL2,
and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
-   cbnzx0, 2f  // then permit sampling of 
physical
+   cbnzx0, .Lskip_spe_el2_\@   // then permit sampling of 
physical
mov x0, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
  1 << SYS_PMSCR_EL2_PA_SHIFT)
msr_s   SYS_PMSCR_EL2, x0   // addresses and physical 
counter
-2:
+.Lskip_spe_el2_\@:
mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
orr x2, x2, x0  // If we don't have VHE, then
// use EL1&0 translation.
@@ -71,7 +71,7 @@
// and disable access from EL1
 .endif
 
-3:
+.Lskip_spe_\@:
msr mdcr_el2, x2// Configure debug traps
 .endm
 
@@ -79,9 +79,9 @@
 .macro __init_el2_lor
mrs x1, id_aa64mmfr1_el1
ubfxx0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
-   cbz x0, 1f
+   cbz x0, .Lskip_lor_\@
msr_s   SYS_LORC_EL1, xzr
-1:
+.Lskip_lor_\@:
 .endm
 
 /* Stage-2 translation */
@@ -93,7 +93,7 @@
 .macro __init_el2_gicv3
mrs x0, id_aa64pfr0_el1
ubfxx0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
-   cbz x0, 1f
+   cbz x0, .Lskip_gicv3_\@
 
mrs_s   x0, SYS_ICC_SRE_EL2
orr x0, x0, #ICC_SRE_EL2_SRE// Set ICC_SRE_EL2.SRE==1
@@ -103,7 +103,7 @@
mrs_s   x0, SYS_ICC_SRE_EL2 // Read SRE back,
tbz x0, #0, 1f  // and check that it sticks
msr_s   SYS_ICH_HCR_EL2, xzr// Reset ICC_HCR_EL2 to defaults
-1:
+.Lskip_gicv3_\@:
 .endm
 
 .macro __init_el2_hstr
@@ -128,14 +128,14 @@
 .macro __init_el2_nvhe_sve
mrs x1, id_aa64pfr0_el1
ubfxx1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
-   cbz x1, 1f
+   cbz x1, .Lskip_sve_\@
 
bic x0, x0, #CPTR_EL2_TZ// Also disable SVE traps
msr cptr_el2, x0// Disable copro. traps to EL2
isb
mov x1, #ZCR_ELx_LEN_MASK   // SVE: Enable full vector
msr_s   SYS_ZCR_EL2, x1 // length for EL1.
-1:
+.Lskip_sve_\@:
 .endm
 
 .macro __init_el2_nvhe_prepare_eret
-- 
2.29.2
