This adds a document explaining the interface for asynchronous page
fault and how it works in general.
Signed-off-by: Gavin Shan
---
Documentation/virt/kvm/arm/apf.rst | 143 +++
Documentation/virt/kvm/arm/index.rst | 1 +
2 files changed, 144 insertions(+)
create mode
This enables asynchronous page fault from the guest side. The design
is highlighted as below:
* The per-vCPU shared memory region, which is represented by
"struct kvm_vcpu_pv_apf_data", is allocated. The reason and
token associated with the received notifications of asynchronous
page
The page-not-present notification is delivered by SDEI event. The
guest reschedules the current process to another one when the SDEI event
is received. It's not safe to do so in the SDEI event handler because
the SDEI event should be acknowledged as soon as possible.
So the rescheduling is postponed u
This implements kvm_para_available() to check if para-virtualization
features are available or not. Besides, kvm_para_has_feature() is
enhanced to detect the asynchronous page fault para-virtualization
feature. These two functions are going to be used by the guest kernel
to enable the asynchronous page
This exports the asynchronous page fault capability:
* Identify capability KVM_CAP_ASYNC_{PF, PF_INT}.
* Standardize SDEI event for asynchronous page fault.
* Enable kernel config CONFIG_KVM_ASYNC_{PF, PF_SLOT}.
Signed-off-by: Gavin Shan
---
arch/arm64/include/uapi/asm/kvm_sdei.h
This introduces SMCCC-based KVM vendor-specific services to configure
the asynchronous page fault functionality. The following services
are introduced:
* ARM_SMCCC_KVM_FUNC_ASYNC_PF_VERSION
Returns the version, which can be used to identify ABI changes
in the future.
* ARM_SMCCC_KVM_FU
This supports ioctl commands for configuration and migration:
KVM_ARM_ASYNC_PF_CMD_GET_VERSION
Return implementation version
KVM_ARM_ASYNC_PF_CMD_GET_SDEI
Return SDEI event number used for page-not-present notification
KVM_ARM_ASYNC_PF_CMD_GET_IRQ
Return IRQ number used
An asynchronous page fault starts a worker when the requested
page isn't present. The worker makes the requested page present
in the background and the worker, together with the associated
information, is queued to the completion queue after that. The
worker and the completion queue are check
The requested page might not be resident in memory during the stage-2
page fault. For example, the requested page could be resident in swap
device (file). In this case, disk I/O is issued in order to fetch the
requested page and it could take tens of milliseconds, even hundreds
of milliseconds in e
We need to put more stuff in the paravirtualization header files when
the asynchronous page fault is supported. The generic header files
can't meet the goal. This duplicates the generic header files as
our platform-specific header files. It's the preparatory work to
support the asynchronous page fau
The main work is handled by user_mem_abort(). After asynchronous
page fault is supported, one page fault needs to be handled with
two calls to this function. It means the page fault needs to be
replayed asynchronously in that case. This renames the function
to kvm_handle_user_mem_abort() and exports
From: Will Deacon
We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.
Signed-off-by: Will Deacon
Signed-off-by: Gavin Shan
---
arch/arm64/kvm/hypercalls.c | 27 ++-
1 file changed, 18 insert
From: Will Deacon
Although the SMCCC specification provides some limited functionality for
describing the presence of hypervisor and firmware services, this is
generally applicable only to functions designated as "Arm Architecture
Service Functions" and no portable discovery mechanism is provided
This uses the generic slot management mechanism for asynchronous
page fault by enabling CONFIG_KVM_ASYNC_PF_SLOT, because the private
implementation is a complete duplicate of the generic one.
The changes introduced here are pretty mechanical and shouldn't
cause any logical changes.
Signed-off-by:
Duplicate notifications for the same GFN aren't allowed on the x86
platform, with the help of a hash table. This mechanism is going
to be used by arm64, so this makes the code generic and shareable
by multiple platforms.
* As this mechanism isn't needed by all platforms, a new kernel
config o
This moves the definitions of "struct kvm_async_pf" and the related
functions after "struct kvm_vcpu" so that a newly added inline function
can dereference "struct kvm_vcpu" properly. Otherwise, the unexpected
build error will be raised:
error: dereferencing pointer to incomplete type ‘struct kvm
This adds inline function kvm_check_async_pf_completion_queue()
and a stub for !CONFIG_KVM_ASYNC_PF so that the source code won't
have to care about CONFIG_KVM_ASYNC_PF. The kernel option is
used only once in kvm_main.c and can be removed then. Besides,
the checks on the completion queue are all rep
There are two stages of page faults. The guest kernel is responsible for
handling stage-1 page faults, while the host kernel takes care of
stage-2 page faults. When the guest traps to the host because of a stage-2
page fault, the guest is suspended until the requested memory (page) is
populate
This adds an SDEI test case to selftests, where the various hypercalls
are issued for the KVM private event (0x4020) to ensure they complete
without error. Note that two vCPUs are started up by default
to run the same sequence. Actually, it's simulating what the SDEI client
driver does and the fol
The SDEI functionality is now ready to be exported. This adds a
new capability (KVM_CAP_ARM_SDEI) and exports it.
Signed-off-by: Gavin Shan
---
arch/arm64/kvm/arm.c | 3 +++
include/uapi/linux/kvm.h | 1 +
2 files changed, 4 insertions(+)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/
An SDEI event is injected to send a notification to the guest. The SDEI
event might not be needed after it's injected. This introduces an API
to support cancellation of an injected SDEI event if it hasn't been fired
to the guest yet.
This mechanism will be needed when we're going to support asynchronous
page f
This supports SDEI event injection by implementing kvm_sdei_inject().
It's called by the kernel directly or by the VMM through an ioctl command
to inject
SDEI event to the specific vCPU.
Signed-off-by: Gavin Shan
---
arch/arm64/include/asm/kvm_sdei.h | 2 +
arch/arm64/include/uapi/asm/kvm_sdei.h | 1
This supports ioctl commands on vCPU to manage the various objects.
It's primarily used by VMM to accomplish live migration. The ioctl
commands introduced by this are highlighted as below:
* KVM_SDEI_CMD_GET_VEVENT_COUNT
Retrieve the number of SDEI events pending for handling on the
vCPU
This supports ioctl commands on VM to manage the various objects.
It's primarily used by VMM to accomplish live migration. The ioctl
commands introduced by this are highlighted as below:
* KVM_SDEI_CMD_GET_VERSION
Retrieve the version of current implementation
* KVM_SDEI_CMD_SET_EVENT
The owner of an SDEI event, like asynchronous page fault, needs to
know the state of the injected SDEI event. This supports SDEI event
state updates by introducing a notifier mechanism. Note that
the notifier (handler) should be capable of surviving migration.
Signed-off-by: Gavin Shan
---
arch/arm64/include/a
This supports SDEI_EVENT_{COMPLETE, COMPLETE_AND_RESUME} hypercall.
They are used by the guest to notify the completion of the SDEI
event in the handler. The registers are changed according to the
SDEI specification as below:
* x0 - x17, PC and PState are restored to what values we had in
This implements kvm_sdei_deliver() to support SDEI event delivery.
The function is called when the request (KVM_REQ_SDEI) is raised.
The following rules are taken according to the SDEI specification:
* x0 - x17 are saved. All of them are cleared except the following
registers:
x0: num
This supports SDEI_{PRIVATE, SHARED}_RESET. They are used by the
guest to purge the private or shared SDEI events, which are registered
previously.
Signed-off-by: Gavin Shan
---
arch/arm64/kvm/sdei.c | 29 +
1 file changed, 29 insertions(+)
diff --git a/arch/arm64/kv
This supports SDEI_PE_{MASK, UNMASK} hypercall. They are used by
the guest to stop a specific vCPU from receiving SDEI events.
Signed-off-by: Gavin Shan
---
arch/arm64/kvm/sdei.c | 35 +++
1 file changed, 35 insertions(+)
diff --git a/arch/arm64/kvm/sdei.c b/ar
This supports SDEI_EVENT_ROUTING_SET hypercall. It's used by the
guest to set the routing mode and affinity for the registered KVM event.
It's only valid for the shared events. It's not allowed to do so
when the corresponding event has been raised to the guest.
Signed-off-by: Gavin Shan
---
arch/arm64
This supports SDEI_EVENT_GET_INFO hypercall. It's used by the guest
to retrieve various information about the supported (exported) events,
including type, signaled, route mode and affinity for the shared
events.
Signed-off-by: Gavin Shan
---
arch/arm64/kvm/sdei.c | 76 +++
This supports SDEI_EVENT_STATUS hypercall. It's used by the guest
to retrieve a bitmap to indicate the SDEI event states, including
registration, enablement and delivery state.
Signed-off-by: Gavin Shan
---
arch/arm64/kvm/sdei.c | 50 +++
1 file changed, 5
This supports SDEI_EVENT_UNREGISTER hypercall. It's used by the
guest to unregister an SDEI event. The SDEI event won't be raised to
the guest or specific vCPU after it's unregistered successfully.
Note that the SDEI event is disabled automatically on the guest
or specific vCPU once it's unregister
This supports SDEI_EVENT_CONTEXT hypercall. It's used by the guest
to retrieve the original registers (R0 - R17) in its SDEI event
handler. Those registers can be corrupted during the SDEI event
delivery.
Signed-off-by: Gavin Shan
---
arch/arm64/kvm/sdei.c | 40 +
This supports SDEI_EVENT_{ENABLE, DISABLE} hypercall. After SDEI
event is registered by guest, it won't be delivered to the guest
until it's enabled. On the other hand, the SDEI event won't be
raised to the guest or specific vCPU if it has been disabled
on the guest or specific vCPU.
Signed-off-
This supports SDEI_EVENT_REGISTER hypercall, which is used by guest
to register SDEI events. The SDEI event won't be raised to the guest
or specific vCPU until it's registered and enabled explicitly.
Only those events that have been exported by KVM can be registered.
After the event is registered
This supports SDEI_VERSION hypercall, simply returning v1.0.0
when the functionality is supported on the VM and vCPU.
Signed-off-by: Gavin Shan
---
arch/arm64/kvm/sdei.c | 18 ++
1 file changed, 18 insertions(+)
diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index
Software Delegated Exception Interface (SDEI) provides a mechanism for
registering and servicing system events. Those system events are high
priority events, which must be serviced immediately. It's going to be
used by Asynchronous Page Fault (APF) to deliver notification from KVM
to guest. It's no
The inline functions used to get the SMCCC parameters have the same
layout. It means the logical functionality can be expressed by
a template, simplifying the code. Besides, this adds more
similar inline functions like smccc_get_arg{4,5,6,7,8}() to access
more SMCCC arguments, which are required
This series intends to virtualize the Software Delegated Exception
Interface (SDEI), which is defined by DEN0054A. It allows the hypervisor
to deliver NMI-like events to the guest, and it's needed by asynchronous
page fault to deliver page-not-present notifications from hypervisor to
guest. The code and the
On Fri, 5 Feb 2021 at 13:58, Steven Price wrote:
>
> The VMM may not wish to have its own mapping of guest memory mapped
> with PROT_MTE because this causes problems if the VMM has tag checking
> enabled (the guest controls the tags in physical RAM and it's unlikely
> the tags are correct for the
On 2021-02-08 14:32, Will Deacon wrote:
Hi Marc,
On Mon, Feb 08, 2021 at 09:57:09AM +, Marc Zyngier wrote:
It recently came to light that there is a need to be able to override
some CPU features very early on, before the kernel is fully up and
running. The reasons for this range from specif
Hi Will,
On 2021-02-08 14:32, Will Deacon wrote:
Hi Marc,
On Mon, Feb 08, 2021 at 09:57:09AM +, Marc Zyngier wrote:
It recently came to light that there is a need to be able to override
some CPU features very early on, before the kernel is fully up and
running. The reasons for this range f
On Fri, Feb 05, 2021 at 12:12:51AM +, Daniel Kiss wrote:
>
>
> > On 4 Feb 2021, at 18:36, Dave Martin wrote:
> >
> > On Tue, Feb 02, 2021 at 07:52:54PM +0100, Daniel Kiss wrote:
> >> CPUs that support SVE are architecturally required to support the
> >> Virtualization Host Extensions (VHE),
On Mon, 8 Feb 2021 at 15:32, Will Deacon wrote:
>
> Hi Marc,
>
> On Mon, Feb 08, 2021 at 09:57:09AM +, Marc Zyngier wrote:
> > It recently came to light that there is a need to be able to override
> > some CPU features very early on, before the kernel is fully up and
> > running. The reasons f
Hi Marc,
On Mon, Feb 08, 2021 at 09:57:09AM +, Marc Zyngier wrote:
> It recently came to light that there is a need to be able to override
> some CPU features very early on, before the kernel is fully up and
> running. The reasons for this range from specific feature support
> (such as using P
From: Jianyong Wu
Implement the hypervisor side of the KVM PTP interface.
The service offers wall time and cycle count from host to guest.
The caller must specify whether they want the host's view of
either the virtual or physical counter.
Signed-off-by: Jianyong Wu
Signed-off-by: Marc Zyngier
From: Jianyong Wu
Currently, there is no mechanism to keep time synchronized between guest
and host in the arm/arm64 virtualization environment. Time in the guest
will drift compared with the host after boot, as they may both use
third-party time sources to correct their time respectively. The time
deviation will
From: Jianyong Wu
Add clocksource id to the ARM generic counter so that it can be easily
identified from callers such as ptp_kvm.
Cc: Mark Rutland
Signed-off-by: Jianyong Wu
Signed-off-by: Marc Zyngier
Link: https://lore.kernel.org/r/20201209060932.212364-6-jianyong...@arm.com
---
drivers/cl
From: Thomas Gleixner
System time snapshots do not convey information about the current
clocksource which was used, but callers like the PTP KVM guest
implementation have the requirement to evaluate the clocksource type to
select the appropriate mechanism.
Introduce a clocksource id field in
From: Jianyong Wu
Currently, the ptp_kvm module contains a lot of x86-specific code.
Let's move this code into a new arch-specific file in the same directory,
and rename the arch-independent file to ptp_kvm_common.c.
Signed-off-by: Jianyong Wu
Signed-off-by: Marc Zyngier
Link: https://lore.ker
From: Will Deacon
We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.
Cc: Marc Zyngier
Signed-off-by: Will Deacon
Signed-off-by: Jianyong Wu
Signed-off-by: Marc Zyngier
Link: https://lore.kernel.org/r/202012090609
Given that this series[0] has languished in my Inbox for the best part
of the past two years, and in an effort to eventually get it merged, I've
taken the liberty to pick it up and do the changes I wanted to see
instead of waiting to go through yet another round.
All the patches have a link to their or
From: Will Deacon
Although the SMCCC specification provides some limited functionality for
describing the presence of hypervisor and firmware services, this is
generally applicable only to functions designated as "Arm Architecture
Service Functions" and no portable discovery mechanism is provided
Hi Eric,
On 2/5/21 1:30 PM, Auger Eric wrote:
> Hi Alexandru,
>
> On 1/29/21 5:36 PM, Alexandru Elisei wrote:
>> The LPI code validates a result similarly to the IPI tests, by checking if
>> the target CPU received the interrupt with the expected interrupt number.
>> However, the LPI tests invent
With a guest translation fault, the memcache pages are not needed if KVM
is only about to install a new leaf entry into the existing page table.
And with a guest permission fault, the memcache pages are also not needed
for a write_fault at dirty-logging time if KVM is only about to update
the exist
The process of coalescing page mappings back to a block mapping differs
from the normal map path in areas such as TLB invalidation and CMOs, so
add an independent API for this case.
Signed-off-by: Yanan Wang
---
arch/arm64/kvm/hyp/pgtable.c | 18 ++
1 file changed, 18 insertions(+)
di
When KVM needs to coalesce the normal page mappings into a block mapping,
we currently invalidate the old table entry first followed by invalidation
of TLB, then unmap the page mappings, and install the block entry at last.
It will cost a long time to unmap the numerous page mappings, which means
Hi,
This series makes some efficiency improvements to the stage-2 page table
code, and there are some test results to present the performance changes,
which were tested by a kvm selftest [1] that I have posted:
[1]
https://lore.kernel.org/lkml/20210208090841.333724-1-wangyana...@huawei.com/
About patch
We currently uniformly clean dcache in user_mem_abort() before calling the
fault handlers, if we take a translation fault and the pfn is cacheable.
But if there are concurrent translation faults on the same page or block,
cleaning the dcache for the first one is necessary while the others are not.
By
It seems that the CPU known as Apple M1 has the terrible habit
of being stuck with HCR_EL2.E2H==1, in violation of the architecture.
Try and work around this deplorable state of affairs by detecting
the stuck bit early and short-circuit the nVHE dance. It is still
unknown whether there are many mo
As we want to parse more options very early in the kernel lifetime,
let's always map the FDT early. This is achieved by moving that
code out of kaslr_early_init().
No functional change expected.
Signed-off-by: Marc Zyngier
Acked-by: Catalin Marinas
Acked-by: David Brazdil
---
arch/arm64/incl
Finally we can check whether VHE is disabled on the command line,
and not enable it if that's the user's wish.
Signed-off-by: Marc Zyngier
Acked-by: David Brazdil
Acked-by: Catalin Marinas
---
arch/arm64/kernel/asm-offsets.c | 3 +++
arch/arm64/kernel/hyp-stub.S| 11 +++
2 files c
From: Srinivas Ramana
Defer enabling pointer authentication on the boot core until
after it's required to be enabled by the cpufeature framework.
This will help in controlling the feature dynamically
with a boot parameter.
Signed-off-by: Ajay Patil
Signed-off-by: Prasad Sodagudi
Signed-off-by: Srinivas
Admittedly, passing id_aa64mmfr1.vh=0 on the command line isn't
that easy to understand, and it is likely that users would much
prefer to write "kvm-arm.mode=nvhe", or "...=protected".
So here you go. This has the added advantage that we can now
always honor the "kvm-arm.mode=protected" option, even w
In order to be able to override CPU features at boot time,
let's add a command line parser that matches options of the
form "cpureg.feature=value", and store the corresponding
value into the override val/mask pair.
No features are currently defined, so no expected change in
functionality.
Signed-
Add a facility to globally override a feature, no matter what
the HW says. Yes, this sounds dangerous, but we do respect the
"safe" value for a given feature. This doesn't mean the user
doesn't need to know what they are doing.
Nothing uses this yet, so we are pretty safe. For now.
Signed-off-by:
Given that the early cpufeature infrastructure has borrowed quite
a lot of code from the kaslr implementation, let's reimplement
the matching of the "nokaslr" option with it.
Signed-off-by: Marc Zyngier
Acked-by: Catalin Marinas
Acked-by: David Brazdil
---
arch/arm64/kernel/idreg-override.c |
As we want to be able to disable VHE at runtime, let's match
"id_aa64mmfr1.vh=" from the command line as an override.
This doesn't have much effect yet as our boot code doesn't look
at the cpufeature, but only at the HW registers.
Signed-off-by: Marc Zyngier
Acked-by: David Brazdil
Acked-by: Suz
__read_sysreg_by_encoding() is used by a bunch of cpufeature helpers,
which should take the feature override into account. Let's do that.
For good measure (and because we are likely to need it further
down the line), make this helper available to the rest of the
non-modular kernel.
Code that ne
In order to map the override of idregs to options that a user
can easily understand, let's introduce yet another option
array, which maps an option to the corresponding idreg options.
Signed-off-by: Marc Zyngier
Reviewed-by: Catalin Marinas
Acked-by: David Brazdil
---
arch/arm64/kernel/idreg-o
For completeness, let's document the HVC_VHE_RESTART stub.
Signed-off-by: Marc Zyngier
Acked-by: David Brazdil
---
Documentation/virt/kvm/arm/hyp-abi.rst | 9 +
1 file changed, 9 insertions(+)
diff --git a/Documentation/virt/kvm/arm/hyp-abi.rst
b/Documentation/virt/kvm/arm/hyp-abi.rst
In order to be able to disable Pointer Authentication at runtime,
whether it is for testing purposes, or to work around HW issues,
let's add support for overriding the ID_AA64ISAR1_EL1.{GPI,GPA,API,APA}
fields.
This is further mapped on the arm64.nopauth command-line alias.
Signed-off-by: Marc Z
In order to be able to disable BTI at runtime, whether it is
for testing purposes, or to work around HW issues, let's add
support for overriding the ID_AA64PFR1_EL1.BTI field.
This is further mapped on the arm64.nobti command-line alias.
Signed-off-by: Marc Zyngier
Reviewed-by: Catalin Marinas
We can now move the initial SCTLR_EL1 setup to be used for both
EL1 and EL2 setup.
Signed-off-by: Marc Zyngier
Acked-by: Catalin Marinas
Acked-by: David Brazdil
---
arch/arm64/kernel/head.S | 8 +++-
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/a
As init_el2_state is now nVHE only, let's simplify it and drop
the VHE setup.
Signed-off-by: Marc Zyngier
Acked-by: David Brazdil
Acked-by: Catalin Marinas
---
arch/arm64/include/asm/el2_setup.h | 33 --
arch/arm64/kernel/head.S | 2 +-
arch/arm64/kvm/hyp
There isn't much that a VHE kernel needs on top of whatever has
been done for nVHE, so let's move the little we need to the
VHE stub (the SPE setup), and drop the init_el2_state macro.
No expected functional change.
Signed-off-by: Marc Zyngier
Acked-by: David Brazdil
Acked-by: Catalin Marinas
When running VHE, we set MDCR_EL2.TPMS very early on to force
the trapping of EL1 SPE accesses to EL2.
However:
- we are running with HCR_EL2.{E2H,TGE}={1,1}, meaning that there
is no EL1 to trap from
- before entering a guest, we call kvm_arm_setup_debug(), which
sets MDCR_EL2_TPMS in the p
As we are aiming to be able to control whether we enable VHE or
not, let's always drop down to EL1 first, and only then upgrade
to VHE if at all possible.
This means that if the kernel is booted at EL2, we always start
with a nVHE init, drop to EL1 to initialise the kernel, and
only then upgra
As we are about to change the way a VHE system boots, let's
provide the core helper, in the form of a stub hypercall that
enables VHE and replicates the full EL1 context at EL2, thanks
to EL1 and VHE-EL2 being extremely similar.
On exception return, the kernel carries on at EL2. Fancy!
Nothing ca
Turning the MMU on is a popular sport in the arm64 kernel, and
we do it more than once, or even twice. As we are about to add
even more, let's turn it into a macro.
No expected functional change.
Signed-off-by: Marc Zyngier
Acked-by: Catalin Marinas
Acked-by: David Brazdil
---
arch/arm64/incl
The arm64 kernel has long been able to use more than 39bit VAs.
Since day one, actually. Let's rewrite the offending comment.
Signed-off-by: Marc Zyngier
Acked-by: Catalin Marinas
Acked-by: David Brazdil
---
arch/arm64/mm/proc.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --g
It recently came to light that there is a need to be able to override
some CPU features very early on, before the kernel is fully up and
running. The reasons for this range from specific feature support
(such as using Protected KVM on VHE HW, which is the main motivation
for this work) to errata wo
If someone happens to write the following code:
b 1f
init_el2_state vhe
1:
[...]
they will be in for a long debugging session, as the label "1f"
will be resolved *inside* the init_el2_state macro instead of
after it. Not really what one expects.
Instead, rewrite the