On Mon, Jul 19, 2021 at 9:04 AM Fuad Tabba wrote:
>
> Protected KVM does not support protected AArch32 guests. However,
> it is possible for the guest to force run AArch32, potentially
> causing problems. Add an extra check so that if the hypervisor
> catches the guest doing that, it can prevent t
On Mon, Jul 19, 2021 at 11:02 AM Jean-Philippe Brucker
wrote:
> We forward the whole PSCI function range, so it's either KVM or userspace.
> If KVM manages PSCI and the guest calls an unimplemented function, that
> returns directly to the guest without going to userspace.
>
> The concern is valid
Test that userspace adjustment of the guest physical counter-timer
results in the correct view within the guest.
Signed-off-by: Oliver Upton
---
.../selftests/kvm/include/aarch64/processor.h | 12
.../kvm/system_counter_offset_test.c | 29 ---
2 files changed,
Presently, KVM provides no facilities for correctly migrating a guest
that depends on the physical counter-timer. While most guests (barring
NV, of course) should not depend on the physical counter-timer, an
operator may still wish to provide a consistent view of the physical
counter-timer across m
Introduce a KVM selftest to verify that userspace manipulation of the
TSC (via the new vCPU attribute) results in the correct behavior within
the guest.
Signed-off-by: Oliver Upton
---
tools/testing/selftests/kvm/.gitignore| 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.
Add a new vCPU attribute that allows userspace to directly manipulate
the virtual counter-timer offset. Exposing such an interface allows for
the precise migration of guest virtual counter-timers, as it is an
idempotent interface.
Uphold the existing behavior of writes to CNTVOFF_EL2 for this new
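To illustrate why an offset-based interface is idempotent, here is a minimal sketch in plain C. It assumes only the architectural relation virtual count = physical count - CNTVOFF_EL2; the helper names are hypothetical and this is not KVM's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Arm generic timer: the virtual count equals the physical count minus
 * the virtual offset (CNTVOFF_EL2). Unsigned arithmetic wraps modulo
 * 2^64, like a free-running counter.
 */
static uint64_t guest_vcnt(uint64_t host_pcnt, uint64_t cntvoff)
{
	return host_pcnt - cntvoff;
}

/* Hypothetical helper: the offset that makes the guest read @want at @host_pcnt. */
static uint64_t offset_for(uint64_t host_pcnt, uint64_t want)
{
	return host_pcnt - want;
}
```

Setting the same offset twice yields the same guest view no matter when the write lands, whereas writing a counter *value* implicitly bakes in the time at which the write happened.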
KVM/arm64 now allows userspace to adjust the guest virtual counter-timer
via a vCPU device attribute. Test that changes to the virtual
counter-timer offset result in the correct view being presented to the
guest.
Signed-off-by: Oliver Upton
---
tools/testing/selftests/kvm/Makefile | 1
Add a test case for counter emulation on arm64. A side effect of how KVM
handles physical counter offsetting on non-ECV systems is that the
virtual counter will always hit hardware and the physical could be
emulated. Force emulation by writing a nonzero offset to the physical
counter and compare th
Refactor kvm_synchronize_tsc to make a new function that allows callers
to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
for the sake of participating in TSC synchronization.
This changes the locking semantics around TSC writes. Writes to the TSC
will now take the pvclock gt
Add a selftest for the new KVM clock UAPI that was introduced. Ensure
that the KVM clock is consistent between userspace and the guest, and
that the difference in realtime will only ever cause the KVM clock to
advance forward.
Signed-off-by: Oliver Upton
---
tools/testing/selftests/kvm/.gitignor
vCPU file descriptors are abstracted away from test code in KVM
selftests, meaning that tests cannot directly access a vCPU's device
attributes. Add helpers that tests can use to get at vCPU device
attributes.
Signed-off-by: Oliver Upton
---
.../testing/selftests/kvm/include/kvm_util.h | 9 +++
Copy over approximately clean versions of the pvclock headers into
tools. Reconcile headers/symbols missing in tools that are unneeded.
Signed-off-by: Oliver Upton
---
tools/arch/x86/include/asm/pvclock-abi.h | 48 +++
tools/arch/x86/include/asm/pvclock.h | 103 +
To date, VMM-directed TSC synchronization and migration has been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.
A mu
Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.
Prov
KVM's current means of saving/restoring system counters is plagued with
temporal issues. At least on ARM64 and x86, we migrate the guest's
system counter by-value through the respective guest system register
values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
brittle as the state i
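The by-value problem can be illustrated with a short sketch: if the source captures the guest TSC together with realtime at a single instant, the destination can advance the guest TSC by the realtime that elapsed during migration and then program an offset rather than a value. This is a minimal illustration under stated assumptions: no TSC scaling (guest TSC = host TSC + offset), a known guest TSC frequency, and hypothetical structure and function names that are not the KVM UAPI. It uses the GCC/Clang `unsigned __int128` extension to avoid overflow.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical snapshot taken on the source host at one instant. */
struct clock_pair {
	uint64_t guest_tsc;   /* guest TSC at that instant */
	uint64_t realtime_ns; /* host realtime at that instant */
};

/* Guest TSC the destination should present at @dst_realtime_ns. */
static uint64_t expected_guest_tsc(struct clock_pair src,
				   uint64_t dst_realtime_ns, uint64_t tsc_hz)
{
	uint64_t delta_ns = 0;
	unsigned __int128 ticks;

	/* Never move the guest clock backwards if realtime stepped back. */
	if (dst_realtime_ns > src.realtime_ns)
		delta_ns = dst_realtime_ns - src.realtime_ns;

	ticks = (unsigned __int128)delta_ns * tsc_hz / NSEC_PER_SEC;
	return src.guest_tsc + (uint64_t)ticks;
}

/* Offset to program on the destination, given its current host TSC. */
static uint64_t tsc_offset(uint64_t guest_tsc, uint64_t host_tsc)
{
	return guest_tsc - host_tsc;
}
```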
On Thursday 15 Jul 2021 at 17:31:45 (+0100), Marc Zyngier wrote:
> Make sure we don't issue CMOs when mapping something that
> is not a memory address in the S2 page tables.
>
> Signed-off-by: Marc Zyngier
> ---
> arch/arm64/kvm/hyp/pgtable.c | 16 ++--
> 1 file changed, 10 insertion
Add trap handlers for protected VMs. These are mainly for Sys64
and debug traps.
No functional change intended as these are not hooked in yet to
the guest exit handlers introduced earlier. So even when trapping
is triggered, the exit handlers would let the host handle it, as
before.
Signed-off-by
Protected KVM does not support protected AArch32 guests. However,
it is possible for the guest to force run AArch32, potentially
causing problems. Add an extra check so that if the hypervisor
catches the guest doing that, it can prevent the guest from
running again by resetting vcpu->arch.target an
Trap accesses to restricted features for VMs running in protected
mode.
Accesses to feature registers are emulated, and only supported
features are exposed to protected VMs.
Accesses to restricted registers as well as restricted
instructions are trapped, and an undefined exception is injected
into
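A rough sketch of exposing only supported features: Arm ID registers are made of 4-bit fields, so a per-field "allowed" bitmap can decide which fields are passed through from hardware. The bitmap layout and function name are hypothetical, and the sketch assumes a field value of 0 means "not implemented", which does not hold for every field (some use 0xf for that).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Keep only the 4-bit fields whose index is set in @allowed_fields;
 * every other field reads as zero. Caveat: zero is not "absent" for
 * every ID register field.
 */
static uint64_t sanitise_id_reg(uint64_t hw_val, uint16_t allowed_fields)
{
	uint64_t val = 0;

	for (unsigned int field = 0; field < 16; field++) {
		uint64_t mask = 0xfULL << (field * 4);

		if (allowed_fields & (1u << field))
			val |= hw_val & mask;
	}
	return val;
}
```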
Add feature register flag definitions to clarify which features
might be supported.
Consolidate the various ID_AA64PFR0_ELx flags for all ELs.
No functional change intended.
Signed-off-by: Fuad Tabba
---
arch/arm64/include/asm/cpufeature.h | 4 ++--
arch/arm64/include/asm/sysreg.h | 12 ++
Fix the places in KVM that treat MDCR_EL2 as a 32-bit register.
More recent features (e.g., FEAT_SPEv1p2) use bits above 31.
No functional change intended.
Acked-by: Will Deacon
Signed-off-by: Fuad Tabba
---
arch/arm64/include/asm/kvm_arm.h | 20 ++--
arch/arm64/include/asm/k
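Why a 32-bit view of MDCR_EL2 is a bug can be shown in a few lines of C. Bit 36 below is an arbitrary illustrative position above bit 31, not a claim about a specific MDCR_EL2 field; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: some control bit above bit 31. */
#define EXAMPLE_HIGH_BIT (1ULL << 36)

/* Buggy: a 32-bit local silently drops bits 63:32. */
static uint64_t update_mdcr_u32(uint64_t mdcr, uint64_t set)
{
	uint32_t val = (uint32_t)mdcr; /* bits 63:32 of mdcr lost here */

	val |= (uint32_t)set;          /* ...and bits 63:32 of set here */
	return val;
}

/* Correct: keep the register in a 64-bit type end to end. */
static uint64_t update_mdcr_u64(uint64_t mdcr, uint64_t set)
{
	return mdcr | set;
}
```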
Restrict protected VM capabilities based on the
fixed-configuration for protected VMs.
No functional change intended in current KVM-supported modes
(nVHE, VHE).
Signed-off-by: Fuad Tabba
---
arch/arm64/include/asm/kvm_fixed_config.h | 10
arch/arm64/kvm/arm.c | 63
Move the sanitized copies of the CPU feature registers to the
recently created sys_regs.c. This consolidates all copies in a
more relevant file.
No functional change intended.
Signed-off-by: Fuad Tabba
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 6 --
arch/arm64/kvm/hyp/nvhe/sys_regs.c|
Track the baseline guest value for cptr_el2 in struct
kvm_vcpu_arch, similar to the other registers that control traps.
Use this value when setting cptr_el2 for the guest.
Currently this value is unchanged (CPTR_EL2_DEFAULT), but future
patches will set trapping bits based on features supported fo
Change the names of hcr_el2 register fields to match the Arm
Architecture Reference Manual. Easier for cross-referencing and
for grepping.
Also, change the name of CPTR_EL2_RES1 to CPTR_NVHE_EL2_RES1,
because res1 bits are different for VHE.
No functional change intended.
Acked-by: Will Deacon
Remove trailing whitespace from comment in trap_dbgauthstatus_el1().
No functional change intended.
Signed-off-by: Fuad Tabba
---
arch/arm64/kvm/sys_regs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f6f126e
Add a function to check whether a VM is protected (under pKVM).
Since the creation of protected VMs isn't enabled yet, this is a
placeholder that always returns false. The intention is for this
to become a check for protected VMs in the future (see Will's RFC
[*]).
No functional change intended.
Hi,
Changes since v2 [1]:
- Both trapping and setting of feature id registers are toggled by an allowed
features bitmap of the feature id registers (Will)
- Documentation explaining the rationale behind allowed/blocked features (Drew)
- Restrict protected VM features by checking and restricting
Add an array of pointers to handlers for various trap reasons in
nVHE code.
The current code selects how to fixup a guest on exit based on a
series of if/else statements. Future patches will also require
different handling for guest exits. Create an array of handlers
to consolidate them.
No func
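A table of handlers replacing an if/else chain might look like the sketch below. The exit reasons, `struct vcpu`, and handler names are hypothetical (real code would index by the exception class in ESR_ELx); a NULL entry means "let the host handle it", matching the behaviour described for unhooked handlers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical exit reasons. */
enum exit_reason { EXIT_SYSREG, EXIT_FP, EXIT_DABT, EXIT_NR };

struct vcpu { int handled_count; };

/* Returns true if the exit was handled and the guest can be re-entered. */
typedef bool (*exit_handler_fn)(struct vcpu *vcpu);

static bool handle_sysreg(struct vcpu *vcpu) { vcpu->handled_count++; return true; }
static bool handle_fp(struct vcpu *vcpu) { vcpu->handled_count++; return true; }

/* Entries left NULL fall back to the host. */
static const exit_handler_fn exit_handlers[EXIT_NR] = {
	[EXIT_SYSREG] = handle_sysreg,
	[EXIT_FP]     = handle_fp,
};

static bool fixup_guest_exit(struct vcpu *vcpu, enum exit_reason reason)
{
	exit_handler_fn fn;

	if ((unsigned int)reason >= EXIT_NR)
		return false;
	fn = exit_handlers[reason];
	return fn ? fn(vcpu) : false;
}
```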
On deactivating traps, restore the value of mdcr_el2 from the
newly created and preserved host value in the vcpu context, rather than
directly reading the hardware register.
Up until and including this patch the two values are the same,
i.e., the hardware register and the vcpu one. A future patch will
be
Refactor sys_regs.h and sys_regs.c to make it easier to reuse
common code. It will be used in nVHE in a later patch.
Note that the refactored code uses __inline_bsearch for find_reg
instead of bsearch to avoid copying the bsearch code for nVHE.
No functional change intended.
Signed-off-by: Fuad
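The point of an inline binary search is that the lookup needs no out-of-line library routine. The sketch below open-codes one over a table sorted by encoding; `struct sys_reg_desc` here is a hypothetical two-field stand-in, not the kernel's definition, and this is not the kernel's `__inline_bsearch`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; the table must be sorted by encoding. */
struct sys_reg_desc {
	uint32_t encoding; /* packed Op0/Op1/CRn/CRm/Op2 */
	const char *name;
};

static inline const struct sys_reg_desc *
find_reg(const struct sys_reg_desc *table, size_t num, uint32_t encoding)
{
	size_t lo = 0, hi = num;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid].encoding == encoding)
			return &table[mid];
		if (table[mid].encoding < encoding)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
```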
Add hardware configuration register bit definitions for HCR_EL2
and MDCR_EL2. Future patches toggle these hyp configuration
register bits to trap on certain accesses.
No functional change intended.
Signed-off-by: Fuad Tabba
---
arch/arm64/include/asm/kvm_arm.h | 22 ++
1 fil
On Monday 19 Jul 2021 at 16:01:40 (+0100), Marc Zyngier wrote:
> On Mon, 19 Jul 2021 11:47:30 +0100,
> Quentin Perret wrote:
> > +static int finalize_mappings(void)
> > +{
> > + enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_RWX;
> > + int ret;
> > +
> > + /*
> > +* The host's .bss and .r
On Monday 19 Jul 2021 at 15:43:34 (+0100), Marc Zyngier wrote:
> On Mon, 19 Jul 2021 11:47:29 +0100,
> Quentin Perret wrote:
> >
> > The hypervisor will soon be in charge of tracking ownership of all
> > memory pages in the system. The current page-tracking infrastructure at
> > EL2 only allows b
On Monday 19 Jul 2021 at 15:24:32 (+0100), Marc Zyngier wrote:
> On Mon, 19 Jul 2021 11:47:28 +0100,
> Quentin Perret wrote:
> >
> > Much of the stage-2 manipulation logic relies on being able to destroy
> > block mappings if e.g. installing a smaller mapping in the range. The
> > rationale for t
Hi Alex,
I'm not planning to resend this work at the moment, because it looks like
vcpu hot-add will go a different way so I don't have a user. But I'll
probably address the feedback so far and park it on some branch, in case
anyone else needs it.
On Mon, Jul 19, 2021 at 04:29:18PM +0100, Alexand
On Monday 19 Jul 2021 at 13:55:29 (+0100), Marc Zyngier wrote:
> On Mon, 19 Jul 2021 11:47:26 +0100,
> Quentin Perret wrote:
> >
> > The nVHE protected mode uses invalid mappings in the host stage-2
> > page-table to track the owner of each page in the system. In order to
> > allow the usage of i
On Monday 19 Jul 2021 at 13:14:48 (+0100), Marc Zyngier wrote:
> On Mon, 19 Jul 2021 11:47:24 +0100,
> Quentin Perret wrote:
> >
> > The stage-2 map walkers currently return -EAGAIN when re-creating
> > identical mappings or only changing access permissions. This allows to
> > optimize mapping pa
Hi Alex,
On 2021-07-19 17:35, Alexandru Elisei wrote:
Hi Marc,
On 7/19/21 1:39 PM, Marc Zyngier wrote:
We keep an entry for the PMSWINC_EL0 register in the vcpu structure,
while *never* writing anything there outside of reset.
Given that the register is defined as write-only, that we always
t
Hi Marc,
On 7/19/21 1:39 PM, Marc Zyngier wrote:
> We keep an entry for the PMSWINC_EL0 register in the vcpu structure,
> while *never* writing anything there outside of reset.
>
> Given that the register is defined as write-only, that we always
> trap when this register is accessed, there is litt
Hi Marc,
On 7/19/21 4:56 PM, Marc Zyngier wrote:
> On 2021-07-19 16:55, Alexandru Elisei wrote:
>> Hi Marc,
>>
>> On 7/19/21 1:38 PM, Marc Zyngier wrote:
>>> A number of the PMU sysregs expose reset values that are not
>>> compliant with the architecture (set bits in the RES0 ranges,
>>> for examp
On 2021-07-19 16:55, Alexandru Elisei wrote:
Hi Marc,
On 7/19/21 1:38 PM, Marc Zyngier wrote:
A number of the PMU sysregs expose reset values that are not
compliant with the architecture (set bits in the RES0 ranges,
for example).
This in turn has the effect that we need to pointlessly mask
so
Hi Marc,
On 7/19/21 1:38 PM, Marc Zyngier wrote:
> A number of the PMU sysregs expose reset values that are not
> compliant with the architecture (set bits in the RES0 ranges,
> for example).
>
> This in turn has the effect that we need to pointlessly mask
> some register fields when using them.
>
Hi Jean-Philippe,
I'm not really familiar with this part of KVM, and I'm still trying to get my head
around how this works, so please bear with me if I ask silly questions.
This is how I understand this will work:
1. VMM opts in to forward HVC calls not handled by KVM.
2. VMM opts in to forwar
On Mon, 19 Jul 2021 11:47:30 +0100,
Quentin Perret wrote:
>
> As the hypervisor maps the host's .bss and .rodata sections in its
> stage-1, make sure to tag them as shared in hyp and host page-tables.
>
> But since the hypervisor relies on the presence of these mappings, we
> cannot let the host
The KVM pgtable API exposes the kvm_pgtable_walk() function to allow
the definition of walkers outside of pgtable.c. However, it is not easy
to implement any of those walkers without some of the low-level helpers,
such as kvm_pte_valid(). Make it static inline, and move it to the
header file to all
Refactor the hypervisor stage-1 locking in nVHE protected mode to expose
a new pkvm_create_mappings_locked() function. This will be used in later
patches to allow walking and changing the hypervisor stage-1 without
releasing the lock.
Signed-off-by: Quentin Perret
---
arch/arm64/kvm/hyp/include/
The host kernel is currently able to change EL2 stage-1 mappings without
restrictions thanks to the __pkvm_create_mappings() hypercall. But in a
world where the host is no longer part of the TCB, this clearly poses a
problem.
To fix this, introduce a new hypercall to allow the host to share a
rang
As the hypervisor maps the host's .bss and .rodata sections in its
stage-1, make sure to tag them as shared in hyp and host page-tables.
But since the hypervisor relies on the presence of these mappings, we
cannot let the host in complete control of the memory regions -- it
must not unshare or don
Much of the stage-2 manipulation logic relies on being able to destroy
block mappings if e.g. installing a smaller mapping in the range. The
rationale for this behaviour is that stage-2 mappings can always be
re-created lazily. However, this gets more complicated when the stage-2
page-table is used
Introduce helper functions in the KVM stage-2 and stage-1 page-table
manipulation library that allow retrieving the enum kvm_pgtable_prot of a
PTE. This will be useful to implement custom walkers outside of
pgtable.c.
Signed-off-by: Quentin Perret
---
arch/arm64/include/asm/kvm_pgtable.h | 20 +++
__pkvm_create_private_mapping() allows the host kernel to create
arbitrary mappings in the hypervisor's "private" range. However, this is
only needed early on, and there should be no good reason for the host
to need this past the point where the pkvm static is set. Make sure to
stub the hypercall past
The kvm_pgtable_stage2_find_range() function is used in the host memory
abort path to try and look for the largest block mapping that can be
used to map the faulting address. In order to do so, the function
currently walks the stage-2 page-table and looks for existing
incompatible mappings within t
The stage-2 map walkers currently return -EAGAIN when re-creating
identical mappings or only changing access permissions. This allows us to
optimize mapping pages for concurrent (v)CPUs faulting on the same
page.
While this works as expected when touching one page-table leaf at a
time, this can lead
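The -EAGAIN short-cut can be sketched as a comparison that ignores permission bits. The bit positions (S2AP at [7:6], XN at bit 54) are assumptions for illustration, and the function is hypothetical rather than the pgtable.c walker.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed permission bits of a stage-2 leaf: S2AP[7:6] and XN (bit 54). */
#define S2_PERM_MASK ((3ULL << 6) | (1ULL << 54))

/*
 * If the new leaf is identical to the old one, or differs only in
 * permissions, report -EAGAIN so a vCPU racing on the same fault can
 * just retry instead of rebuilding the mapping.
 */
static int stage2_map_check(uint64_t old_pte, uint64_t new_pte)
{
	if (((old_pte ^ new_pte) & ~S2_PERM_MASK) == 0)
		return -EAGAIN;
	return 0; /* genuinely different mapping: install it */
}
```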
We currently unmap all MMIO mappings from the host stage-2 to recycle
the pages whenever we run out. In order to make this pattern easy to
re-use from other places, factor the logic out into a dedicated macro.
While at it, apply the macro for the kvm_pgtable_stage2_set_owner()
calls. They're curren
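The "recycle MMIO mappings and retry" pattern can be captured in a macro, as in this sketch. The page-pool variables and helpers are hypothetical stand-ins, and the macro uses a GNU statement expression (GCC/Clang), as is common in kernel code. Note that `expr` is deliberately evaluated a second time on -ENOMEM.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical state standing in for the host stage-2 page pool. */
static int free_pages;
static int mmio_mapped_pages;

/* Hypothetical: tear down all MMIO mappings to recycle their pages. */
static void unmap_all_mmio(void)
{
	free_pages += mmio_mapped_pages;
	mmio_mapped_pages = 0;
}

/* Hypothetical mapping operation that needs one free page. */
static int map_one_page(void)
{
	if (free_pages == 0)
		return -ENOMEM;
	free_pages--;
	return 0;
}

/* Evaluate @expr; on -ENOMEM, recycle MMIO mappings and try once more. */
#define TRY_WITH_MMIO_RECYCLE(expr)		\
	({					\
		int __ret = (expr);		\
		if (__ret == -ENOMEM) {		\
			unmap_all_mmio();	\
			__ret = (expr);		\
		}				\
		__ret;				\
	})
```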
Hi all,
This series aims to improve how the nVHE hypervisor tracks ownership of memory
pages when running in protected mode ("kvm-arm.mode=protected" on the kernel
command line).
The main issue with the existing ownership tracking code is that it is
completely binary: a page is either owned by an
The hypervisor will soon be in charge of tracking ownership of all
memory pages in the system. The current page-tracking infrastructure at
EL2 only allows binary states: a page is either owned or not by an
entity. But a number of use-cases will require more complex states for
pages that are shared
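One way to go beyond a binary owned/not-owned state is to encode a page state in the software (ignored) bits of a valid leaf. The sketch assumes bits 55 and 56 are available for this and uses hypothetical state names; it is an illustration, not the series' actual encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed software (ignored) bits of a leaf descriptor. */
#define PTE_SW0         (1ULL << 55)
#define PTE_SW1         (1ULL << 56)
#define PAGE_STATE_MASK (PTE_SW0 | PTE_SW1)

/* Hypothetical states (macros, since enum constants must fit in int). */
#define PAGE_OWNED           0ULL    /* exclusively owned */
#define PAGE_SHARED_OWNED    PTE_SW0 /* owned, shared with another entity */
#define PAGE_SHARED_BORROWED PTE_SW1 /* owned elsewhere, shared with us */

static uint64_t pte_get_state(uint64_t pte)
{
	return pte & PAGE_STATE_MASK;
}

static uint64_t pte_set_state(uint64_t pte, uint64_t state)
{
	return (pte & ~PAGE_STATE_MASK) | (state & PAGE_STATE_MASK);
}
```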
The current hypervisor stage-1 mapping code doesn't allow changing an
existing valid mapping. Relax this condition by allowing changes that
only target ignored bits, as that will soon be needed to annotate shared
pages.
Signed-off-by: Quentin Perret
---
arch/arm64/kvm/hyp/pgtable.c | 18
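The relaxed rule "only ignored bits may change in an existing valid mapping" reduces to a masked XOR. The mask covers bits [58:55], the ignored-bit range for stage-1 and stage-2 page and block descriptors; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ignored (software) bits of stage-1/stage-2 leaf descriptors: [58:55]. */
#define PTE_IGNORED_MASK (0xfULL << 55)
#define PTE_VALID        (1ULL << 0)

/* Allow rewriting a valid mapping only if nothing but ignored bits change. */
static bool hyp_pte_update_allowed(uint64_t old_pte, uint64_t new_pte)
{
	if (!(old_pte & PTE_VALID))
		return true; /* nothing mapped yet */
	return ((old_pte ^ new_pte) & ~PTE_IGNORED_MASK) == 0;
}
```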
The ignored bits for both stage-1 and stage-2 page and block
descriptors are in [55:58], so rename KVM_PTE_LEAF_ATTR_S2_IGNORED to
make it applicable to both.
Signed-off-by: Quentin Perret
---
arch/arm64/kvm/hyp/pgtable.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ar
The nVHE protected mode uses invalid mappings in the host stage-2
page-table to track the owner of each page in the system. In order to
allow the usage of ignored bits (a.k.a. software bits) in these
mappings, move the owner encoding away from the top bits.
Signed-off-by: Quentin Perret
---
arch
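Because hardware ignores every other bit of a descriptor whose valid bit (bit 0) is clear, an owner ID can be stored in an invalid PTE. The sketch places it in bits [9:2], away from the top bits, so the ignored bits stay free for other uses; that field position and the helper names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_VALID       (1ULL << 0)
/* Assumed owner field in an invalid PTE: bits [9:2]. */
#define PTE_OWNER_SHIFT 2
#define PTE_OWNER_MASK  (0xffULL << PTE_OWNER_SHIFT)

/* Encode an owner ID; the valid bit stays clear, so accesses fault. */
static uint64_t make_owner_pte(uint8_t owner_id)
{
	return ((uint64_t)owner_id << PTE_OWNER_SHIFT) & PTE_OWNER_MASK;
}

static uint8_t pte_owner(uint64_t pte)
{
	return (uint8_t)((pte & PTE_OWNER_MASK) >> PTE_OWNER_SHIFT);
}
```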
On Mon, 19 Jul 2021 11:47:29 +0100,
Quentin Perret wrote:
>
> The hypervisor will soon be in charge of tracking ownership of all
> memory pages in the system. The current page-tracking infrastructure at
> EL2 only allows binary states: a page is either owned or not by an
> entity. But a number of
On Mon, 19 Jul 2021 11:47:28 +0100,
Quentin Perret wrote:
>
> Much of the stage-2 manipulation logic relies on being able to destroy
> block mappings if e.g. installing a smaller mapping in the range. The
> rationale for this behaviour is that stage-2 mappings can always be
> re-created lazily. H
On Mon, 19 Jul 2021 11:47:26 +0100,
Quentin Perret wrote:
>
> The nVHE protected mode uses invalid mappings in the host stage-2
> page-table to track the owner of each page in the system. In order to
> allow the usage of ignored bits (a.k.a. software bits) in these
> mappings, move the owner enco
From: Alexandre Chartre
In a KVM guest on arm64, performance counter interrupts have an
unnecessary overhead which slows down execution when using the "perf
record" command and limits the "perf record" sampling period.
The problem is that when a guest VM disables counters by clearing the
PMCR_E
We always sanitise our PMU sysreg on the write side, so there
is no need to do it on the read side as well.
Drop the unnecessary masking.
Acked-by: Russell King (Oracle)
Reviewed-by: Alexandre Chartre
Reviewed-by: Alexandru Elisei
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/pmu-emul.c | 3
We keep an entry for the PMSWINC_EL0 register in the vcpu structure,
while *never* writing anything there outside of reset.
Given that the register is defined as write-only, that we always
trap when this register is accessed, there is little point in saving
anything anyway.
Get rid of the entry,
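Since PMSWINC_EL0 is write-only and always trapped, its emulation can act on the written value immediately and keep no shadow copy. The sketch follows the architectural rule that writing 1 to bit n increments event counter n when it is enabled and counting the software-increment event (0x00); `struct vpmu` is hypothetical and simplified (no global PMCR_EL0.E check, no counter width or overflow handling).

```c
#include <assert.h>
#include <stdint.h>

#define NR_EVENT_COUNTERS 31
#define EVT_SW_INCR       0x00 /* software-increment event number */

/* Hypothetical, simplified vPMU state. */
struct vpmu {
	uint64_t counter[NR_EVENT_COUNTERS];
	uint16_t evtype[NR_EVENT_COUNTERS]; /* configured event number */
	uint32_t enabled;                   /* one bit per counter */
};

/* Emulate a guest write to PMSWINC_EL0: nothing is stored. */
static void pmswinc_write(struct vpmu *pmu, uint64_t val)
{
	for (unsigned int i = 0; i < NR_EVENT_COUNTERS; i++) {
		if (!(val & (1ULL << i)) || !(pmu->enabled & (1u << i)))
			continue;
		if (pmu->evtype[i] == EVT_SW_INCR)
			pmu->counter[i]++;
	}
}
```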
A number of the PMU sysregs expose reset values that are not
compliant with the architecture (set bits in the RES0 ranges,
for example).
This in turn has the effect that we need to pointlessly mask
some register fields when using them.
Let's start by making sure we don't have illegal values in th
This is the second version of the series initially posted at [1].
* From v1:
- Simplified masking in patch #1
- Added a patch dropping PMSWINC_EL0 as a shadow register, though it
is still advertised to userspace for the purpose of backward
compatibility of VM save/restore
- Collected
On Mon, 19 Jul 2021 11:47:24 +0100,
Quentin Perret wrote:
>
> The stage-2 map walkers currently return -EAGAIN when re-creating
> identical mappings or only changing access permissions. This allows to
> optimize mapping pages for concurrent (v)CPUs faulting on the same
> page.
>
> While this wor
On Mon, 19 Jul 2021 07:31:30 +0100,
Paolo Bonzini wrote:
>
> On 17/07/21 11:55, Marc Zyngier wrote:
> > We currently rely on the kvm_is_transparent_hugepage() helper to
> > discover whether a given page has the potential to be mapped as
> > a block mapping.
> >
> > However, this API doesn't real